[Bug 1940043] Re: Upgrade from OVN 20.03 to newer OVN version will cause data plane outage
Frode Nordahl
1940043 at bugs.launchpad.net
Sat Aug 6 05:36:39 UTC 2022
Steve, thank you for pointing out the lack of commentary around the need
for updating the ovn-controller systemd service as part of this SRU. I
have updated the bug description to include the reasoning behind it.
** Description changed:
[Impact]
When upgrading from OVN 20.03, as made available in Ubuntu Focal, to a newer version of OVN, it is currently not possible to upgrade the central components first. Doing so will make the ovn-controller tear down connectivity to running instances as it may not fully understand the data structure of a newer database.
To fix this situation we have backported an upstream feature [0] that
allows the ovn-controller to detect version mismatch and subsequently
refrain from making further changes to the local Open vSwitch instance
until the version mismatch is corrected.
+
+ In order to minimize the downtime on package upgrade, and to cater for
+ anyone both enabling the version mismatch feature and upgrading the
+ controller first, the ovn-controller systemd service is also updated to
+ pass the `--restart` argument when stopping the controller.
+
+ This flag tells the ovn-controller process that it should not clear out
+ Open vSwitch flows and OVN SB database records on exit, which allows
+ already installed state to continue operation until the new instance of
+ the ovn-controller process starts. [1][2][3]
[Test Plan]
1. Deploy OpenStack Ussuri from the Focal archive.
2. Launch an instance and confirm connectivity.
3. Add the UCA or another PPA with a newer version of OVN and upgrade the OVN components on the relevant units in the deployment.
4. Confirm that the new version of the central components makes the ovn-controller log a version mismatch, and that connectivity to the test instance continues.
5. Upgrade the data plane units and confirm that the version mismatch is resolved and that instances retain connectivity with minimal downtime during the upgrade.
[Regression Potential]
The backported feature is optional and enabled by specifically entering
a key-value pair into the local Open vSwitch database to enable it. It
has also been available upstream for several releases.
+
+ The change to the ovn-controller systemd service has been in Ubuntu
+ since Impish [3] and we have had no reports of side effects of this
+ change.
[Original Bug Description]
The upstream recommendation for upgrades of OVN is to first upgrade the data plane components (chassis aka. ovn-controller), and then upgrade the central components (the database schema and ovn-northd). The rationale for this is that the new version of the ovn-controller is required to cope with any changes to database schema or how northd programs flows.
However, during the course of rapid OVN development, changes have also
been introduced that leave the new ovn-controller unable to cope with an
old database schema, breaking the recommended upgrade procedure.
To cope with this, upstream has introduced a new optional configuration
for the ovn-controller that allows it to detect version inconsistencies,
and when they are present stop it from making changes to the data plane
until the version inconsistency is resolved [0].
For the above mentioned configuration to be effective we also need the
package to call ``ovn-ctl stop_controller`` with the --restart option so
that the ovn-controller does not flush the installed flows on exit.
We should make required changes to packages and charms to allow upgrades
to progress with less data plane outage.
- 0: https://github.com/ovn-
- org/ovn/commit/1dd27ea7aea40122c1edbff845e14abaa70c0413
+ 0: https://github.com/ovn-org/ovn/commit/1dd27ea7aea40122c1edbff845e14abaa70c0413
+ 1: https://github.com/ovn-org/ovn/commit/f508fcc14abfaaa13e9f1bf3b5b6bac59bd27a5f
+ 2: https://github.com/ovn-org/ovn/commit/45c7a85dc7f2af56191a47f1357d16b8af618e20
+ 3: https://git.launchpad.net/~ubuntu-server-dev/ubuntu/+source/ovn/commit/debian/ovn-host.ovn-controller.service?id=3c601ecc13724d3f13ec0cc989f6ffd838f787f8
** Changed in: ovn (Ubuntu Focal)
Status: Incomplete => Triaged
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ovn in Ubuntu.
https://bugs.launchpad.net/bugs/1940043
Title:
Upgrade from OVN 20.03 to newer OVN version will cause data plane
outage
Status in charm-layer-ovn:
Fix Released
Status in charm-ovn-chassis:
Fix Released
Status in charm-ovn-dedicated-chassis:
Fix Released
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive wallaby series:
Triaged
Status in ovn package in Ubuntu:
Fix Released
Status in ovn source package in Focal:
Triaged
Status in ovn source package in Hirsute:
Won't Fix
Status in ovn source package in Impish:
Fix Released
Bug description:
[Impact]
When upgrading from OVN 20.03, as made available in Ubuntu Focal, to a newer version of OVN, it is currently not possible to upgrade the central components first. Doing so will make the ovn-controller tear down connectivity to running instances as it may not fully understand the data structure of a newer database.
To fix this situation we have backported an upstream feature [0] that
allows the ovn-controller to detect version mismatch and subsequently
refrain from making further changes to the local Open vSwitch instance
until the version mismatch is corrected.
In order to minimize the downtime on package upgrade, and to cater for
anyone both enabling the version mismatch feature and upgrading the
controller first, the ovn-controller systemd service is also updated
to pass the `--restart` argument when stopping the controller.
This flag tells the ovn-controller process that it should not clear
out Open vSwitch flows and OVN SB database records on exit, which
allows already installed state to continue operation until the new
instance of the ovn-controller process starts. [1][2][3]
[Test Plan]
1. Deploy OpenStack Ussuri from the Focal archive.
2. Launch an instance and confirm connectivity.
3. Add the UCA or another PPA with a newer version of OVN and upgrade the OVN components on the relevant units in the deployment.
4. Confirm that the new version of the central components makes the ovn-controller log a version mismatch, and that connectivity to the test instance continues.
5. Upgrade the data plane units and confirm that the version mismatch is resolved and that instances retain connectivity with minimal downtime during the upgrade.
[Regression Potential]
The backported feature is optional and enabled by specifically
entering a key-value pair into the local Open vSwitch database to
enable it. It has also been available upstream for several releases.
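As an illustration of what "entering a key-value pair" could look like, the sketch below builds the `ovs-vsctl` command line for it. The key name `ovn-match-northd-version` is taken from the upstream ovn-controller documentation as I understand it and should be checked against the installed version; the helper name `build_enable_command` is hypothetical.

```python
import shlex

# Assumption: key name as documented upstream in ovn-controller(8);
# verify against the OVN version actually installed.
VERSION_MATCH_KEY = "ovn-match-northd-version"


def build_enable_command(enabled: bool = True) -> list[str]:
    """Return the ovs-vsctl argv that sets the version-match flag.

    The command is only constructed here, not executed.
    """
    value = "true" if enabled else "false"
    return [
        "ovs-vsctl",
        "set",
        "open_vswitch",
        ".",
        f"external_ids:{VERSION_MATCH_KEY}={value}",
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it on a chassis.
    print(shlex.join(build_enable_command()))
```

The command would be run on each chassis; because the feature is off unless the key is set, units that never set it keep their previous behaviour.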
The change to the ovn-controller systemd service has been in Ubuntu
since Impish [3] and we have had no reports of side effects of this
change.
[Original Bug Description]
The upstream recommendation for upgrades of OVN is to first upgrade the data plane components (chassis aka. ovn-controller), and then upgrade the central components (the database schema and ovn-northd). The rationale for this is that the new version of the ovn-controller is required to cope with any changes to database schema or how northd programs flows.
However, during the course of rapid OVN development, changes have also
been introduced that leave the new ovn-controller unable to cope with
an old database schema, breaking the recommended upgrade procedure.
To cope with this, upstream has introduced a new optional configuration
for the ovn-controller that allows it to detect version
inconsistencies, and when they are present stop it from making changes
to the data plane until the version inconsistency is resolved [0].
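The behaviour described above can be pictured with a toy model. This is not the ovn-controller implementation; the class, field names and version strings are all hypothetical, and the real check in [0] may compare versions differently. It only shows the idea of holding the installed state while a mismatch is present.

```python
from dataclasses import dataclass, field


@dataclass
class ControllerModel:
    """Toy stand-in for an ovn-controller with an optional version check."""

    own_version: str
    match_northd_version: bool = False  # the optional feature, off by default
    installed_flows: list[str] = field(default_factory=list)

    def process(self, northd_version: str, desired_flows: list[str]) -> bool:
        """Install desired_flows unless the check is enabled and versions differ.

        Returns True when the flows were applied, False when the model
        refrained from touching the installed state.
        """
        if self.match_northd_version and northd_version != self.own_version:
            # Version mismatch: leave the data plane exactly as it is.
            return False
        self.installed_flows = list(desired_flows)
        return True
```

With the check enabled, a model created with version "20.03.2" (an illustrative value) keeps its flows when offered an update from a northd reporting a different version, and resumes applying updates once both versions match.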
For the above mentioned configuration to be effective we also need the
package to call ``ovn-ctl stop_controller`` with the --restart option
so that the ovn-controller does not flush the installed flows on exit.
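A minimal sketch of the stop invocation, with and without the option. The default path `/usr/share/ovn/scripts/ovn-ctl` is an assumption about where the package installs the script, and `stop_controller_argv` is a hypothetical helper; the authoritative change is the service file referenced in [3].

```python
# Assumption: install location of ovn-ctl; adjust to match the package.
DEFAULT_OVN_CTL = "/usr/share/ovn/scripts/ovn-ctl"


def stop_controller_argv(restart: bool, ovn_ctl: str = DEFAULT_OVN_CTL) -> list[str]:
    """Build the argv used to stop ovn-controller.

    With restart=True the --restart option is appended, which asks the
    controller to leave installed flows in place on exit.
    """
    argv = [ovn_ctl, "stop_controller"]
    if restart:
        argv.append("--restart")
    return argv
```

In a systemd unit the equivalent would be the stop command line itself; the function form is used here only to make the difference between the two invocations explicit.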
We should make required changes to packages and charms to allow
upgrades to progress with less data plane outage.
0: https://github.com/ovn-org/ovn/commit/1dd27ea7aea40122c1edbff845e14abaa70c0413
1: https://github.com/ovn-org/ovn/commit/f508fcc14abfaaa13e9f1bf3b5b6bac59bd27a5f
2: https://github.com/ovn-org/ovn/commit/45c7a85dc7f2af56191a47f1357d16b8af618e20
3: https://git.launchpad.net/~ubuntu-server-dev/ubuntu/+source/ovn/commit/debian/ovn-host.ovn-controller.service?id=3c601ecc13724d3f13ec0cc989f6ffd838f787f8
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-layer-ovn/+bug/1940043/+subscriptions