[Bug 1988457] [NEW] [SRU] ovsdbapp can time out on raft leadership change

Launchpad Bug Tracker 1988457 at bugs.launchpad.net
Mon Jul 29 12:38:26 UTC 2024


You have been subscribed to a public bug by Ubuntu Foundations Team Bug Bot (crichton):

When raft leadership changes, any leader-only connections will be
disconnected and will need to reconnect to the new leader. When this
happens, the IDL will return a txn status of TRY_AGAIN. The current code
tries to do an exponential backoff with sleep() due to an issue where
TRY_AGAIN results can be returned 1000s of times a second. This sleep
also prevents reconnecting quickly, because idl.run() is not called
often enough, and this can lead to timeouts.
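
To illustrate the failure mode described above, here is a toy model in
Python. It is only a sketch and not the ovsdbapp code or its fix: FakeIdl,
commit_with_sleep_backoff and commit_with_polling are hypothetical names,
time is simulated in milliseconds rather than slept, and the assumption is
that reconnecting to the new leader needs a fixed number of run() calls.

```python
TRY_AGAIN = "try again"
SUCCESS = "success"


class FakeIdl:
    """Hypothetical stand-in for an OVSDB IDL (not the real ovs.db.idl.Idl).

    Assumption: reconnecting to the new raft leader takes `runs_needed`
    calls to run(); until then every commit returns TRY_AGAIN.
    """

    def __init__(self, runs_needed):
        self.runs_left = runs_needed

    def run(self):
        # Each call makes a little progress on the reconnect.
        if self.runs_left > 0:
            self.runs_left -= 1

    def commit(self):
        return SUCCESS if self.runs_left == 0 else TRY_AGAIN


def commit_with_sleep_backoff(idl, timeout_ms):
    """Retry with a doubling sleep; run() is only called once per retry.

    Returns the simulated elapsed time in ms, or None on timeout.
    """
    elapsed_ms, delay_ms = 0, 10
    while elapsed_ms < timeout_ms:
        idl.run()
        if idl.commit() == SUCCESS:
            return elapsed_ms
        elapsed_ms += delay_ms  # stands in for time.sleep(delay)
        delay_ms *= 2
    return None


def commit_with_polling(idl, timeout_ms, poll_ms=10):
    """Retry with a short, fixed wait so run() keeps being called.

    The bounded wait (an assumption of this sketch) still avoids spinning
    thousands of times a second.
    """
    elapsed_ms = 0
    while elapsed_ms < timeout_ms:
        idl.run()
        if idl.commit() == SUCCESS:
            return elapsed_ms
        elapsed_ms += poll_ms
    return None
```

With a reconnect that needs 10 run() calls and a 3000 ms timeout, the
backoff variant spends most of its budget sleeping and times out after 9
calls, while the polling variant completes after 90 ms of simulated time.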

--------------------------------------------------------------------------------
SRU TEMPLATE:

[Impact]

Please see the original bug description. What I can add is that, in
production, we saw ovsdbapp transactions fail after a timeout; ovsdbapp
would then end up in a retry sequence in which the transactions were
not retried, so vm tap devices were not deleted from ovs when a vm was
deleted. The result was a build-up of "stale" tap devices on br-int
(visible as "No such device" entries in ovs-vsctl show).

[Test Plan]

* Deploy OpenStack Jammy (Yoga) with ml2-ovn
* Spawn several vms
* Trigger many ovn-central db leadership switches by restarting ovn-central units in rotation, leaving enough time between each restart for a new leader to be elected.
* Delete the vms and create many more while leaders are being re-elected.
* First check that /var/log/nova/nova-compute.log does not contain the "OVSDB transaction returned TRY_AGAIN" message over and over. Then check that ovs-vsctl show does not contain any "stale" ports with messages like the following:

    Port tapa5d45fc6-02
        Interface tapa5d45fc6-02
            error: "could not open network device tapa5d45fc6-02 (No such device)"
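
The two checks in the last step could be scripted roughly as follows. This
is a minimal sketch, assuming the output of ovs-vsctl show and the contents
of nova-compute.log have already been read into strings; find_stale_ports
and count_try_again are hypothetical helper names, not part of any OVS or
OpenStack tool.

```python
def find_stale_ports(ovs_vsctl_show_output):
    """Return names of ports whose interface reports "No such device".

    Assumes the layout shown above: a "Port <name>" line followed by
    indented Interface and error lines belonging to that port.
    """
    stale = []
    current_port = None
    for line in ovs_vsctl_show_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Port "):
            # Some OVS versions quote the port name; drop the quotes.
            current_port = stripped[len("Port "):].strip('"')
        elif stripped.startswith("error:") and "No such device" in stripped:
            if current_port is not None and current_port not in stale:
                stale.append(current_port)
    return stale


def count_try_again(log_text):
    """Count log lines containing the TRY_AGAIN message from the test plan."""
    needle = "OVSDB transaction returned TRY_AGAIN"
    return sum(1 for line in log_text.splitlines() if needle in line)
```

An empty list from find_stale_ports and a low, non-growing count from
count_try_again would match the expected outcome of the test plan.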


[Regression Potential]
This patch is not expected to introduce any regressions.

** Affects: cloud-archive
     Importance: Undecided
         Status: New

** Affects: cloud-archive/yoga
     Importance: Undecided
         Status: New

** Affects: ovsdbapp
     Importance: Undecided
         Status: Fix Released

** Affects: python-ovsdbapp (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: python-ovsdbapp (Ubuntu Jammy)
     Importance: Undecided
         Status: New


** Tags: in-stable-wallaby in-stable-xena in-stable-yoga in-stable-zed patch
-- 
[SRU] ovsdbapp can time out on raft leadership change
https://bugs.launchpad.net/bugs/1988457
You received this bug notification because you are a member of Ubuntu Sponsors, which is subscribed to the bug report.


