[Bug 1990978] Re: Raft bug: OVSDB leadership transfers every 10-20 min after initial compaction

Fri Sep 30 09:10:22 UTC 2022

** Description changed:

+ [Impact]
+ 
+ [Test Case]
+ 
+ [Where things could go wrong]
+ 
+ [Original bug description]
+ 
  The first compaction starts after 24 hours, or earlier if the DB size
  doubles.

  Subsequent compactions, however, are triggered every 10-20 minutes, and
  each one is accompanied by an OVSDB Raft leadership transfer.
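The trigger rule described above can be modeled as a small shell predicate. This is only an illustrative sketch, not ovsdb-server's implementation. The function `should_compact`, its arguments, and the idea of a stored baseline size (the DB size right after the previous compaction) are all hypothetical.

```shell
# Hypothetical model of the compaction trigger described above.
# This is NOT ovsdb-server code; names and arguments are illustrative only.
#
# should_compact ELAPSED_SECONDS DB_SIZE BASE_SIZE
#   ELAPSED_SECONDS  seconds since the previous compaction
#   DB_SIZE          current DB file size, in bytes
#   BASE_SIZE        assumed baseline: DB size right after the previous compaction
# Exit status 0 means "compact now", 1 means "not yet".
should_compact() {
    local elapsed=$1 size=$2 base=$3
    # Time-based trigger: 24 hours since the previous compaction.
    if [ "$elapsed" -ge $((24 * 60 * 60)) ]; then
        return 0
    fi
    # Size-based trigger: the DB has at least doubled since the baseline.
    if [ "$size" -ge $((2 * base)) ]; then
        return 0
    fi
    return 1
}
```

For example, `should_compact 3600 100 50` succeeds because the DB doubled, while `should_compact 3600 60 50` does not. A baseline of 0 makes the size test pass on every call. If a baseline were ever left stale like that after compaction, every periodic 10-20 minute check would compact, which is the symptom reported here. That failure mode is hypothetical and is not a description of the upstream fix.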

  Affected OVS version:
  ovs-vsctl (Open vSwitch) 2.17.2

  Commits that fix the issue:
  https://github.com/openvswitch/ovs/commit/a32a4e1fa2d3fad284834d4b7bccc2e71d33f9da

  https://github.com/openvswitch/ovs/commit/dfc3e65c8191f5dc375337c23aed128b5c0d7781
  (2.17 branch patch)

  Reproducer:
  Trigger compactions either manually with the command-line tool:
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl ovsdb-server/compact
  or by creating DB pressure, e.g.:
  #!/bin/bash
  for i in {1..5000}
  do
  ovn-nbctl ls-add sw$i
  if [[ $? -ne 0 ]] ; then
-     echo "Failed on ls-add i: $i"
-     exit 1
+     echo "Failed on ls-add i: $i"
+     exit 1
  fi
-         for j in {1..2000}
-         do
-                 echo "Iteration i: $i and j:$j"
-                 ovn-nbctl lsp-add sw$i sw$i$j
-                 if [[ $? -ne 0 ]] ; then
-                     echo "Failed on lsp-add i: $i and j: $j"
-                     exit 1
-                 fi
-         done
+         for j in {1..2000}
+         do
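+                 echo "Iteration i: $i and j: $j"
+                 # The "-" separator keeps port names unique: without it,
+                 # i=1, j=11 and i=11, j=1 would both produce "sw111".
+                 ovn-nbctl lsp-add sw$i sw$i-$j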
+                 if [[ $? -ne 0 ]] ; then
+                     echo "Failed on lsp-add i: $i and j: $j"
+                     exit 1
+                 fi
+         done
  done
  for i in {1..5000}
  do
-         echo "Delete iteration i: $i"
-         ovn-nbctl ls-del sw$i
-         if [[ $? -ne 0 ]] ; then
-             echo "Failed on ls-del i: $i"
-             exit 1
-         fi
+         echo "Delete iteration i: $i"
+         ovn-nbctl ls-del sw$i
+         if [[ $? -ne 0 ]] ; then
+             echo "Failed on ls-del i: $i"
+             exit 1
+         fi
  done
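A scaled-down, parameterized sketch of the DB-pressure reproducer may be easier to run than 5000 x 2000 iterations. The variables `N_SWITCHES`, `N_PORTS` and `NBCTL`, and the function name `create_pressure`, are hypothetical additions for this example. `NBCTL` exists only so the loops can be dry-run with a stub instead of `ovn-nbctl`. Port names use a `-` separator so that, for example, switch 1 port 11 and switch 11 port 1 do not both become `sw111`.

```shell
# Hypothetical, scaled-down variant of the DB-pressure reproducer.
# N_SWITCHES, N_PORTS and NBCTL are illustrative knobs, not part of the original.
create_pressure() {
    local n_switches=${N_SWITCHES:-100} n_ports=${N_PORTS:-200} nbctl=${NBCTL:-ovn-nbctl} i j
    for i in $(seq 1 "$n_switches"); do
        "$nbctl" ls-add "sw$i" || { echo "Failed on ls-add i: $i" >&2; return 1; }
        for j in $(seq 1 "$n_ports"); do
            # The "-" separator keeps port names unique across switches.
            "$nbctl" lsp-add "sw$i" "sw$i-$j" ||
                { echo "Failed on lsp-add i: $i and j: $j" >&2; return 1; }
        done
    done
    # Delete the switches again, as the original script does.
    for i in $(seq 1 "$n_switches"); do
        "$nbctl" ls-del "sw$i" || { echo "Failed on ls-del i: $i" >&2; return 1; }
    done
}
```

Run it against the NB DB leader as, for example, `N_SWITCHES=20 N_PORTS=500 create_pressure`. Whether that much growth doubles the DB and triggers a compaction depends on the deployment, so the counts may need raising.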

  Check for leadership transfers using:
  sudo grep "Transferring leadership" /var/log/ovn/ov* | grep ovsdb-server-sb.log
  With the bug present, a new entry appears every 10-20 minutes.
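The interval between transfers can be computed from the log timestamps, which is more direct than reading the entries. The sketch below assumes OVS's default log-file timestamp layout (`YYYY-MM-DDTHH:MM:SS...`, in UTC) and GNU `date -d`. The function name `transfer_gaps` is hypothetical. It reads a log on stdin and prints the gap, in whole minutes, between consecutive "Transferring leadership" entries.

```shell
# Hypothetical helper: print the gap, in whole minutes, between consecutive
# "Transferring leadership" entries in an ovsdb-server log read on stdin.
# Assumes OVS's default log timestamp layout (e.g. 2022-09-30T09:10:22.123Z,
# UTC) and GNU date(1) for the -d option.
# Stage 1 keeps matching lines; stage 2 extracts YYYY-MM-DDTHH:MM:SS;
# stage 3 converts each timestamp to epoch seconds and prints the differences.
transfer_gaps() {
    grep "Transferring leadership" |
        grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}' |
        {
            prev=""
            while read -r ts; do
                # Split the date and time around the "T"; skip unparsable stamps.
                now=$(date -u -d "${ts%T*} ${ts#*T}" +%s) || continue
                if [ -n "$prev" ]; then
                    echo $(( (now - prev) / 60 ))
                fi
                prev=$now
            done
        }
}
```

Feed it a single log file, e.g. `transfer_gaps < /var/log/ovn/ovsdb-server-sb.log`, so that entries stay in chronological order. With the bug present, most printed gaps would be expected to fall in the 10-20 range.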

  === Ubuntu SRU Details ===

  [Impact]
- Please see 
+ Please see

  [Test Case]
  * Deploy OpenStack Yoga.
  * Connect to the NB DB leader and run the script above to generate DB pressure. Compaction will occur once the DB has doubled in size.
  * After one hour, check for subsequent leadership transfers using the following command:
  sudo grep "Transferring leadership" /var/log/ovn/ov* | grep ovsdb-server-sb.log
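For the one-hour check in the last step, a simple pass/fail signal is the number of transfers logged after a chosen cutoff, such as the time of the initial compaction. The sketch below uses the hypothetical function name `count_transfers_since` and makes the same timestamp-layout assumption as above. Because `YYYY-MM-DDTHH:MM:SS` strings sort chronologically, a plain string comparison in `awk` is enough.

```shell
# Hypothetical helper: count "Transferring leadership" entries logged at or
# after CUTOFF in an ovsdb-server log read on stdin.
# CUTOFF must use the layout YYYY-MM-DDTHH:MM:SS (UTC, as assumed for OVS's
# default log timestamps); such strings sort chronologically, so string
# comparison orders them correctly.
count_transfers_since() {
    local cutoff=$1
    grep "Transferring leadership" |
        grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}' |
        awk -v c="$cutoff" '$0 >= c { n++ } END { print n + 0 }'
}
```

Usage: `count_transfers_since "$CUTOFF" < /var/log/ovn/ovsdb-server-sb.log`, where `CUTOFF` is the chosen time in the layout above. Following the reasoning in this report, an affected build would keep logging a transfer every 10-20 minutes after the cutoff. A fixed build would be expected to log few or none unless the DB doubles again or 24 hours pass.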

  [Where things could go wrong]
  No regression is expected, since the fix reduces the frequency of leadership transfers.
  The fix has also been applied upstream (https://github.com/openvswitch/ovs/commit/dfc3e65c8191f5dc375337c23aed128b5c0d7781); however, a new version including it has not been released yet.

-- 
https://bugs.launchpad.net/bugs/1990978

Title:
  Raft bug: OVSDB leadership transfers every 10-20 min after initial
  compaction

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive yoga series:
  New
Status in openvswitch package in Ubuntu:
  New
Status in openvswitch source package in Jammy:
  New
Status in openvswitch source package in Kinetic:
  New
