[Bug 1899964] Re: Failover of loadbalancer fails when Amphora master is missing

Mon Nov 9 14:33:34 UTC 2020

** Changed in: cloud-archive/ussuri
       Status: Triaged => Fix Released

** Changed in: cloud-archive/train
       Status: Triaged => Fix Released

** Changed in: octavia (Ubuntu Focal)
       Status: Triaged => Fix Released

https://bugs.launchpad.net/bugs/1899964

Title:
  Failover of loadbalancer fails when Amphora master is missing

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in octavia package in Ubuntu:
  Fix Released
Status in octavia source package in Focal:
  Fix Released
Status in octavia source package in Groovy:
  Fix Released

Bug description:
  [Impact]
  (from the StoryBoard description) Currently, if the TaskFlow process is interrupted during a create, update, or failover (for example, the node is rebooted or the service is restarted), the load balancer will be stuck in a PENDING state.
  TaskFlow provides a persistence module that saves flow state so it can be recovered: https://docs.openstack.org/taskflow/latest/user/persistence.html
  Otherwise, partially created/updated/deleted resources should be moved to the ERROR state when the service comes back up (as is done in Cinder).
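  As a rough aid for spotting the symptom described above, the shell sketch below lists load balancers whose provisioning status begins with PENDING_. It assumes the `openstack` CLI with python-octaviaclient is installed and credentials are already exported; the function name `list_pending_lbs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of load balancers whose provisioning
# status starts with PENDING_ (e.g. PENDING_UPDATE after an interrupted flow).
# Assumes python-octaviaclient is installed and OS_* credentials are exported.
list_pending_lbs() {
  # -f value prints whitespace-separated columns: <id> <provisioning_status>
  openstack loadbalancer list -f value -c id -c provisioning_status |
    awk '$2 ~ /^PENDING_/ { print $1 }'
}
```

  A load balancer that is in the middle of a normal operation also shows a PENDING_* status, so only treat an entry as stuck if it is still listed well after the controller services have come back up.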

  [Test Case]

  * deploy OpenStack with Octavia and at least 2 compute hosts, e.g. ./generate-bundle.sh --use-stable-charms --release train --octavia --num-compute 3
  * juju config octavia loadbalancer-topology=ACTIVE_STANDBY
  * create an Ubuntu VM and install apache2 (listening on port 80)
  * create a load balancer with the VM as a member and a floating IP for the LB VIP
  * test the connection with: nc -vz LB_FIP 80
  * openstack loadbalancer amphora list
  * get the VM UUID of the master amphora: openstack loadbalancer amphora show -c compute_id -f value <master>
  * get the compute host of that VM: openstack server show -c "OS-EXT-SRV-ATTR:host" -f value <master uuid>
  * power off the compute host from the previous step
  * openstack loadbalancer failover LB_UUID
  * wait a few seconds
  * openstack loadbalancer amphora list
  * wait until there is one BACKUP and one MASTER amphora
  * test the connection again with: nc -vz LB_FIP 80
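  The steps above can be partly scripted. The following is a minimal shell sketch under stated assumptions: python-octaviaclient is installed, admin credentials are loaded (amphora VMs and the OS-EXT-SRV-ATTR:host field are not visible to ordinary project users by default), and amphorae in ALLOCATED status are treated as the live ones. The helper names `master_host`, `wait_for_roles`, and `check_vip` are hypothetical; powering off the host is left as an out-of-band step.

```shell
#!/usr/bin/env bash
# Sketch of the manual test case (hypothetical helper names).
# Assumes python-octaviaclient and admin credentials are available.

# Print the compute host that runs the MASTER amphora of load balancer $1.
master_host() {
  local lb_id="$1" amp_id compute_id
  # -f value prints whitespace-separated columns: <id> <role>
  amp_id="$(openstack loadbalancer amphora list --loadbalancer "$lb_id" \
              -f value -c id -c role | awk '$2 == "MASTER" { print $1; exit }')"
  if [[ -z "$amp_id" ]]; then
    echo "no MASTER amphora found for $lb_id" >&2
    return 1
  fi
  compute_id="$(openstack loadbalancer amphora show -c compute_id -f value "$amp_id")"
  openstack server show -c "OS-EXT-SRV-ATTR:host" -f value "$compute_id"
}

# Poll until load balancer $1 has exactly one ALLOCATED MASTER and one
# ALLOCATED BACKUP amphora. $2 = timeout in seconds, $3 = poll interval.
wait_for_roles() {
  local lb_id="$1" timeout="${2:-600}" interval="${3:-10}" waited=0 listing
  while (( waited < timeout )); do
    listing="$(openstack loadbalancer amphora list --loadbalancer "$lb_id" \
                 -f value -c status -c role)"
    # Count live MASTER/BACKUP amphorae; awk exits 0 only for exactly one of each.
    if awk '$1 == "ALLOCATED" && $2 == "MASTER" { m++ }
            $1 == "ALLOCATED" && $2 == "BACKUP" { b++ }
            END { ok = (m == 1 && b == 1); exit (ok ? 0 : 1) }' <<<"$listing"; then
      return 0
    fi
    sleep "$interval"
    (( waited += interval ))
  done
  return 1
}

# Same check as the manual nc step; $1 = floating IP, $2 = port (default 80).
check_vip() {
  nc -vz -w 5 "$1" "${2:-80}"
}
```

  For example, note the output of `master_host "$LB_UUID"`, power that host off, run `openstack loadbalancer failover "$LB_UUID"`, and then run `wait_for_roles "$LB_UUID" && check_vip "$LB_FIP"`. Filtering on ALLOCATED keeps amphorae that are being deleted during the failover from being counted.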

  [Regression Potential]
  New failovers have been shown to work properly with the fix. Existing failed failovers can also be resolved, but this requires first setting the LB state from PENDING_UPDATE to ERROR in the database and then triggering a new failover.
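  The manual database step could look like the shell sketch below. It assumes the upstream Octavia schema, where load balancers are stored in a `load_balancer` table with a `provisioning_status` column, and that an operator runs the emitted statement against the Octavia database; `lb_error_sql` is a hypothetical helper name. The UUID is validated before it is interpolated, and the WHERE clause only matches a row that is still in PENDING_UPDATE.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL that moves one stuck load balancer from
# PENDING_UPDATE to ERROR so that a new failover can be triggered.
# Assumes the upstream Octavia schema (table load_balancer, column provisioning_status).
lb_error_sql() {
  local lb_id="$1"
  # Accept only a canonical lowercase UUID so nothing else is spliced into the SQL.
  if [[ ! "$lb_id" =~ ^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$ ]]; then
    echo "invalid load balancer UUID: $lb_id" >&2
    return 1
  fi
  printf "UPDATE load_balancer SET provisioning_status = 'ERROR' WHERE id = '%s' AND provisioning_status = 'PENDING_UPDATE';\n" "$lb_id"
}
```

  For example, `lb_error_sql "$LB_UUID" | mysql octavia` on a host with database access, followed by `openstack loadbalancer failover "$LB_UUID"`.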

  ------------------------------------------------------------------------

  Tried to fail over a load balancer whose amphora master entries were missing.
  The load balancer went to ERROR state.

  OpenStack version: Train

  The fix is available upstream as part of the Octavia failover refactor patches for Train:
  https://review.opendev.org/#/q/status:merged+project:openstack/octavia+branch:stable/train+topic:failover-refactor

  Verified with the upstream patches; the failover worked.
