[Bug 1818614] Fix merged to neutron (stable/ocata)

OpenStack Infra 1818614 at bugs.launchpad.net
Mon Apr 8 17:21:41 UTC 2019


Reviewed:  https://review.openstack.org/645278
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e2d3a94018b771963dad83c36d4209cfb7f7a427
Submitter: Zuul
Branch:    stable/ocata

commit e2d3a94018b771963dad83c36d4209cfb7f7a427
Author: Slawek Kaplonski <skaplons at redhat.com>
Date:   Sun Mar 10 22:45:15 2019 +0100

    Set initial ha router state in neutron-keepalived-state-change
    
    Sometimes in case of HA routers it may happen that
    keepalived will set status of router to MASTER before
    neutron-keepalived-state-change daemon will spawn "ip monitor"
    to monitor changes of IPs in router's namespace.
    
    In such case neutron-keepalived-state-change process will never
    notice that keepalived set router to be MASTER and L3 agent will
    not be notified about that so router will not be configured properly.
    
    To avoid such race condition neutron-keepalived-state-change will
    now check if VIP address is already configured on ha interface
    before it will spawn "ip monitor". If it is already configured
    by keepalived, it will notify L3 agent that router is set to
    MASTER.
    
    Change-Id: Ie3fe825d65408fc969c478767b411fe0156e9fbc
    Closes-Bug: #1818614
    (cherry picked from commit 8fec1ffc833eba9b3fc5f812bf881f44b4beba0c)
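
The check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the neutron patch: the function names (vip_is_configured, handle_initial_state, read_addresses) and the notify_agent callback are hypothetical, and it assumes the addresses are read as "ip -o addr show" one-line output, where each address follows an "inet" or "inet6" keyword.

```python
import subprocess


def vip_is_configured(ip_addr_output, vip_cidr):
    """Return True if vip_cidr appears in `ip -o addr show` style output."""
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # In one-line output the address follows the family keyword,
        # e.g. "2: ha-1234    inet 169.254.0.1/24 scope global ha-1234".
        for family in ("inet", "inet6"):
            if family in fields:
                idx = fields.index(family)
                if idx + 1 < len(fields) and fields[idx + 1] == vip_cidr:
                    return True
    return False


def handle_initial_state(ip_addr_output, vip_cidr, notify_agent):
    """Report "master" once if the VIP is already present (hypothetical helper).

    Intended to run before the address monitor is started, so that a
    transition that happened earlier is not missed.
    """
    if vip_is_configured(ip_addr_output, vip_cidr):
        notify_agent("master")
        return True
    return False


def read_addresses(interface, namespace=None):
    """Read the one-line address listing for an interface (needs iproute2)."""
    cmd = ["ip", "-o", "addr", "show", "dev", interface]
    if namespace:
        cmd = ["ip", "netns", "exec", namespace] + cmd
    return subprocess.check_output(cmd, universal_newlines=True)
```

A caller would pass the result of read_addresses() for the ha interface to handle_initial_state() first and only then start "ip monitor". Keeping the parsing separate from the subprocess call makes the check easy to exercise with canned output.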

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to neutron in Ubuntu.
https://bugs.launchpad.net/bugs/1818614

Title:
  [SRU] Various L3HA functional tests fails often

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive pike series:
  Fix Committed
Status in Ubuntu Cloud Archive queens series:
  Fix Committed
Status in Ubuntu Cloud Archive rocky series:
  Fix Committed
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Committed
Status in neutron source package in Cosmic:
  Fix Committed
Status in neutron source package in Disco:
  Fix Released

Bug description:
  [Impact]
  This fix needs to be added to the Ubuntu packages to safeguard against missed VRRP transitions that occur while "ip -o monitor" is not running. We have seen many cases in the field where neutron routers end up as active on multiple l3 agents (as reported via the neutron API), which leads to a number of problems.

  [Test Case]
  * deploy OpenStack (any version that supports l3ha)
  * create an HA router with max-l3-agents=2
  * check neutron l3-agent-list-hosting-router for the master location
  * on both hosts that are running the l3-agent, run:

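  # ROUTER_UUID must be set to the UUID of the HA router created above
  pid=$(pgrep -f "/usr/bin/neutron-keepalived-state-change --router_id=$ROUTER_UUID")
  # list the children of the state-change daemon
  ps -f --ppid $pid
  pkill -f "/usr/bin/neutron-keepalived-state-change --router_id=$ROUTER_UUID"
  ps -f --ppid $pid   # this should return nothing now
  # kill the keepalived process that was started with this router's config file
  pkill -f "/var/lib/neutron/ha_confs/$ROUTER_UUID/keepalived.conf"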

  * without this patch you should now see both agents reporting the router as "active"
  * with the patch this should not happen (once neutron-keepalived-state-change has been restarted)

  [Regression Potential]
  These patches have already landed in the corresponding upstream branches and have undergone review plus unit and functional testing upstream, so the regression potential is expected to be low.

  ====================================================================

  Recently, many L3 HA related functional tests have been failing.
  What these failures have in common is that the test fails while waiting for the L3 HA router to become master.

  Example stack trace:

  ft2.12: neutron.tests.functional.agent.l3.test_ha_router.LinuxBridgeL3HATestCase.test_ha_router_lifecycle_StringException: Traceback (most recent call last):
    File "neutron/tests/base.py", line 174, in func
      return f(self, *args, **kwargs)
    File "neutron/tests/base.py", line 174, in func
      return f(self, *args, **kwargs)
    File "neutron/tests/functional/agent/l3/test_ha_router.py", line 81, in test_ha_router_lifecycle
      self._router_lifecycle(enable_ha=True, router_info=router_info)
    File "neutron/tests/functional/agent/l3/framework.py", line 274, in _router_lifecycle
      common_utils.wait_until_true(lambda: router.ha_state == 'master')
    File "neutron/common/utils.py", line 690, in wait_until_true
      raise WaitTimeout(_("Timed out after %d seconds") % timeout)
  neutron.common.utils.WaitTimeout: Timed out after 60 seconds

  Example failure:
  http://logs.openstack.org/79/633979/21/check/neutron-functional-python27/ce7ef07/logs/testr_results.html.gz

  Logstash query:
  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ha_state%20%3D%3D%20'master')%5C%22

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1818614/+subscriptions


