[Bug 1837635] Re: HA router state change from "standby" to "master" should be delayed

OpenStack Infra 1837635 at bugs.launchpad.net
Wed Apr 15 18:15:02 UTC 2020


Reviewed:  https://review.opendev.org/719968
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7682d2fa77108b148ede651525458babc1b30d8d
Submitter: Zuul
Branch:    stable/rocky

commit 7682d2fa77108b148ede651525458babc1b30d8d
Author: Rodolfo Alonso Hernandez <ralonsoh at redhat.com>
Date:   Wed Jul 24 16:32:02 2019 +0000

    Delay HA router transition from "backup" to "master"
    
    As described in the bug, when a HA router transitions from "master" to
    "backup", the "keepalived" processes set the virtual IP in all other
    HA routers. Each HA router then advertises it and "keepalived"
    decides, according to a trivial algorithm (higher interface IP), which
    one should be "master". At this point, the "keepalived" processes
    running in the other servers remove the HA router virtual IP
    assigned an instant before.
    
    To avoid transitioning some routers from "backup" to "master" and then
    back to "backup" in a very short period, this patch delays the "backup"
    to "master" transition, waiting for a possible new "backup" state. If
    the L3 agent receives a new "backup" HA state during the waiting period
    (set to the HA VRRP advert time, 2 seconds by default), before the HA
    state is set to "master", the L3 agent does nothing.
    
    Conflicts:
        neutron/agent/l3/agent.py
    
    Closes-Bug: #1837635
    
    Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
    (cherry picked from commit 3f022a193f66fde3bfd945af1119a60dfe91cb91)
    (cherry picked from commit adac5d9b7a72b4edeba5357c6a47e7e528fcf775)


** Changed in: cloud-archive/rocky
       Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1837635

Title:
  HA router state change from "standby" to "master" should be delayed

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive queens series:
  In Progress
Status in Ubuntu Cloud Archive rocky series:
  Fix Committed
Status in Ubuntu Cloud Archive stein series:
  New
Status in neutron:
  Fix Released

Bug description:
  Currently, when a HA state change occurs, the agent executes a series
  of actions [1]: it updates the metadata proxy, updates the prefix
  delegation, executes the L3 extension "ha_state_change" methods, updates
  the radvd status and notifies the server of the change.

  When, in a system with more than two routers (one in "active" mode and
  the others in "standby"), a switch-over is done, the "keepalived"
  process [2] in each "standby" server will set the virtual IP in the HA
  interface and advertise it. If another router's HA interface has
  the same priority (by default in Neutron, the HA instances of the same
  router ID have the same priority, 50) but a higher IP [3], the HA
  interface of this instance will have its VIPs and routes deleted and
  will become "standby" again. E.g.: [4]
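
As a rough illustration of the tie-break described above (equal priority, higher IP wins), here is a minimal Python sketch. It is not keepalived code: the `Advert` type, the `elect_master` helper, and the HA interface addresses are hypothetical and only model the rule as stated in this report. The priority value 50 is the Neutron default mentioned above.

```python
import ipaddress
from typing import Iterable, NamedTuple


class Advert(NamedTuple):
    """Hypothetical model of what one HA router instance advertises."""
    host: str
    priority: int
    ha_ip: str


def elect_master(adverts: Iterable[Advert]) -> Advert:
    """Pick the winner: highest priority, ties broken by the highest IP.

    Addresses are compared numerically via ipaddress, not as strings.
    """
    return max(
        adverts,
        key=lambda a: (a.priority, ipaddress.ip_address(a.ha_ip)),
    )


# Assumed example: both standby instances share the default priority 50,
# so the HA interface address alone decides the election.
adverts = [
    Advert("controller-0", 50, "169.254.192.5"),
    Advert("controller-2", 50, "169.254.192.12"),
]
winner = elect_master(adverts)
print(winner.host)
```

In this sketch both instances would first claim the virtual IP, and the one that loses the comparison (here controller-0) would then drop back to "standby". That short-lived "master" state is the one at issue in this bug.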

  In some cases, we have detected that when the master controller is
  rebooted, the change from "standby" to "master" of the other two
  servers is detected, but the change from "master" to "standby" of the
  server with the lower IP (as described above) is not registered by the
  Neutron server, because it is still not accessible (the master
  controller was rebooted). This status change is sometimes lost. The
  result is that both "standby" servers become "master" but the
  "master"-"standby" transition of one of them is lost.

  1) INITIAL STATUS
  (overcloud) [stack at undercloud-0 ~]$ neutron l3-agent-list-hosting-router router
  neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
  +--------------------------------------+--------------------------+----------------+-------+----------+
  | id                                   | host                     | admin_state_up | alive | ha_state |
  +--------------------------------------+--------------------------+----------------+-------+----------+
  | 4056cd8e-e062-4f45-bc83-d3eb51905ff5 | controller-0.localdomain | True           | :-)   | standby  |
  | 527d6a6c-8d2e-4796-bbd0-8b41cf365743 | controller-2.localdomain | True           | :-)   | standby  |
  | edbdfc1c-3505-4891-8d00-f3a6308bb1de | controller-1.localdomain | True           | :-)   | active   |
  +--------------------------------------+--------------------------+----------------+-------+----------+

  2) CONTROLLER 1 REBOOTED
  neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
  +--------------------------------------+--------------------------+----------------+-------+----------+
  | id                                   | host                     | admin_state_up | alive | ha_state |
  +--------------------------------------+--------------------------+----------------+-------+----------+
  | 4056cd8e-e062-4f45-bc83-d3eb51905ff5 | controller-0.localdomain | True           | :-)   | active   |
  | 527d6a6c-8d2e-4796-bbd0-8b41cf365743 | controller-2.localdomain | True           | :-)   | active   |
  | edbdfc1c-3505-4891-8d00-f3a6308bb1de | controller-1.localdomain | True           | :-)   | standby  |
  +--------------------------------------+--------------------------+----------------+-------+----------+

  
  The aim of this bug is to make this problem public and to propose a
  patch that delays the transition from "standby" to "master", so that
  keepalived, among all the instances running in the HA servers, can
  decide which one of them is the "master" server.
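
The proposed delay can be pictured as a small debounce around the state-change handler. The sketch below is an assumption-laden illustration, not the Neutron L3 agent implementation: the class name `DelayedHAState`, its `notify` method, and the `apply` callback are hypothetical. A "master" event is held for a configurable delay, and a "backup" event arriving in that window cancels it so that nothing is processed.

```python
import threading
from typing import Callable, Optional


class DelayedHAState:
    """Hypothetical debounce: hold "master" events, drop short flaps."""

    def __init__(self, delay: float, apply: Callable[[str], None]) -> None:
        self._delay = delay            # e.g. the VRRP advert interval
        self._apply = apply            # called with the state to process
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        self._generation = 0           # invalidates superseded timers

    def notify(self, state: str) -> None:
        """Handle a state event reported for one HA router."""
        with self._lock:
            self._generation += 1
            pending = self._timer
            self._timer = None
            if pending is not None:
                pending.cancel()
            if state == "master":
                # Defer the transition instead of applying it right away.
                timer = threading.Timer(
                    self._delay, self._fire, args=(self._generation,)
                )
                timer.daemon = True
                self._timer = timer
                timer.start()
                return
            if pending is not None:
                # "master" then "backup" inside the window: do nothing.
                return
        self._apply(state)

    def _fire(self, generation: int) -> None:
        """Timer callback: apply "master" unless a newer event arrived."""
        with self._lock:
            if generation != self._generation:
                return
            self._timer = None
        self._apply("master")
```

With a delay of 2 seconds, calling `notify("master")` followed within that window by `notify("backup")` would leave the router state untouched, while a lone `notify("master")` would be applied once the delay expires.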

  
  [1] https://github.com/openstack/neutron/blob/stable/stein/neutron/agent/l3/ha.py#L115-L134
  [2] https://www.keepalived.org/
  [3] This method is used by keepalived to define which router is predominant and must be master.
  [4] http://paste.openstack.org/show/754760/

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1837635/+subscriptions


