[Bug 1744062] Re: L3 HA: multiple agents are active at the same time
Corey Bryant
corey.bryant at canonical.com
Tue Jul 3 13:51:38 UTC 2018
It appears the following commits are required to fix this for
keepalived:
commit e90a633c34fbe6ebbb891aa98bf29ce579b8b45c
Author: Quentin Armitage <quentin at armitage.org.uk>
Date: Fri Dec 15 21:14:24 2017 +0000
Fix removing left-over addresses if keepalived aborts
Issue #718 reported that if keepalived terminates abnormally when
it has vrrp instances in master state, it doesn't remove the
left-over VIPs and eVIPs when it restarts. This is despite
commit f4c10426c saying that it resolved this problem.
It turns out that commit f4c10426c did not resolve the problem for VIPs
or eVIPs, although it did resolve the issue for iptables and ipset
configuration.
This commit now really resolves the problem, and residual VIPs and
eVIPs are removed at startup.
Signed-off-by: Quentin Armitage <quentin at armitage.org.uk>
commit f4c10426ca0a7c3392422c22079f1b71e7d4ebe9
Author: Quentin Armitage <quentin at armitage.org.uk>
Date: Sun Mar 6 09:53:27 2016 +0000
Remove ip addresses left over from previous failure
If keepalived terminates unexpectedly, for any instances for which
it was master, it leaves ip addresses configured on the interfaces.
When keepalived restarts, if it starts in backup mode, the addresses
must be removed. In addition, any iptables/ipsets entries added for
!accept_mode must also be removed, in order to avoid multiple entries
being created in iptables.
This commit removes any addresses and iptables/ipsets configuration
for any interfaces that exist when keepalived starts up. If keepalived
shut down cleanly, that will only be for non-vmac interfaces, but if
it terminated unexpectedly, it can also be for any left-over vmacs.
Signed-off-by: Quentin Armitage <quentin at armitage.org.uk>
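As an aside, a quick way to spot the left-over VIPs described above is to capture `ip -o addr show` from each gateway node (inside the router namespace) into one file per node and count the holders of a given VIP. This is only a sketch; the helper name, capture files, and VIP address are examples, not taken from this report:

```shell
# Hypothetical helper: given per-node captures of "ip -o addr show",
# count how many nodes currently hold a VIP. A count greater than 1
# means either more than one keepalived believes it is master, or a
# stale VIP was left behind by an aborted keepalived.
count_vip_holders() {
    vip="$1"; shift
    grep -l "inet ${vip}/" "$@" | wc -l
}
```

Usage would look like `count_vip_holders 169.254.0.1 node1.addr node2.addr node3.addr` (file names and VIP are illustrative).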
f4c10426ca0a7c3392422c22079f1b71e7d4ebe9 is already included in:
* keepalived 1:1.3.9-1build1 (bionic/queens, cosmic/rocky)
* keepalived 1:1.3.2-1build1 (artful/pike)
* keepalived 1:1.3.2-1 (zesty/ocata) [1]
[1] zesty is EOL -
https://launchpad.net/ubuntu/+source/keepalived/1:1.3.2-1
f4c10426ca0a7c3392422c22079f1b71e7d4ebe9 is not included in:
* keepalived 1:1.2.19-1ubuntu0.2 (xenial/mitaka)
The backport of f4c10426ca0a7c3392422c22079f1b71e7d4ebe9 to xenial does
not look trivial. I'd prefer to backport keepalived 1:1.3.2-* to the
pike/ocata cloud archives.
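For reference, deciding whether a packaged keepalived already carries the fix reduces to a version comparison against 1.3.2, the oldest packaged version listed above as including f4c10426. A minimal sketch using GNU sort's version ordering (the helper name is mine; the version strings come from the lists above):

```shell
# True if version string $1 >= version string $2, per GNU sort -V.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Per the lists above, 1.3.2 is the oldest packaged version with the fix:
version_ge 1.2.19 1.3.2 && echo "fix included" || echo "needs backport"
```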
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to neutron in Ubuntu.
https://bugs.launchpad.net/bugs/1744062
Title:
L3 HA: multiple agents are active at the same time
Status in Ubuntu Cloud Archive:
Triaged
Status in Ubuntu Cloud Archive mitaka series:
Triaged
Status in Ubuntu Cloud Archive ocata series:
Triaged
Status in Ubuntu Cloud Archive pike series:
Triaged
Status in Ubuntu Cloud Archive queens series:
Triaged
Status in neutron:
New
Status in keepalived package in Ubuntu:
Triaged
Status in neutron package in Ubuntu:
Triaged
Status in keepalived source package in Xenial:
Triaged
Status in neutron source package in Xenial:
Triaged
Status in keepalived source package in Bionic:
Triaged
Status in neutron source package in Bionic:
Triaged
Bug description:
This is the same issue reported in
https://bugs.launchpad.net/neutron/+bug/1731595; however, that bug is
marked 'Fix Released', the issue is still occurring, and I can't change
it back to 'New', so it seems best to just open a new bug.
It seems as if this bug surfaces due to load issues. While the fix
provided by Venkata (https://review.openstack.org/#/c/522641/) should
help clean things up at the time of l3 agent restart, issues seem to
come back later down the line in some circumstances. xavpaice
mentioned he saw multiple routers active at the same time when they
had 464 routers configured on 3 neutron gateway hosts using L3HA, and
each router was scheduled to all 3 hosts. However, jhebden mentions
that things seem stable at the 400 L3HA router mark, and it's worth
noting this is the same deployment that xavpaice was referring to.
It seems to me that something is being pushed to its limit, and
possibly once that limit is hit, master router advertisements aren't
being received, causing a new master to be elected. If this is the
case, it would be great to get to the bottom of what resource is
getting constrained.
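One way to test the missed-advertisement theory would be to count master transitions per VRRP instance in the keepalived syslog output on each gateway: frequent re-elections under load would support it. A sketch, assuming the usual keepalived syslog line format (the helper name and log file are examples):

```shell
# Count "Entering MASTER STATE" transitions per VRRP instance in a
# captured syslog file; many transitions for the same instance suggest
# advertisements are being missed and masters repeatedly re-elected.
master_transitions() {
    grep -o 'VRRP_Instance([^)]*) Entering MASTER STATE' "$1" \
        | sort | uniq -c | sort -rn
}
```

Run as e.g. `master_transitions /var/log/syslog` on each gateway host and compare counts across hosts.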
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1744062/+subscriptions
More information about the Ubuntu-openstack-bugs mailing list