[Bug 1471022] Re: [SRU] race between nova-compute and neutron-ovs-cleanup
Launchpad Bug Tracker
1471022 at bugs.launchpad.net
Wed Jul 15 13:52:35 UTC 2015
** Branch linked: lp:~ubuntu-server-dev/nova/icehouse
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1471022
Title:
[SRU] race between nova-compute and neutron-ovs-cleanup
Status in nova package in Ubuntu:
Fix Released
Status in nova source package in Trusty:
Fix Committed
Status in nova source package in Utopic:
Fix Committed
Status in nova source package in Vivid:
Fix Committed
Bug description:
[Impact]
This issue appears to be a consequence of
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1420572, where we
added a 'wait-for-state running' to the nova-compute upstart job to
ensure that neutron-ovs-cleanup has finished before nova-compute
starts.
I have started to spot, however, that on some hosts (metal only) there
is now a race between the two, whereby nova-compute sometimes fails to
start on system boot/reboot with the following in
/var/log/upstart/nova-compute.log:
...
libvirt-bin stop/waiting
wait-for-state stop/waiting
neutron-ovs-cleanup start/pre-start, process 3084
start: Job failed to start
If I manually restart nova-compute, all is fine. So this looks like a
race between nova-compute's wait-for-state and neutron-ovs-cleanup's
transition from start/pre-start to start/running.
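Each upstart status line in the log above has the form `job goal/state`, optionally followed by `, process PID`. The race is visible in that format: neutron-ovs-cleanup already has goal `start` but is still in state `pre-start`, not `running`. The sketch below splits such a line into its parts. `parse_upstart_status` and `is_running` are hypothetical helpers for illustration, not part of upstart or of the nova packaging.

```shell
#!/bin/sh
# Split an upstart status line ("job goal/state[, process PID]") into
# "job goal state". Hypothetical helper, for illustration only.
parse_upstart_status() {
    local line="$1"
    local job="${line%% *}"           # text before the first space
    local rest="${line#* }"           # text after the first space
    local goal_state="${rest%%,*}"    # drop ", process NNNN" if present
    local goal="${goal_state%%/*}"    # part before the slash
    local state="${goal_state#*/}"    # part after the slash
    printf '%s %s %s\n' "$job" "$goal" "$state"
}

# A job only counts as up once its goal is "start" AND its state is "running";
# "start/pre-start" (as in the failing boot log) does not qualify.
is_running() {
    set -- $(parse_upstart_status "$1")
    [ "$2" = start ] && [ "$3" = running ]
}
```

For example, `parse_upstart_status 'neutron-ovs-cleanup start/pre-start, process 3084'` prints `neutron-ovs-cleanup start pre-start`, and `is_running` returns false for that line.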
The proposed solution here is to add some retry logic to the
nova-compute upstart job so that it tolerates neutron-ovs-cleanup not
yet being able to start. We therefore allow a certain number of
retries, with the delay incremented on every other retry, before giving
up and allowing nova-compute to start anyway. If ovs-cleanup has failed
to start after what is a fairly liberal retry period, it is assumed to
have failed altogether, making it safe(ish) to start nova-compute.
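The retry-then-start-anyway idea can be sketched as a small shell function. This is a minimal illustration under stated assumptions, not the code shipped in the nova package: `MAX_RETRIES`, `INITIAL_DELAY`, `DELAY_STEP`, `retry_then_continue` and `ovs_cleanup_running` are all hypothetical names, and the default values are placeholders rather than the intervals actually coded in the job.

```shell
#!/bin/sh
# Hypothetical sketch of "retry with a growing delay, then start anyway".
# None of these names come from the real nova-compute upstart job.
MAX_RETRIES=${MAX_RETRIES:-10}     # attempts before giving up (placeholder)
INITIAL_DELAY=${INITIAL_DELAY:-1}  # seconds slept after the first failure
DELAY_STEP=${DELAY_STEP:-1}        # added to the delay after every 2nd attempt

# Run "$@" until it succeeds or MAX_RETRIES attempts have failed.
# Returns 0 if the check eventually passed, 1 if we gave up.
retry_then_continue() {
    local attempt=1
    local delay="$INITIAL_DELAY"
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$MAX_RETRIES" ]; then
            echo "giving up after $attempt attempts; starting anyway" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        # Increment the delay on every other retry.
        if [ $((attempt % 2)) -eq 0 ]; then
            delay=$((delay + DELAY_STEP))
        fi
        attempt=$((attempt + 1))
    done
}

# Hypothetical check based on the "job goal/state" lines in the log above.
ovs_cleanup_running() {
    status neutron-ovs-cleanup 2>/dev/null | grep -q 'start/running'
}
```

In a pre-start script one would call `retry_then_continue ovs_cleanup_running || true`: the `|| true` matters, because the whole point of the design is that giving up must not prevent nova-compute from starting.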
[Test Case]
In one terminal (as root) do:
service neutron-ovs-cleanup stop; service openvswitch-switch stop; service nova-compute restart
In another do:
sudo tail -F /var/log/upstart/nova-compute.log
Observe the retries occurring
Then do 'sudo service openvswitch-switch start' and observe
nova-compute retry and succeed.
[Regression Potential]
If openvswitch-switch does not start within the maximum number of
retries and intervals, nova-compute will start anyway, and if
ovs-cleanup were then to run at some later point, one would see the
behaviour that LP 1420572 was intended to resolve. It does not seem to
make sense to wait indefinitely for ovs-cleanup to come up, and the
coded interval is fairly liberal and should be more than enough.
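Because nova-compute eventually starts regardless, the worst case is bounded: it is the sum of all delays slept before giving up. The report does not state the retry count or delays actually coded in the job, so the hypothetical helper below takes them as parameters. It assumes a schedule that sleeps after every failed attempt except the last and adds `step` seconds to the delay after every second attempt.

```shell
#!/bin/sh
# Hypothetical helper: total seconds slept before giving up, for the
# assumed schedule described above. Not taken from the nova packaging.
max_wait_seconds() {
    local retries="$1" delay="$2" step="$3"
    local total=0 attempt=1
    # No sleep follows the final attempt, hence "-lt" rather than "-le".
    while [ "$attempt" -lt "$retries" ]; do
        total=$((total + delay))
        if [ $((attempt % 2)) -eq 0 ]; then
            delay=$((delay + step))
        fi
        attempt=$((attempt + 1))
    done
    echo "$total"
}
```

For example, `max_wait_seconds 5 1 1` prints `6` (sleeps of 1, 1, 2 and 2 seconds), so tuning the parameters trades a longer boot-time wait against a higher chance of catching a slow ovs-cleanup.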
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1471022/+subscriptions