[Bug 1471022] [NEW] [SRU] race between nova-compute and neutron-ovs-cleanup

Launchpad Bug Tracker 1471022 at bugs.launchpad.net
Fri Jul 3 20:17:55 UTC 2015


You have been subscribed to a public bug by Ubuntu Foundations Team Bug Bot (crichton):

[Impact]

This issue appears to be a consequence of
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1420572, where we
added a 'wait-for-state running' to the nova-compute upstart job to
ensure that neutron-ovs-cleanup has finished before nova-compute starts.

However, I have started to notice that on some hosts (bare metal only)
there is now a race between the two, whereby nova-compute sometimes fails
to start on system boot/reboot with the following in
/var/log/upstart/nova-compute.log:

...
libvirt-bin stop/waiting
wait-for-state stop/waiting
neutron-ovs-cleanup start/pre-start, process 3084
start: Job failed to start

If I manually restart nova-compute, everything is fine. So this looks like
a race between nova-compute's wait-for-state and neutron-ovs-cleanup's
transition from start/pre-start to start/running.

The proposed solution is to add some retry logic to the nova-compute
upstart job so that it tolerates neutron-ovs-cleanup not yet being able
to start. We therefore allow a certain number of retries, with the delay
increased on every other retry, before giving up and allowing
nova-compute to start anyway. If ovs-cleanup has still failed to start
after what is a fairly liberal retry period, it is assumed to have failed
altogether, making it safe(ish) to start nova-compute.
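The retry behaviour described above can be illustrated with a small POSIX shell function of the kind that could sit in an upstart pre-start script. This is a minimal sketch under stated assumptions, not the actual patch: the function name retry_with_backoff and the variables MAX_RETRIES, INITIAL_DELAY, DELAY_STEP and SLEEP_CMD, along with their default values, are hypothetical. The command being retried (in the real job, the existing wait-for-state start) is simply passed in as arguments.

```shell
#!/bin/sh
# Hypothetical sketch of a bounded retry loop with a delay that grows on
# every other retry. Names and defaults are illustrative only and are
# NOT taken from the Ubuntu nova packaging.
MAX_RETRIES=${MAX_RETRIES:-8}      # retries after the first attempt
INITIAL_DELAY=${INITIAL_DELAY:-2}  # seconds before the first retry
DELAY_STEP=${DELAY_STEP:-2}        # added to the delay every 2nd retry
SLEEP_CMD=${SLEEP_CMD:-sleep}      # overridable, e.g. for testing

# retry_with_backoff CMD [ARGS...]
# Runs CMD until it succeeds or MAX_RETRIES retries are used up.
# Always returns 0: after the final retry it gives up so the caller can
# carry on, mirroring "allow nova-compute to start anyway".
retry_with_backoff() {
    attempt=0
    delay=$INITIAL_DELAY
    until "$@"; do
        if [ "$attempt" -ge "$MAX_RETRIES" ]; then
            echo "giving up after $attempt retries; continuing anyway" >&2
            return 0
        fi
        attempt=$((attempt + 1))
        echo "retry $attempt/$MAX_RETRIES in ${delay}s" >&2
        $SLEEP_CMD "$delay"
        # Lengthen the delay after every second retry.
        if [ $((attempt % 2)) -eq 0 ]; then
            delay=$((delay + DELAY_STEP))
        fi
    done
    return 0
}
```

Returning success even after giving up is the design choice that lets the job proceed rather than fail with "start: Job failed to start"; a stricter variant could return non-zero instead.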

[Test Case]

In one terminal (as root) do:
service neutron-ovs-cleanup stop; service openvswitch-switch stop; service nova-compute restart

In another do:
sudo tail -F /var/log/upstart/nova-compute.log

Observe the retries occurring.

Then do 'sudo service openvswitch-switch start' and observe nova-compute
retry and succeed.

[Regression Potential]

If openvswitch-switch does not start within the maximum number of
retries and intervals, nova-compute will start anyway, and if ovs-cleanup
were to run at some later point, one would see the behaviour that
LP 1420572 was intended to resolve. It does not seem to make sense to
wait indefinitely for ovs-cleanup to be up, and the coded interval is
fairly liberal and should be more than enough.
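To put a number on "fairly liberal", the worst-case time spent sleeping can be computed for a schedule of the kind described under [Impact]: a fixed number of retries, with the delay increased by a fixed step on every other retry. The function max_total_wait and its arguments are hypothetical, and it ignores the time each attempt itself takes.

```shell
#!/bin/sh
# max_total_wait RETRIES INITIAL_DELAY STEP
# Prints the total seconds slept across RETRIES retries when the delay
# starts at INITIAL_DELAY and grows by STEP after every second retry.
# Illustrative only; the real job's values are not stated here.
max_total_wait() {
    retries=$1 delay=$2 step=$3
    total=0 i=1
    while [ "$i" -le "$retries" ]; do
        total=$((total + delay))
        # Same schedule: lengthen the delay after every even retry.
        [ $((i % 2)) -eq 0 ] && delay=$((delay + step))
        i=$((i + 1))
    done
    echo "$total"
}
```

For instance, `max_total_wait 20 5 5` prints 550, i.e. just over 9 minutes of waiting before nova-compute would start anyway. These inputs are illustrative, not the values shipped in the package.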

** Affects: nova (Ubuntu)
     Importance: High
     Assignee: Edward Hope-Morley (hopem)
         Status: In Progress

** Affects: nova (Ubuntu Trusty)
     Importance: High
     Assignee: Edward Hope-Morley (hopem)
         Status: In Progress

** Affects: nova (Ubuntu Utopic)
     Importance: High
     Assignee: Edward Hope-Morley (hopem)
         Status: In Progress

** Affects: nova (Ubuntu Vivid)
     Importance: High
     Assignee: Edward Hope-Morley (hopem)
         Status: In Progress


** Tags: patch
-- 
[SRU] race between nova-compute and neutron-ovs-cleanup
https://bugs.launchpad.net/bugs/1471022
You received this bug notification because you are a member of Ubuntu Sponsors Team, which is subscribed to the bug report.


