[Bug 1471022] Re: [SRU] race between nova-compute and neutron-ovs-cleanup

Launchpad Bug Tracker 1471022 at bugs.launchpad.net
Mon Jul 6 10:58:24 UTC 2015


** Branch linked: lp:~ubuntu-server-dev/nova/kilo

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1471022

Title:
  [SRU] race between nova-compute and neutron-ovs-cleanup

Status in nova package in Ubuntu:
  In Progress
Status in nova source package in Trusty:
  In Progress
Status in nova source package in Utopic:
  In Progress
Status in nova source package in Vivid:
  In Progress

Bug description:
  [Impact]

  This issue appears to be a consequence of
  https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1420572 where we
  added a 'wait-for-state running' to the nova-compute upstart so as to
  ensure that neutron-ovs-cleanup has finished before nova-compute
  starts.

  I have started to spot, however, that on some hosts (metal only) there
  is now a race between the two whereby nova-compute sometimes fails to
  start on system boot/reboot with the following in
  /var/log/upstart/nova-compute.log:

  ...
  libvirt-bin stop/waiting
  wait-for-state stop/waiting
  neutron-ovs-cleanup start/pre-start, process 3084
  start: Job failed to start

  If I manually restart nova-compute all is fine. So this looks like a
  race between nova-compute's wait-for-state and neutron-ovs-cleanup's
  pre-start -> start/running.

  The proposed solution here is to add some retry logic to the
  nova-compute upstart job to tolerate neutron-ovs-cleanup not being
  able to start yet. We therefore allow a certain number of retries,
  every other one with an incremented delay, before giving up and
  allowing nova-compute to start anyway. If ovs-cleanup has failed to
  start after what is a fairly liberal retry period, it is assumed to
  have failed altogether, thus making it safe(ish) to start nova-compute.
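
  For illustration only, the retry behaviour described above could be
  sketched in shell roughly as follows. This is not the code from the
  linked branch: the function name retry_then_proceed, the RETRY_DELAY
  variable, the retry limit and the reading of "every other" as "bump
  the delay on every second retry" are all assumptions made for the
  sake of the example.

```shell
#!/bin/sh
# Hypothetical sketch of the retry idea; names and limits are
# illustrative and are NOT taken from the packaged upstart job.

# retry_then_proceed MAX_RETRIES COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_RETRIES retries have been used.
# The delay grows by one second on every other retry (assumed reading).
# Always returns 0 so that the caller carries on and starts anyway.
retry_then_proceed() {
    max_retries=$1
    shift
    retries=0
    delay=${RETRY_DELAY:-1}   # initial delay in seconds (assumed default)
    until "$@"; do
        if [ "$retries" -ge "$max_retries" ]; then
            echo "Gave up after ${retries} retries; proceeding anyway"
            return 0
        fi
        retries=$((retries + 1))
        # Increment the delay on the 2nd, 4th, ... retry.
        if [ $((retries % 2)) -eq 0 ]; then
            delay=$((delay + 1))
        fi
        echo "Retry ${retries}/${max_retries} in ${delay}s"
        sleep "$delay"
    done
    echo "Succeeded after ${retries} retries"
    return 0
}

# Possible use inside a pre-start script (assumed invocation, left
# commented out so the sketch has no side effects when sourced):
# retry_then_proceed 5 start wait-for-state \
#     WAIT_FOR=neutron-ovs-cleanup WAIT_STATE=running WAITER=nova-compute
```

  Returning 0 in the give-up branch is what lets nova-compute start
  anyway once the retry budget is exhausted, which is the trade-off
  discussed under [Regression Potential].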

  [Test Case]

  In one terminal (as root) do:
  service neutron-ovs-cleanup stop; service openvswitch-switch stop; service nova-compute restart

  In another do:
  sudo tail -F /var/log/upstart/nova-compute.log

  Observe the retries occurring

  Then do 'sudo service openvswitch-switch start' and observe
  nova-compute retry and succeed.

  [Regression Potential]

  If openvswitch-switch does not start within the maximum number of
  retries and intervals, nova-compute will start anyway, and if
  ovs-cleanup were to run at some later point one would see the
  behaviour that LP 1420572 was intended to resolve. It does not seem to
  make sense to wait indefinitely for ovs-cleanup to be up, and the coded
  interval is pretty liberal and should be plenty.

