[Bug 1353939] Re: Rescue fails with 'Failed to terminate process: Device or resource busy' in the n-cpu log

Liang Chen liang.chen at canonical.com
Tue Jul 21 21:07:25 UTC 2015


** Description changed:

+ [Impact]
+ 
+  * Users may sometimes fail to shutdown an instance if the associated qemu
+    process is on uninterruptable sleep (typically IO).
+ 
+ [Test Case]
+ 
+  * 1. create some IO load in a VM
+    2. look at the associated qemu, make sure it has STAT D in ps output
+    3. shutdown the instance
+    4. with the patch in place, nova will retry calling libvirt to shutdown
+       the instance 3 times to wait for the signal to be delivered to the 
+       qemu process.
+ 
+ [Regression Potential]
+ 
+  * None
+ 
+ 
  message: "Failed to terminate process" AND
  message:'InstanceNotRescuable' AND message: 'Exception during message
  handling' AND tags:"screen-n-cpu.txt"
  
  The above log stash-query reports back only the failed jobs, the 'Failed to terminate process' close other failed rescue tests,
  but tempest does not always reports them as an error at the end.
  
  message: "Failed to terminate process" AND tags:"screen-n-cpu.txt"
  
  Usual console log:
  Details: (ServerRescueTestJSON:test_rescue_unrescue_instance) Server 0573094d-53da-40a5-948a-747d181462f5 failed to reach RESCUE status and task state "None" within the required time (196 s). Current status: SHUTOFF. Current task state: None.
  
  http://logs.openstack.org/82/107982/2/gate/gate-tempest-dsvm-postgres-
  full/90726cb/console.html#_2014-08-07_03_50_26_520
  
  Usual n-cpu exception:
  http://logs.openstack.org/82/107982/2/gate/gate-tempest-dsvm-postgres-full/90726cb/logs/screen-n-cpu.txt.gz#_2014-08-07_03_32_02_855
  
  2014-08-07 03:32:02.855 ERROR oslo.messaging.rpc.dispatcher [req-39ce7a3d-5ceb-41f5-8f9f-face7e608bd1 ServerRescueTestJSON-2035684545 ServerRescueTestJSON-1017508309] Exception during message handling: Instance 0573094d-53da-40a5-948a-747d181462f5 cannot be rescued: Driver Error: Failed to terminate process 26425 with SIGKILL: Device or resource busy
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     incoming.message))
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     result = getattr(endpoint, method)(ctxt, **new_args)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 408, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/exception.py", line 88, in wrapped
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     payload)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/openstack/common/excutils.py", line 82, in __exit__
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/exception.py", line 71, in wrapped
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return f(self, context, *args, **kw)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 292, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     pass
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/openstack/common/excutils.py", line 82, in __exit__
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 278, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 342, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 320, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     kwargs['instance'], e, sys.exc_info())
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/openstack/common/excutils.py", line 82, in __exit__
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 308, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 3094, in rescue_instance
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     reason=_("Driver Error: %s") % unicode(e))
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher InstanceNotRescuable: Instance 0573094d-53da-40a5-948a-747d181462f5 cannot be rescued: Driver Error: Failed to terminate process 26425 with SIGKILL: Device or resource busy
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher

** Also affects: nova (Ubuntu)
   Importance: Undecided
       Status: New

** Tags added: patch

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1353939

Title:
  Rescue fails with 'Failed to terminate process: Device or resource
  busy' in the n-cpu log

Status in OpenStack Compute (nova):
  Fix Released
Status in nova package in Ubuntu:
  New

Bug description:
  [Impact]

   * Users may sometimes fail to shutdown an instance if the associated qemu
     process is on uninterruptable sleep (typically IO).

  [Test Case]

   * 1. create some IO load in a VM
     2. look at the associated qemu, make sure it has STAT D in ps output
     3. shutdown the instance
     4. with the patch in place, nova will retry calling libvirt to shutdown
        the instance 3 times to wait for the signal to be delivered to the 
        qemu process.

  [Regression Potential]

   * None


  message: "Failed to terminate process" AND
  message:'InstanceNotRescuable' AND message: 'Exception during message
  handling' AND tags:"screen-n-cpu.txt"

  The above log stash-query reports back only the failed jobs, the 'Failed to terminate process' close other failed rescue tests,
  but tempest does not always reports them as an error at the end.

  message: "Failed to terminate process" AND tags:"screen-n-cpu.txt"

  Usual console log:
  Details: (ServerRescueTestJSON:test_rescue_unrescue_instance) Server 0573094d-53da-40a5-948a-747d181462f5 failed to reach RESCUE status and task state "None" within the required time (196 s). Current status: SHUTOFF. Current task state: None.

  http://logs.openstack.org/82/107982/2/gate/gate-tempest-dsvm-postgres-
  full/90726cb/console.html#_2014-08-07_03_50_26_520

  Usual n-cpu exception:
  http://logs.openstack.org/82/107982/2/gate/gate-tempest-dsvm-postgres-full/90726cb/logs/screen-n-cpu.txt.gz#_2014-08-07_03_32_02_855

  2014-08-07 03:32:02.855 ERROR oslo.messaging.rpc.dispatcher [req-39ce7a3d-5ceb-41f5-8f9f-face7e608bd1 ServerRescueTestJSON-2035684545 ServerRescueTestJSON-1017508309] Exception during message handling: Instance 0573094d-53da-40a5-948a-747d181462f5 cannot be rescued: Driver Error: Failed to terminate process 26425 with SIGKILL: Device or resource busy
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     incoming.message))
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     result = getattr(endpoint, method)(ctxt, **new_args)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 408, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/exception.py", line 88, in wrapped
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     payload)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/openstack/common/excutils.py", line 82, in __exit__
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/exception.py", line 71, in wrapped
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return f(self, context, *args, **kw)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 292, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     pass
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/openstack/common/excutils.py", line 82, in __exit__
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 278, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 342, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 320, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     kwargs['instance'], e, sys.exc_info())
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/openstack/common/excutils.py", line 82, in __exit__
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 308, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 3094, in rescue_instance
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     reason=_("Driver Error: %s") % unicode(e))
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher InstanceNotRescuable: Instance 0573094d-53da-40a5-948a-747d181462f5 cannot be rescued: Driver Error: Failed to terminate process 26425 with SIGKILL: Device or resource busy
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1353939/+subscriptions



More information about the Ubuntu-openstack-bugs mailing list