[Bug 1766543] Re: instance deletion takes a while and blocks nova-compute

Corey Bryant corey.bryant at canonical.com
Thu Apr 26 13:05:04 UTC 2018


Thanks for continuing to dig into this.

Comparing your nova-compute logs and the strace unlink details, the
timestamps do seem to point to unlink causing the delay. Pasting the
strace unlink output below inline with the nova-compute logs in order of
occurrence:

nova-compute log:
2018-04-25 14:48:04.587 54255 INFO nova.virt.libvirt.driver [req-85551d96-713d-499d-b7ff-9f911fb0842d bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] [instance: bd17aeef-240b-489c-8bb6-b37167155174] Deleting instance files /srv/nova/instances/bd17aeef-240b-489c-8bb6-b37167155174_del

strace unlink:
54255 14:48:04.593792 unlink("/srv/nova/instances/bd17aeef-240b-489c-8bb6-b37167155174_del/disk" <unfinished ...>

strace unlink:
54255 14:52:06.349167 <... unlink resumed> ) = 0
54255 14:52:06.349430 unlink("/srv/nova/instances/bd17aeef-240b-489c-8bb6-b37167155174_del/disk.info") = 0

nova-compute log:
2018-04-25 14:52:06.350 54255 INFO nova.virt.libvirt.driver [req-85551d96-713d-499d-b7ff-9f911fb0842d bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] [instance: bd17aeef-240b-489c-8bb6-b37167155174] Deletion of /srv/nova/instances/bd17aeef-240b-489c-8bb6-b37167155174_del complete

This might be worth passing by kernel folks to get their opinion.
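
To put a number on the gap, here is a minimal sketch that subtracts the two strace timestamps quoted above. The helper name strace_elapsed is hypothetical, and it assumes both timestamps fall on the same day and use the HH:MM:SS.microseconds form shown in the strace output:

```python
from datetime import datetime


def strace_elapsed(start: str, end: str) -> float:
    """Seconds between two same-day HH:MM:SS.ffffff timestamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the "unlink(...disk" and "<... unlink resumed>" lines.
unlink_seconds = strace_elapsed("14:48:04.593792", "14:52:06.349167")
print(f"unlink of disk took {unlink_seconds:.3f} s")
```

That works out to roughly four minutes spent inside the single unlink call on the disk file, while the following unlink of disk.info returns immediately.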

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1766543

Title:
  instance deletion takes a while and blocks nova-compute

Status in nova package in Ubuntu:
  New

Bug description:
  Hi,

  I have a cloud running xenial/mitaka (with 18.02 charms).

  Sometimes an instance takes minutes to delete. I traced the delay to
  file deletion:

  Apr 23 07:23:00 hostname nova-compute[54255]: 2018-04-23 07:23:00.920 54255 INFO nova.virt.libvirt.driver [req-35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] [instance: 97731f51-63be-4056-869f-084b38580b9a] Deleting instance files /srv/nova/instances/97731f51-63be-4056-869f-084b38580b9a_del

  Apr 23 07:27:33 hostname nova-compute[54255]: 2018-04-23 07:27:33.767 54255 INFO nova.virt.libvirt.driver [req-35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] [instance: 97731f51-63be-4056-869f-084b38580b9a] Deletion of /srv/nova/instances/97731f51-63be-4056-869f-084b38580b9a_del complete

  
  As you can see, about 4 minutes and 33 seconds elapsed between the
  two lines. nova-compute logs absolutely _nothing_ during this time,
  and periodic tasks do not run. A deletion normally takes a few
  seconds at most.

  These log lines are usually followed immediately by:

  Apr 23 07:27:33 hostname nova-compute[54255]: 2018-04-23 07:27:33.771 54255 DEBUG oslo.messaging._drivers.impl_rabbit [req-35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] Received recoverable error from kombu: on_error /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py:683

  (the error is [Errno 104] Connection reset by peer), because
  nova-compute does not even keep its rabbitmq connection alive during
  that time (on the rabbitmq server I can see errors about "Missed
  heartbeats from client, timeout: 60s").

  So nova-compute appears to be "frozen" for several minutes. This can
  cause problems because events can be missed.
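
One way to picture the freeze: nova-compute relies on eventlet's cooperative scheduling, so a call that blocks in the kernel can stall every other task in the process. The sketch below is only an analogy using the standard library's asyncio rather than eventlet, and it assumes (this is not confirmed in this bug) that the slow unlink runs directly on the cooperative thread. The names heartbeat, slow_unlink and run are hypothetical, and time.sleep stands in for the blocked system call:

```python
import asyncio
import time


async def heartbeat(ticks: list, interval: float = 0.05) -> None:
    """Stand-in for a periodic task such as an AMQP heartbeat."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


def slow_unlink(duration: float = 0.5) -> None:
    """Hypothetical stand-in for an unlink() that blocks in the kernel."""
    time.sleep(duration)


async def run(offload: bool) -> float:
    """Return the longest gap between heartbeats around slow_unlink()."""
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0.1)  # let a few heartbeats through first
    if offload:
        # Worker thread does the blocking call; the loop keeps scheduling.
        await asyncio.get_running_loop().run_in_executor(None, slow_unlink)
    else:
        # Direct call: nothing else on the loop runs until it returns.
        slow_unlink()
    await asyncio.sleep(0.1)  # let the heartbeat record its next tick
    hb.cancel()
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print("direct call, longest gap:", asyncio.run(run(offload=False)))
    print("offloaded,   longest gap:", asyncio.run(run(offload=True)))
```

With the direct call the longest heartbeat gap is at least as long as the blocking call itself, which mirrors the missed rabbitmq heartbeats; with the call offloaded to a worker thread the heartbeat keeps its usual interval.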

  We have telegraf on this host, and there's little to no CPU, disk,
  network or memory activity at that time. Nothing relevant in kern.log
  either. And this is happening on 3 different architectures, so this is
  all very puzzling.

  Is nova-compute supposed to be totally stuck while deleting instance
  files? Have you ever seen something similar?

  I'm going to try to repro on queens.

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1766543/+subscriptions


