[Bug 1401335] Re: rbd calls block eventlet threads

Fri Mar 31 18:06:22 UTC 2017

** No longer affects: cloud-archive

** Description changed:

- [Impact]
- 
  When cinder-volume's rbd driver makes a call out to rbd it does not yield to eventlet, thus blocking all other processing.
- When this happens any pending requests are stuck unacknowledged in the rabbit queue until the current rbd task completes.  This results in an unresponsive cloud presented to the user and actions such as instance creation failing due to nova timing out waiting on cinder.
- 
- 
- As RBD commands consume a fair amount of CPU time to process we should not just background the RBD commands as that would represent a DoS risk for the cinder-volume hosts.
- One possible way to fix this would be to implement at least 2 queues that control the spawning of threads, reserving x of y threads for time sensitive and fast tasks.
- 
- 
- [Test Case]
+ When this happens, any pending requests sit unacknowledged in the rabbit queue until the current rbd task completes. The cloud then appears unresponsive to the user, and actions such as instance creation fail because nova times out waiting on cinder.

  Requirements to reproduce:
  1: Ceph set up as the rbd backend for cinder
  2: A single cinder-volume worker, so that the distributed nature of the service does not mask the problem
  3: A method of creating a large volume and writing to it

  Steps to verify volume will trigger issue on delete:
  1: Get the UUID of the volume you have created and dirtied
  2: Use the rbd command on your ceph cluster to delete the volume and verify that the delete takes a couple of minutes.
  3: Delete the volume in cinder to clean up cinder's database.

  Steps to reproduce:
  1: Create a volume that will take more than an instant to delete.
  2: Delete the volume
  3: Immediately attempt to create some volumes

  Expected results:
  Volumes are created in a timely manner and become available
  The volume delete is processed and finishes in parallel with the creates

  Actual results:
  Volume creations are processed only after the delete has finished
  The volume delete blocks all other threads and must complete first

- [Regression Potential]
- 
- This patch moves all rados calls to a separate python thread which
- doesn't block eventlet loop.
+ As RBD commands consume a fair amount of CPU time, we should not simply background them, as that would create a DoS risk for the cinder-volume hosts.
+ One possible fix would be to implement at least 2 queues that control the spawning of threads, reserving x of y threads for time-sensitive and fast tasks.

** Patch removed: "trusty-kilo.debdiff"
   https://bugs.launchpad.net/cinder/+bug/1401335/+attachment/4851537/+files/trusty-kilo.debdiff

** Tags removed: sts-sponsor

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1401335

Title:
  rbd calls block eventlet threads

Status in Cinder:
  Fix Released

Bug description:
  When cinder-volume's rbd driver makes a call out to rbd it does not yield to eventlet, thus blocking all other processing.
  When this happens, any pending requests sit unacknowledged in the rabbit queue until the current rbd task completes. The cloud then appears unresponsive to the user, and actions such as instance creation fail because nova times out waiting on cinder.

  Requirements to reproduce:
  1: Ceph set up as the rbd backend for cinder
  2: A single cinder-volume worker, so that the distributed nature of the service does not mask the problem
  3: A method of creating a large volume and writing to it

  Steps to verify volume will trigger issue on delete:
  1: Get the UUID of the volume you have created and dirtied
  2: Use the rbd command on your ceph cluster to delete the volume and verify that the delete takes a couple of minutes.
  3: Delete the volume in cinder to clean up cinder's database.

  Steps to reproduce:
  1: Create a volume that will take more than an instant to delete.
  2: Delete the volume
  3: Immediately attempt to create some volumes

  Expected results:
  Volumes are created in a timely manner and become available
  The volume delete is processed and finishes in parallel with the creates

  Actual results:
  Volume creations are processed only after the delete has finished
  The volume delete blocks all other threads and must complete first
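
The actual results above are what cooperative scheduling predicts when one task makes a blocking native call. The following is a minimal sketch of that mechanism only. It uses the standard library's asyncio as a stand-in for eventlet (cinder-volume itself runs on eventlet, not asyncio), and slow_delete, measure_create_delay and the 0.3 second duration are hypothetical names and values chosen for illustration, not anything from the cinder rbd driver.

```python
import asyncio
import time


def slow_delete(seconds):
    # Stand-in for a long librbd call: it blocks the calling OS thread
    # and never yields to the event loop.
    time.sleep(seconds)


async def measure_create_delay(offload):
    """Return how long a 'create' request waits behind a 0.3 s 'delete'."""
    started = time.monotonic()
    delays = []

    async def delete():
        if offload:
            # Run the blocking call in a native thread; the loop stays free.
            await asyncio.to_thread(slow_delete, 0.3)
        else:
            # Run it inline; nothing else is scheduled until it returns.
            slow_delete(0.3)

    async def create():
        # Record how long this request waited before being serviced.
        delays.append(time.monotonic() - started)

    # The delete is scheduled first, mirroring the reproduction steps.
    await asyncio.gather(delete(), create())
    return delays[0]


if __name__ == "__main__":
    inline = asyncio.run(measure_create_delay(offload=False))
    offloaded = asyncio.run(measure_create_delay(offload=True))
    print("inline delete:    create waited %.2fs" % inline)
    print("offloaded delete: create waited %.2fs" % offloaded)
```

With the inline call the create waits for the whole delete. With the call moved to a native thread the create is serviced almost immediately. In eventlet the analogous offload helper is eventlet.tpool.execute.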

  As RBD commands consume a fair amount of CPU time, we should not simply background them, as that would create a DoS risk for the cinder-volume hosts.
  One possible fix would be to implement at least 2 queues that control the spawning of threads, reserving x of y threads for time-sensitive and fast tasks.
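
The two-queue idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the fix that was merged into cinder: TwoQueueDispatcher, its parameters and the demo function are hypothetical, and it uses plain standard-library thread pools rather than eventlet. Each pool has its own queue and a fixed thread count, so slow tasks can never occupy the x reserved threads, and the bounded slow pool also caps how much CPU-heavy RBD work runs at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class TwoQueueDispatcher:
    """Split y worker threads into x reserved for fast tasks and y - x for slow ones."""

    def __init__(self, total_threads, reserved_fast):
        if not 0 < reserved_fast < total_threads:
            raise ValueError("need 0 < reserved_fast < total_threads")
        # Each executor owns one work queue and a fixed number of threads.
        self._fast = ThreadPoolExecutor(max_workers=reserved_fast)
        self._slow = ThreadPoolExecutor(max_workers=total_threads - reserved_fast)

    def submit(self, fn, *args, fast=False):
        # Slow tasks can never occupy the reserved threads.
        pool = self._fast if fast else self._slow
        return pool.submit(fn, *args)

    def shutdown(self):
        self._fast.shutdown(wait=True)
        self._slow.shutdown(wait=True)


def demo():
    dispatcher = TwoQueueDispatcher(total_threads=3, reserved_fast=1)
    started = time.monotonic()
    # Four slow "deletes" queue up behind the two non-reserved threads.
    deletes = [dispatcher.submit(time.sleep, 0.3) for _ in range(4)]
    # A fast "create" still gets a thread immediately.
    create = dispatcher.submit(lambda: "created", fast=True)
    result = create.result()
    create_latency = time.monotonic() - started
    for d in deletes:
        d.result()
    total = time.monotonic() - started
    dispatcher.shutdown()
    return result, create_latency, total
```

Calling demo() returns the create result together with the create latency and the total run time; the create should complete well before the queued deletes drain.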
