[Bug 1983716] Re: Improve performances of glance when using rbd backend

Timo Aaltonen 1983716 at bugs.launchpad.net
Fri Oct 21 09:54:02 UTC 2022


Hello Cedric, or anyone else affected,

Accepted python-glance-store into focal-proposed. The package will build
now and be available at
https://launchpad.net/ubuntu/+source/python-glance-store/2.0.0-0ubuntu4
in a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from
verification-needed-focal to verification-done-focal. If it does not fix
the bug for you, please add a comment stating that, and change the tag to
verification-failed-focal. In either case, without details of your
testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Description changed:

+ [Impact]
+ 
+ This affects image upload performance, specifically the instance
+ snapshot use case, where the upload step to glance takes a very long
+ time with large ephemeral volumes (tens or hundreds of GB).
+ 
+ Backporting the fix will improve performance of image upload to glance
+ and thus reduce the whole snapshot duration.
+ 
+ When the image size is not known, which is the case when new images are
+ uploaded to glance, and when the glance backend is Ceph, the rbd volume
+ needs to be grown step by step during the upload. This fix increases the
+ size of those steps in order to reduce resize calls on the Ceph backend.
+ 
+ [Test Plan]
+ 
+ On a functional OpenStack Ussuri cloud running on Focal:
+ 
+ 1) Initial snapshot time measurement without the fix:
+ - spawn an instance with ephemeral root volume and fill ~50GB of data: dd if=/dev/urandom of=~/random bs=1M count=50k
+ - snapshot the instance, then look for string "seconds to snapshot" in /var/log/nova/nova-compute.log on the Nova host where the instance is running:
+ '''
+ nova-compute.log.53.gz:2022-07-11 09:54:11.298 3656801 INFO nova.compute.manager [req-0c9e71e9-c17e-4069-aa68-f7928fab9166 f9ec6328f6646c4c9310ff86ff6c45fca1ead9845dfa8a8dc6c4e461e5355a75 385521b179ea48068fbe5b8ccc3c396c - 24d8399e5ee54c8484cdbf79b8ee7394 24d8399e5ee54c8484cdbf79b8ee7394] [instance: 067acb11-34e6-4626-9c33-e7afa4294dbf] Took 866.04 seconds to snapshot the instance on the hypervisor.
+ '''
+ 
+ 2) On the glance-api controller, manually patch python-glance-store 2.0.0:
+ - check glance version:
+ 
+ dpkg -l |grep glance
+ ii glance 2:20.2.0-0ubuntu1 all OpenStack Image Registry and Delivery Service - Daemons
+ ii glance-api 2:20.2.0-0ubuntu1 all OpenStack Image Registry and Delivery Service - API
+ ii glance-common 2:20.2.0-0ubuntu1 all OpenStack Image Registry and Delivery Service - Common
+ ii python3-glance 2:20.2.0-0ubuntu1 all OpenStack Image Registry and Delivery Service - Python 3 library
+ ii python3-glance-store 2.0.0-0ubuntu3 all OpenStack Image Service store library - Python 3.x
+ ii python3-glanceclient 1:3.1.1-0ubuntu1 all Client library for Openstack glance server - Python 3.x
+ 
+ - git clone https://opendev.org/openstack/glance_store.git -b stable/ussuri /usr/lib/python3/dist-packages/glance_store_trunk/
+ - cd /usr/lib/python3/dist-packages/glance_store_trunk/ && git checkout tags/2.0.0 && git cherry-pick ca0c58b
+ - systemctl stop glance-api.service
+ - mv /usr/lib/python3/dist-packages/glance_store /usr/lib/python3/dist-packages/glance_store_orig && ln -s /usr/lib/python3/dist-packages/glance_store_trunk/glance_store /usr/lib/python3/dist-packages/glance_store
+ - systemctl start glance-api.service
+ 
+ 3) Redo step 1)
+ 
+ The time taken to complete the whole snapshot should be between 15% and
+ ~30% shorter. Ensure there is no bottleneck on the data path from the
+ hypervisor's drive to the Ceph cluster.
+ 
+ [Other Info]
+ 
+ Because the Ceph cluster (more specifically the RADOS layer beneath RBD)
+ only accounts for written bytes, raising the resize step to 8GB is not an
+ issue: the provisioned image size is not accounted. If the cluster is
+ close to full, the error will happen during the upload, not on the resize.
+ 
+ 
+ [original description]
+ 
+ 
  Hello,
  
  In order to significantly improve the performance of image uploads to
  the rbd store, it would be great if commit [1] could be backported from
  release 2.0.1 to the focal package (currently 2.0.0-0ubuntu3).
  
  Beyond image upload itself, the real use case here is to speed up
  instance snapshots. Benchmarks between 2.0.0 and 2.0.1 report a
  performance gain of ~30%: the snapshot time drops from 230 to 165
  seconds with a 10GB image (the metric shows up in nova-compute.log on
  the host where the snapshot occurs).
  
- 
- [1] commit ca0c58b52756058b6d51bf6a47aeac3d525c1e16 (HEAD -> stable/ussuri, tag: ussuri-em, tag: 2.0.1, origin/stable/ussuri)
+ [1] commit ca0c58b52756058b6d51bf6a47aeac3d525c1e16 (HEAD ->
+ stable/ussuri, tag: ussuri-em, tag: 2.0.1, origin/stable/ussuri)

** Changed in: python-glance-store (Ubuntu Focal)
       Status: Incomplete => Fix Committed

** Tags added: verification-needed verification-needed-focal

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to python-glance-store in Ubuntu.
https://bugs.launchpad.net/bugs/1983716

Title:
  Improve performances of glance when using rbd backend

Status in python-glance-store package in Ubuntu:
  Invalid
Status in python-glance-store source package in Focal:
  Fix Committed

Bug description:
  [Impact]

  This affects image upload performance, specifically the instance
  snapshot use case, where the upload step to glance takes a very long
  time with large ephemeral volumes (tens or hundreds of GB).

  Backporting the fix will improve performance of image upload to glance
  and thus reduce the whole snapshot duration.

  When the image size is not known, which is the case when new images
  are uploaded to glance, and when the glance backend is Ceph, the rbd
  volume needs to be grown step by step during the upload. This fix
  increases the size of those steps in order to reduce resize calls on
  the Ceph backend.
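
To illustrate why larger steps help, here is a minimal sketch that only counts resize calls; it does not talk to Ceph. The 8 MiB starting step, the doubling policy and the function names are assumptions made for illustration (only the 8GB ceiling is mentioned in this report), so the actual commit should be consulted for the real behaviour.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def count_resizes_fixed(total_bytes, step=8 * MiB):
    """Count resize calls when the volume grows by a constant step."""
    size = 0
    resizes = 0
    while size < total_bytes:
        size += step
        resizes += 1
    return resizes


def count_resizes_growing(total_bytes, first_step=8 * MiB, max_step=8 * GiB):
    """Count resize calls when each step doubles, capped at max_step."""
    size = 0
    step = first_step
    resizes = 0
    while size < total_bytes:
        size += step
        resizes += 1
        step = min(step * 2, max_step)
    return resizes


if __name__ == "__main__":
    upload = 50 * GiB  # same order of magnitude as the ~50GB test volume
    print("fixed steps:  ", count_resizes_fixed(upload))
    print("growing steps:", count_resizes_growing(upload))
```

Under these assumed parameters a 50 GiB upload needs thousands of resize calls with a constant 8 MiB step, but only a handful when the step is allowed to grow.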

  [Test Plan]

  On a functional OpenStack Ussuri cloud running on Focal:

  1) Initial snapshot time measurement without the fix:
  - spawn an instance with ephemeral root volume and fill ~50GB of data: dd if=/dev/urandom of=~/random bs=1M count=50k
  - snapshot the instance, then look for string "seconds to snapshot" in /var/log/nova/nova-compute.log on the Nova host where the instance is running:
  '''
  nova-compute.log.53.gz:2022-07-11 09:54:11.298 3656801 INFO nova.compute.manager [req-0c9e71e9-c17e-4069-aa68-f7928fab9166 f9ec6328f6646c4c9310ff86ff6c45fca1ead9845dfa8a8dc6c4e461e5355a75 385521b179ea48068fbe5b8ccc3c396c - 24d8399e5ee54c8484cdbf79b8ee7394 24d8399e5ee54c8484cdbf79b8ee7394] [instance: 067acb11-34e6-4626-9c33-e7afa4294dbf] Took 866.04 seconds to snapshot the instance on the hypervisor.
  '''
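
When comparing several runs, the duration can be extracted from such log lines with a small helper. This is only a sketch: the regular expression is derived from the single sample line above, and the function name is hypothetical.

```python
import re

# Pattern derived from the sample nova-compute log line quoted above.
SNAPSHOT_RE = re.compile(r"Took (\d+(?:\.\d+)?) seconds to snapshot the instance")


def snapshot_seconds(log_line):
    """Return the snapshot duration in seconds, or None if the line does not match."""
    match = SNAPSHOT_RE.search(log_line)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "[instance: 067acb11-34e6-4626-9c33-e7afa4294dbf] "
        "Took 866.04 seconds to snapshot the instance on the hypervisor."
    )
    print(snapshot_seconds(sample))
```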

  2) On the glance-api controller, manually patch python-glance-store 2.0.0:
  - check glance version:

  dpkg -l |grep glance
  ii glance 2:20.2.0-0ubuntu1 all OpenStack Image Registry and Delivery Service - Daemons
  ii glance-api 2:20.2.0-0ubuntu1 all OpenStack Image Registry and Delivery Service - API
  ii glance-common 2:20.2.0-0ubuntu1 all OpenStack Image Registry and Delivery Service - Common
  ii python3-glance 2:20.2.0-0ubuntu1 all OpenStack Image Registry and Delivery Service - Python 3 library
  ii python3-glance-store 2.0.0-0ubuntu3 all OpenStack Image Service store library - Python 3.x
  ii python3-glanceclient 1:3.1.1-0ubuntu1 all Client library for Openstack glance server - Python 3.x

  - git clone https://opendev.org/openstack/glance_store.git -b stable/ussuri /usr/lib/python3/dist-packages/glance_store_trunk/
  - cd /usr/lib/python3/dist-packages/glance_store_trunk/ && git checkout tags/2.0.0 && git cherry-pick ca0c58b
  - systemctl stop glance-api.service
  - mv /usr/lib/python3/dist-packages/glance_store /usr/lib/python3/dist-packages/glance_store_orig && ln -s /usr/lib/python3/dist-packages/glance_store_trunk/glance_store /usr/lib/python3/dist-packages/glance_store
  - systemctl start glance-api.service
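
After the symlink swap, it may be worth confirming which directory Python actually resolves the module from. The helper below is a hedged sketch using only the standard library; its name is hypothetical, and it should be run with the same interpreter that glance-api uses.

```python
import importlib.util
import os


def resolved_module_path(module_name):
    """Return the real on-disk location a top-level module would be imported from."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    # realpath follows symlinks, so a symlinked package shows its true target.
    return os.path.realpath(spec.origin)


if __name__ == "__main__":
    # On the patched controller this is expected to point inside
    # .../glance_store_trunk/glance_store/ rather than the packaged directory.
    print(resolved_module_path("glance_store"))
```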

  3) Redo step 1)

  The time taken to complete the whole snapshot should be between 15%
  and ~30% shorter. Ensure there is no bottleneck on the data path from
  the hypervisor's drive to the Ceph cluster.

  [Other Info]

  Because the Ceph cluster (more specifically the RADOS layer beneath
  RBD) only accounts for written bytes, raising the resize step to 8GB is
  not an issue: the provisioned image size is not accounted. If the
  cluster is close to full, the error will happen during the upload, not
  on the resize.

  
  [original description]

  
  Hello,

  In order to significantly improve the performance of image uploads to
  the rbd store, it would be great if commit [1] could be backported from
  release 2.0.1 to the focal package (currently 2.0.0-0ubuntu3).

  Beyond image upload itself, the real use case here is to speed up
  instance snapshots. Benchmarks between 2.0.0 and 2.0.1 report a
  performance gain of ~30%: the snapshot time drops from 230 to 165
  seconds with a 10GB image (the metric shows up in nova-compute.log on
  the host where the snapshot occurs).
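
As a quick check of the figures quoted above, the relative gain can be computed as follows. The function name is hypothetical, and the same helper can be applied to the before and after timings gathered in the test plan.

```python
def relative_gain(before_seconds, after_seconds):
    """Fractional reduction in duration, e.g. 0.25 means 25% less time."""
    if before_seconds <= 0:
        raise ValueError("before_seconds must be positive")
    return (before_seconds - after_seconds) / before_seconds


if __name__ == "__main__":
    gain = relative_gain(230, 165)
    print(f"{gain:.1%}")  # 28.3%, consistent with the "~30%" in the report
```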

  [1] commit ca0c58b52756058b6d51bf6a47aeac3d525c1e16 (HEAD ->
  stable/ussuri, tag: ussuri-em, tag: 2.0.1, origin/stable/ussuri)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/python-glance-store/+bug/1983716/+subscriptions



