[Bug 1888675] Autopkgtest regression report (python-os-brick/5.2.2-0ubuntu1.3)

Fri Sep 15 16:14:16 UTC 2023

All autopkgtests for the newly accepted python-os-brick (5.2.2-0ubuntu1.3) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

cinder/2:20.3.0-0ubuntu1 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#python-os-brick

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1888675

Title:
  [sru] fail to extend in-use fibre channel volume due to
  multipath-tools version

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  Triaged
Status in Ubuntu Cloud Archive zed series:
  Fix Released
Status in os-brick:
  Fix Released
Status in python-os-brick package in Ubuntu:
  Fix Released
Status in python-os-brick source package in Jammy:
  Fix Committed

Bug description:
  [IMPACT]

  `multipathd reconfigure` has been an asynchronous command since version 0.6.1 of multipath-tools. The difference can be seen by comparing:
  https://github.com/openSUSE/multipath-tools/blob/0.6.0/multipathd/main.c#L997
  https://github.com/openSUSE/multipath-tools/blob/0.6.1/multipathd/main.c#L1135

  This causes extending an in-use fibre channel volume to fail. The
  `multipathd resize map` command is executed as soon as
  `multipathd reconfigure` has been issued, so it outputs 'timeout'
  because the reconfigure has not finished yet.

  However, the current code only considers the 'fail' result, so
  timeouts are not retried but instead end up as failures, and the FC
  volume is not extended.
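
  The distinction described above can be sketched as follows. This is
  only an illustration, not the os-brick code: the function name
  classify_resize_output is hypothetical, and matching on the
  substrings 'timeout' and 'fail' is an assumption based on the result
  strings quoted in this report.

```python
def classify_resize_output(output: str) -> str:
    """Classify the text printed by a resize attempt.

    Hypothetical helper: returns "retry" for a timeout (the asynchronous
    reconfigure may still be running), "fail" for a hard failure, and
    "ok" for anything else.
    """
    text = output.strip().lower()
    if "timeout" in text:
        # A timeout is treated as transient, so the caller may try again.
        return "retry"
    if "fail" in text:
        # A 'fail' result is treated as final.
        return "fail"
    return "ok"
```

  Treating 'timeout' as its own class, rather than folding it into
  'fail', is what lets a caller decide to retry.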

  [TEST PLAN]

  1. Ensure that there are enough fibre channel volumes attached on
  the compute node so that `multipathd reconfigure` takes a long time
  to complete.

  2. Create a server named 'c1' on the compute node.

  3. Attach a 4G volume named 'v1' to the server 'c1'.

  $ openstack server add volume c1 v1

  4. Extend the volume 'v1' to 8G.

  $ cinder --os-volume-api-version 3.42 extend v1 8

  Check the size using 'fdisk -l' and verify from the logs (see
  [OTHER INFO]).

  Without the fix, after the volume has been extended from 4G to 8G,
  the volume in the instance is still 4G. The fibre channel volume
  scsi_wwn has been changed to 8G.

  With the fix, the new size is reflected immediately: if
  `multipathd resize map` returns a timeout, we keep retrying the same
  command for up to 120 more seconds, giving the (now asynchronous)
  `multipathd reconfigure` a chance to complete so that
  `multipathd resize map` runs successfully on a retry.
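
  A minimal sketch of that retry behaviour, under stated assumptions:
  the function resize_with_retry, its parameters, and the 1 second
  retry interval are all hypothetical and are not taken from the
  os-brick patch. Only the 120 second window and the 'timeout' result
  come from this report. The command runner is passed in as a callable
  so the sketch does not depend on multipathd being present.

```python
import time
from typing import Callable


def resize_with_retry(
    run_resize: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call run_resize until it stops reporting 'timeout' or time runs out.

    run_resize is a hypothetical callable that executes the resize
    command and returns its text output.
    """
    deadline = clock() + timeout_s
    while True:
        output = run_resize().strip()
        if "timeout" not in output.lower():
            # Any non-timeout result (success or hard failure) is final.
            return output
        if clock() >= deadline:
            raise TimeoutError(
                "resize still timing out after %.0f seconds" % timeout_s
            )
        # Give the asynchronous reconfigure more time before retrying.
        sleep(interval_s)
```

  Injecting clock and sleep keeps the loop testable without waiting in
  real time.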

  [WHERE PROBLEMS COULD OCCUR]

  I have verified the code is robust and I do not anticipate any issues.
  The patch is already merged to master and, at the time of writing,
  has received 2 acks for the merge into yoga
  (https://review.opendev.org/c/openstack/os-brick/+/888343).
  "multipathd resize map" will not return anything but 1 or 0 (see
  https://github.com/openSUSE/multipath-tools/blob/0.6.1/multipathd/cli_handlers.c#L702C1-L719C2 )
  and if it returns 1, the ProcessExecutionError exception will indeed
  be raised, because this exception is raised for any return value from
  the executed command other than the default of [0]
  (https://docs.openstack.org/oslo.concurrency/latest/reference/processutils.html).
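
  The exit-code rule described above can be illustrated with the
  Python standard library. This is an analogue only: it uses
  subprocess rather than oslo.concurrency, the helper name run_checked
  and its ok_codes parameter are hypothetical, and
  subprocess.CalledProcessError stands in for ProcessExecutionError.

```python
import subprocess
import sys


def run_checked(cmd, ok_codes=(0,)):
    """Run cmd and raise if its exit code is not in ok_codes.

    Hypothetical stand-in for an "accepted exit codes" check; the
    default accepts only 0.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode not in ok_codes:
        raise subprocess.CalledProcessError(
            proc.returncode, cmd, proc.stdout, proc.stderr
        )
    return proc.stdout


# A child process exiting with status 1 is reported as an error.
try:
    run_checked([sys.executable, "-c", "import sys; sys.exit(1)"])
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode
```

  With only 0 accepted, an exit status of 1 always surfaces as an
  exception that the caller can catch and act on.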

  However, if the timeout is genuine and the multipath timeout is set
  to a smaller value, say 30 seconds, we would needlessly wait 120
  seconds instead of failing the operation at 30 seconds. We could also
  run into this same issue if the resize map operation takes longer
  than 120 seconds, but that is unlikely, and I anticipate the
  multipathd timeout will be set to at most 120 seconds.

  [OTHER INFO]

  Logs WITHOUT the fix show
  ==============
  2020-07-23 12:42:46.764 2713929 INFO nova.compute.manager [req-8defc1e3-c514-4673-a3b7-98b5343ba1cd 46ff538c684b4816b9454bfdc0e0ec97 4f20deff2 - 15396630649143a78afa714b3e4a0adb 15396630649143a78afa714b3e4a0adb] [instance: ddd3010f-fdf9-4e50-a363-edd02532e683] Cinder d-c206-4713-8381-1ee47d412f31; extending it to detect new size
  2020-07-23 12:42:48.254 2713929 INFO os_brick.initiator.linuxscsi [req-8defc1e3-c514-4673-a3b7-98b5343ba1cd 46ff538c684b4816b9454bfdc0825c54e0f20deff2 - 15396630649143a78afa714b3e4a0adb 15396630649143a78afa714b3e4a0adb] Find Multipath device file for volume WWN 3600502196
  2020-07-23 12:42:48.355 2713929 INFO os_brick.initiator.linuxscsi [req-8defc1e3-c514-4673-a3b7-98b5343ba1cd 46ff538c684b4816b9454bfdc0825c54e0f20deff2 - 15396630649143a78afa714b3e4a0adb 15396630649143a78afa714b3e4a0adb] mpath(/dev/disk/by-id/dm-uuid-mpath-360050767088current size 4294967296
  2020-07-23 12:42:48.449 2713929 INFO os_brick.initiator.linuxscsi [req-8defc1e3-c514-4673-a3b7-98b5343ba1cd 46ff538c684b4816b9454bfdc0825c54e0f20deff2 - 15396630649143a78afa714b3e4a0adb 15396630649143a78afa714b3e4a0adb] mpath(/dev/disk/by-id/dm-uuid-mpath-360050767088new size 4294967296

  The logs indicate that the current (i.e. older) size (4294967296) is
  the same as the new size (4294967296).

  Note that the fibre channel volume scsi_wwn has been changed to the
  new size.
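
  The log check above can be automated with a small script. This is a
  sketch under assumptions: the helper names extract_sizes and
  was_extended are hypothetical, and the regular expression relies
  only on the "current size <bytes>" and "new size <bytes>" fragments
  visible in the log lines quoted in this report.

```python
import re

# Matches the "current size N" / "new size N" fragments in the log lines.
SIZE_RE = re.compile(r"(current|new) size (\d+)")


def extract_sizes(log_lines):
    """Return a dict such as {"current": bytes, "new": bytes}."""
    sizes = {}
    for line in log_lines:
        match = SIZE_RE.search(line)
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes


def was_extended(sizes):
    """True only if the new size is larger than the current size."""
    return sizes["new"] > sizes["current"]


def to_gib(size_bytes):
    """Convert a byte count to GiB."""
    return size_bytes / 2**30
```

  For the logs without the fix, both values are 4294967296 bytes
  (4 GiB), so was_extended reports False. A successful extension to 8G
  would show a new size of 8589934592 bytes.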

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1888675/+subscriptions