[Bug 1888675] Re: [sru] fail to extend in-use fibre channel volume due to multipath-tools version
OpenStack Infra
1888675 at bugs.launchpad.net
Tue Oct 3 18:05:44 UTC 2023
Reviewed: https://review.opendev.org/c/openstack/os-brick/+/888343
Committed: https://opendev.org/openstack/os-brick/commit/994cfb3f38b7481c8bf06a615bd9959f93bbd142
Submitter: "Zuul (22348)"
Branch: stable/yoga
commit 994cfb3f38b7481c8bf06a615bd9959f93bbd142
Author: zhaoleilc <15247232416 at 163.com>
Date: Mon Nov 16 14:08:59 2020 +0800
Avoid volume extension errors caused by multipath-tools version
`multipathd reconfigure` is an asynchronous command as of
multipath-tools 0.6.1 [1][2], potentially even before that [3].
Extending in-use iSCSI or FC volumes can fail because of that
as `multipathd resize map` will output "timeout" while the
"multipathd reconfigure" operation is still in progress.
This commit will ensure that multipathd errors are handled
accordingly, retrying in case of timeouts for up to 2 minutes.
[1] https://github.com/openSUSE/multipath-tools/blob/0.6.0/multipathd/main.c#L997
[2] https://github.com/openSUSE/multipath-tools/blob/0.6.1/multipathd/main.c#L1135
[3] https://github.com/opensvc/multipath-tools/blob/b21c567961f518810a1ac3b209c8db45f6dbac2c/multipathd/cli_handlers.c#L847-L851
Change-Id: I66e866700728eee7160f48455258c3974ada55bf
Closes-Bug: #1888675
(cherry picked from commit 557f38677a07386807b8d284a20b0ecaa61490f9)
** Tags added: in-stable-yoga
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1888675
Title:
[sru] fail to extend in-use fibre channel volume due to multipath-tools version
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive yoga series:
Fix Released
Status in Ubuntu Cloud Archive zed series:
Fix Released
Status in os-brick:
Fix Released
Status in python-os-brick package in Ubuntu:
Fix Released
Status in python-os-brick source package in Jammy:
Fix Released
Bug description:
[IMPACT]
`multipathd reconfigure` has been an asynchronous command since version 0.6.1 of multipath-tools. The difference can be seen here:
https://github.com/openSUSE/multipath-tools/blob/0.6.0/multipathd/main.c#L997
https://github.com/openSUSE/multipath-tools/blob/0.6.1/multipathd/main.c#L1135
As a result, extending an in-use fibre channel volume fails: when
`multipathd resize map` is executed immediately after `multipathd
reconfigure`, it outputs 'timeout' because the reconfigure has not
finished yet.
However, the current code only checks for the 'fail' result, so
timeouts are not retried and instead end up as failures, and the FC
volume is not extended.
[TEST PLAN]
1. Make sure enough fibre channel volumes are attached to the compute
node that `multipathd reconfigure` takes a long time.
2. Create a server named 'c1' on the compute node.
3. Attach a 4G volume named 'v1' to the server 'c1'.
$ openstack server add volume c1 v1
4. Extend the volume 'v1' to 8G.
$ cinder --os-volume-api-version 3.42 extend v1 8
Check the size with 'fdisk -l' and verify from the logs (see
[OTHER INFO]).
Without the fix, after the volume has been extended from 4G to 8G,
the volume in the instance is still 4G. The fibre channel volume
scsi_wwn has been changed to 8G.
With the fix, the new size is reflected immediately: if `multipathd
resize map` returns a timeout, we keep retrying the same command for
up to 120 seconds, giving the (now asynchronous) `multipathd
reconfigure` a chance to complete so that `multipathd resize map`
runs successfully on a retry.
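The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the os-brick code: the names `ResizeError`, `run_resize_map` and `resize_map_with_retry` are hypothetical, the 1 second polling interval is an assumption, and the sketch assumes success means exit code 0 with neither 'fail' nor 'timeout' in the output.

```python
import subprocess
import time


class ResizeError(Exception):
    """Hypothetical error type used only in this sketch."""


def run_resize_map(dm_path):
    """Run `multipathd resize map <dm_path>`; return (exit_code, stdout)."""
    proc = subprocess.run(
        ["multipathd", "resize", "map", dm_path],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def resize_map_with_retry(dm_path, run=run_resize_map, window=120.0,
                          interval=1.0, clock=time.monotonic,
                          sleep=time.sleep):
    """Retry the resize for up to `window` seconds while it reports 'timeout'.

    Assumption: success is exit code 0 with neither 'fail' nor
    'timeout' in the command output.
    """
    deadline = clock() + window
    while True:
        code, out = run(dm_path)
        if code == 0 and "fail" not in out and "timeout" not in out:
            return out
        # Only timeouts are retried; any other failure is final.
        if "timeout" in out and clock() < deadline:
            sleep(interval)
            continue
        raise ResizeError(
            "resize map failed for %s: %r" % (dm_path, out.strip()))
```

The `run`, `clock` and `sleep` parameters exist only so the loop can be exercised without a running multipathd; on a real host the defaults would be used and the call would need sufficient privileges.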
[WHERE PROBLEMS COULD OCCUR]
I have verified the code is robust and I do not anticipate any issues.
The patch is already merged to master and, at the time of writing,
has received 2 acks for the merge into yoga
(https://review.opendev.org/c/openstack/os-brick/+/888343).
`multipathd resize map` returns only 1 or 0 (see
https://github.com/openSUSE/multipath-tools/blob/0.6.1/multipathd/cli_handlers.c#L702C1-L719C2).
If it returns 1, the ProcessExecutionError exception will be raised,
because this exception is raised for any return value from the
executed command other than the default of [0]
(https://docs.openstack.org/oslo.concurrency/latest/reference/processutils.html).
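The exit-code behaviour described above can be illustrated with a standard-library analogue. This sketch does not use oslo.concurrency: `run_checked` and `fake_command` are hypothetical helpers, and `subprocess.CalledProcessError` stands in for ProcessExecutionError, assuming only that any exit code outside the accepted set should raise.

```python
import subprocess
import sys


def run_checked(cmd, ok_codes=(0,)):
    """Run cmd and raise if its exit code is not in ok_codes."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode not in ok_codes:
        raise subprocess.CalledProcessError(
            proc.returncode, cmd, output=proc.stdout, stderr=proc.stderr)
    return proc.stdout


def fake_command(exit_code):
    """Build a stand-in command that exits with the given code."""
    return [sys.executable, "-c", "import sys; sys.exit(%d)" % exit_code]


if __name__ == "__main__":
    run_checked(fake_command(0))  # exit code 0 is accepted
    try:
        run_checked(fake_command(1))  # exit code 1 raises
    except subprocess.CalledProcessError as err:
        print("command failed with exit code", err.returncode)
```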
However, if the timeout has a genuine cause and the multipath timeout
is set to a smaller value, say 30 seconds, we would needlessly wait
120 seconds instead of failing the operation at 30 seconds.
We could also hit this same issue if the resize map operation takes
longer than 120 seconds, but that is unlikely, and I anticipate that
the multipathd timeout will also be set to at most 120 seconds.
[OTHER INFO]
Logs WITHOUT the fix show
==============
2020-07-23 12:42:46.764 2713929 INFO nova.compute.manager [req-8defc1e3-c514-4673-a3b7-98b5343ba1cd 46ff538c684b4816b9454bfdc0e0ec97 4f20deff2 - 15396630649143a78afa714b3e4a0adb 15396630649143a78afa714b3e4a0adb] [instance: ddd3010f-fdf9-4e50-a363-edd02532e683] Cinder d-c206-4713-8381-1ee47d412f31; extending it to detect new size
2020-07-23 12:42:48.254 2713929 INFO os_brick.initiator.linuxscsi [req-8defc1e3-c514-4673-a3b7-98b5343ba1cd 46ff538c684b4816b9454bfdc0825c54e0f20deff2 - 15396630649143a78afa714b3e4a0adb 15396630649143a78afa714b3e4a0adb] Find Multipath device file for volume WWN 3600502196
2020-07-23 12:42:48.355 2713929 INFO os_brick.initiator.linuxscsi [req-8defc1e3-c514-4673-a3b7-98b5343ba1cd 46ff538c684b4816b9454bfdc0825c54e0f20deff2 - 15396630649143a78afa714b3e4a0adb 15396630649143a78afa714b3e4a0adb] mpath(/dev/disk/by-id/dm-uuid-mpath-360050767088current size 4294967296
2020-07-23 12:42:48.449 2713929 INFO os_brick.initiator.linuxscsi [req-8defc1e3-c514-4673-a3b7-98b5343ba1cd 46ff538c684b4816b9454bfdc0825c54e0f20deff2 - 15396630649143a78afa714b3e4a0adb 15396630649143a78afa714b3e4a0adb] mpath(/dev/disk/by-id/dm-uuid-mpath-360050767088new size 4294967296
The logs indicate that the current (i.e. older) size (4294967296) is
the same as the new size (4294967296).
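As a quick check of the byte counts in these logs, the arithmetic can be written out as below. This assumes the 4G and 8G in the test plan mean GiB (1024^3 bytes); the helper name `bytes_to_gib` is made up for illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def bytes_to_gib(size_bytes):
    """Convert a size in bytes to GiB."""
    return size_bytes / GIB


logged_current_size = 4294967296  # "current size" from the log
logged_new_size = 4294967296      # "new size" from the log
expected_new_size = 8 * GIB       # size requested by the extend

print(bytes_to_gib(logged_new_size))         # 4.0
print(expected_new_size)                     # 8589934592
print(logged_new_size == expected_new_size)  # False
```

Under that assumption, a successful extend would log a new size of 8589934592, not 4294967296.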
Note that the fibre channel volume scsi_wwn has been changed to the
new size.