[Bug 1834875] Re: cloud-init growpart race with udev
Ryan Harper
1834875 at bugs.launchpad.net
Wed Nov 6 16:39:21 UTC 2019
A couple of comments on the suggested path:
> Imho the sequence of commands should be:
> * take flock on the device, to neutralise udev
+1 on this approach. Do you know if the flock will block
systemd's inotify write watch on the block device which triggers
udevd? This is the typical race we see with partition creation
and rules executing.
> * modify device with sfdisk
> * reread partition tables (I would say with blockdev --rereadpt, rather than partx/partprobe)
I'm not sure we can use blockdev --rereadpt, since we are operating on
the root disk we're booted from. My understanding is that the ioctl
partx uses is the only way to update the kernel partition table while
the disk is in use; otherwise we'd hit the usual warning you get when
you fdisk your booted device, saying the disk is busy and the partition
table cannot be re-read.
> * release the flock
+1
> * udevadm trigger --action=add --wait device (or trigger && settle)
I don't relish the idea of *re-adding* actions on the disk again since the partx update
should have already emitted the uevents associated with the new partitions. However,
we could do this as a way to force reloading of everything. I'd like to withhold
judgement on whether we need this after testing with use of flock on the device.
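
For illustration, the sequence discussed above could be sketched in
Python roughly as follows. This is only a sketch under assumptions, not
what growpart does today: resize_locked and its run parameter are
hypothetical names, the sfdisk script is supplied by the caller,
partx --update stands in for the reread step, and it assumes udev leaves
the disk alone while the exclusive lock is held, which is exactly the
open question.

```python
import fcntl
import os
import subprocess


def resize_locked(disk, sfdisk_script, run=subprocess.run):
    """Hypothetical helper: repartition `disk` while holding an exclusive flock."""
    fd = os.open(disk, os.O_RDONLY)
    try:
        # Exclusive BSD lock on the whole-disk node. Assumption: udev
        # skips processing the device while this lock is held.
        fcntl.flock(fd, fcntl.LOCK_EX)
        # Apply the new partition table; the script text is fed on stdin.
        run(["sfdisk", "--no-reread", disk],
            input=sfdisk_script, text=True, check=True)
        # Update the kernel's view of the partitions while the disk is in use.
        run(["partx", "--update", disk], check=True)
    finally:
        # Release the lock explicitly, then close the descriptor.
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
    # Only after the lock is dropped, wait for udev to drain its queue.
    run(["udevadm", "settle"], check=True)
```

The run parameter exists only so the command ordering can be exercised
without touching a real disk; whether a trigger is needed in addition to
settle is left open here.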
> And like have a canary "only use locked codepath on this region" such
> that we can assert through testing that this no longer happens with new
> code, but does with old code.
The change to cloud-utils growpart could add a flag (--use-flock) so
cloud-init could emit different log messages depending on which path it
takes, including a warning if we cannot use flock (i.e., you may race).
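
A rough Python sketch of that "lock if we can, warn if we can't" idea
follows. Everything here is hypothetical: try_lock, the logger name, and
the message wording are made up for illustration and are not existing
cloud-init or growpart code.

```python
import fcntl
import logging
import os

LOG = logging.getLogger("growpart-sketch")  # hypothetical logger name


def try_lock(disk):
    """Return an fd holding an exclusive flock on `disk`, or None on failure."""
    try:
        fd = os.open(disk, os.O_RDONLY)
    except OSError as exc:
        LOG.warning("cannot open %s (%s); resize may race with udev", disk, exc)
        return None
    try:
        # Non-blocking, so a lock held elsewhere is reported, not waited on.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        LOG.warning("cannot flock %s (%s); resize may race with udev", disk, exc)
        return None
    LOG.debug("holding flock on %s; using locked resize path", disk)
    return fd
```

A caller would take the locked path when an fd comes back and the old
path otherwise, so the logs show which one ran; closing the fd releases
the lock.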
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1834875
Title:
cloud-init growpart race with udev
Status in cloud-init:
Incomplete
Status in cloud-utils:
New
Status in linux-azure package in Ubuntu:
New
Status in systemd package in Ubuntu:
Incomplete
Bug description:
On Azure, it happens regularly (20-30%) that cloud-init's growpart
module fails to extend the partition to full size.
For example:
========================================
2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
    freq=freq)
  File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
    results = functor(*args)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
    func=resize_devices, args=(resizer, devices))
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
    (old, new) = resizer.resize(disk, ptnum, blockdev)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
    return (before, get_size(partdev))
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
    fd = os.open(filename, os.O_RDONLY)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'
========================================
@rcj suggested this is a race with udev. This seems to only happen on
Cosmic and later.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions