[Bug 1834875] Re: cloud-init growpart race with udev

Ryan Harper 1834875 at bugs.launchpad.net
Wed Nov 6 16:39:21 UTC 2019


A couple of comments on the suggested path:

> Imho the sequency of commands should be:
> * take flock on the device, to neutralise udev

+1  on this approach. Do you know if the flock will block
systemd's inotify write watch on the block device which triggers
udevd?  This is the typical race we see with partition creation
and rules executing.

> * modify device with sfdisk
> * reread partitions tables (i would say with blockdev --rereadpt, rather than partx/partprobe)

I'm not sure we can use blockdev --rereadpt as we are operating upon the
root disk we're booted on and my understanding is that the ioctl that
partx uses is the only way to update the kernel partition table while
the disk is in use, otherwise we'd see the normal warning message like
when you fdisk your booted device and it says the disk is busy and
cannot read the partition table.

> * release the flock

+1

> * udevadm trigger --action=add --wait device (or trigger && settle)

I don't relish the idea of *re-adding* actions on the disk again since the partx update
should have already emitted the uevents associated with the new partitions.  However,
we could do this as a way to force reloading of everything.  I'd like to withhold
judgement on whether we need this after testing with use of flock on the device.

> And like have a canary "only use locked codepath on this region" such
that we can assert through testing that this no longer happens with new
code, but does with old code.

The change to cloud-utils growpart could add a flag (--use-flock) so
cloud-init could emit different log messages on which path it takes
(including a warning if we cannot use flock (ie, you may race).

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1834875

Title:
  cloud-init growpart race with udev

Status in cloud-init:
  Incomplete
Status in cloud-utils:
  New
Status in linux-azure package in Ubuntu:
  New
Status in systemd package in Ubuntu:
  Incomplete

Bug description:
  On Azure, it happens regularly (20-30%), that cloud-init's growpart
  module fails to extend the partition to full size.

  Such as in this example:

  ========================================

  2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
  2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
  2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
      freq=freq)
    File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
      return self._runners.run(name, functor, args, freq, clear_on_fail)
    File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
      results = functor(*args)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
      func=resize_devices, args=(resizer, devices))
    File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
      ret = func(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
      (old, new) = resizer.resize(disk, ptnum, blockdev)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
      return (before, get_size(partdev))
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
      fd = os.open(filename, os.O_RDONLY)
  FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'

  ========================================

  @rcj suggested this is a race with udev. This seems to only happen on
  Cosmic and later.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions



More information about the foundations-bugs mailing list