[Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

Mon Jan 22 20:10:23 UTC 2024

I ran into this on jammy/amd64:
https://autopkgtest.ubuntu.com/results/autopkgtest-
jammy/jammy/amd64/l/livecd-rootfs/20240121_173406_e4f9a@/log.gz

I downloaded all of the amd64 failures and searched for this failure
pattern. These were the kernels that were running at the time:

"Linux 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023"
"Linux 6.2.0-21-generic #21-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 14 12:34:02 UTC 2023"
"Linux 6.3.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun  8 16:02:30 UTC 2023"
"Linux 6.5.0-9-generic #9-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct  7 01:35:40 UTC 2023"
"Linux 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 30 10:27:29 UTC 2023"

Here's the count of failures per image type:
     12 017-disk-image-uefi.binary
      3 018-disk-image.binary
      3 020-kvm-image.binary
      1 023-vagrant.binary
      1 024-vagrant.binary

I can confirm that /dev/loop0p1 is created by devtmpfs. This surprised
me because I'd never actually need to know what devtmpfs was, and I saw
devices being created even though I had SIGSTOP'd systemd-udevd. But
watching udevadm monitor and forktrace output convinced me.

I had a theory that something was opening the first created partition before all partitions were created. loop_reread_partitions() can fail without returning an error to userspace:
 https://elixir.bootlin.com/linux/v5.15.147/source/drivers/block/loop.c#L676

that could happen if bdev_disk_changed() aborts because it finds another partition on the device is open:
 https://elixir.bootlin.com/linux/v5.15.147/source/block/partitions/core.c#L662

But then we should see this in dmesg:
  pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n"

I added dmesg calls to check that:
https://autopkgtest.ubuntu.com/results/autopkgtest-jammy-dannf-test/jammy/amd64/l/livecd-rootfs/20240122_161631_62ecd@/log.gz

.. but no such message appeared, so that's not it. But what *is*
interesting there is that it shows *2* partition scan lines:

1248s [  990.855361] loop0: detected capacity change from 0 to 4612096
1248s [  990.855628]  loop0: p1 p14 p15
1248s [  990.874241]  loop0: p1 p14 p15

Previously we just saw 1:

1189s [  932.268459] loop0: detected capacity change from 0 to 4612096
1189s [  932.268715]  loop0: p1 p14 p15

That only gets printed when bdev_disk_changed() is called. So do we have
2 racing callers?

One thing that seems off is that loop_configure() unsuppresses uevents
for the full device before the partition scan, but loop_change_fd()
waits until the partition scan is complete. Shouldn't they be following
the same pattern? I wonder if that could cause the following race:

[livecd-rootfs] losetup creates /dev/loop0
[livecd-rootfs] kernel sends uevent for /dev/loop0
[livecd-rootfs] /dev/loop0p* appear in devtmpfs
      [udev] receives uevent for loop0
      [udev] partprobe /dev/loop0
[livecd-rootfs] losetup exit(0)
      [partprobe] /dev/loop0p* cleared
[livecd-rootfs] check for /dev/loop0p1 FAILS
      [partprobe] /dev/loop0p* recreated

I tried checking for this using ftrace in a local jammy VM. I haven't been able to reproduce this in a local VM, but I wanted to see what happens in a normal losetup.. er... setup.

>>>>> First I used losetup to create the device:

root at dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# loopdev="$(losetup --show -f -P -v /home/ubuntu/disk.img)"
root at dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 1/1   #P:1
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
         losetup-1996    [000] .....   657.573994: bdev_disk_changed <-loop_reread_partitions

>>>>> Only the expected bdev_disk_change() call
>>>>> Then I remove the device:

root at dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# losetup -v -d $loopdev
root at dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 3/3   #P:1
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
         losetup-1996    [000] .....   657.573994: bdev_disk_changed <-loop_reread_partitions
   systemd-udevd-2524    [000] .....   680.555336: bdev_disk_changed <-blkdev_get_whole
         losetup-2523    [000] .....   680.568473: bdev_disk_changed <-__loop_clr_fd

>>>>> udev did rescan the partitions... but only during removal. And its
strange that it gets in there before losetup calls it itself.

Perhaps I should add this tracing into a debug livecd-rootfs and see if
I can reproduce in our autopkgtest infra.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

    https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

    https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions