[Bug 1503286] Re: ISST-LTE: Boot of Ubuntu15.10 lpar fails: "mounting /dev/sdn2 on /root failed: Device or resource busy" [multipath]

Mauricio Faria de Oliveira mauricfo at linux.vnet.ibm.com
Fri Feb 5 16:22:07 UTC 2016


Verified on 14.04.
Marking verification-done.

As mentioned in the description test-case, the issue is hard to reproduce.
To try to force it, I modified the local-premount/multipath script from the
updated package to remove and rescan the SCSI devices in the background right
before the udevadm settle command, and could not reproduce the failure.
The system booted successfully.

Further assurance comes from the patch being the same one that the tester
verified in the original environment (comment #12).

 . /scripts/functions
 
+multipath -F
+
+for sd_delete in /sys/block/sd*/device/delete; do
+        echo 1 > $sd_delete
+done
+
+for host_scan in /sys/class/scsi_host/host*/scan; do
+        echo '- - -' > $host_scan
+done &
+
 if [ -x /sbin/multipathd ]
 then
         [ "$quiet" != "y" ] && log_begin_msg "Waiting for udev to settle (multipath)"
         udevadm settle --timeout=121 || true
         [ "$quiet" != "y" ] && log_end_msg
 fi
 
 exit 0

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1503286

Title:
  ISST-LTE: Boot of Ubuntu15.10 lpar fails: "mounting /dev/sdn2 on /root
  failed: Device or resource busy" [multipath]

Status in multipath-tools package in Ubuntu:
  Fix Released
Status in multipath-tools source package in Trusty:
  Fix Committed
Status in multipath-tools source package in Vivid:
  Fix Committed
Status in multipath-tools source package in Wily:
  Fix Released

Bug description:
  [Impact]
  Systems with disks that have long spin-up times or otherwise take a while to be detected may be affected by a failure to boot due to the drives underlying multipath devices not being available.

  [Test case]
  This issue is difficult to reproduce.
   - Boot a system with the boot device on multipath.
  This may be limited to POWER LPARs. See description below.

  [Regression Potential]
  Given that a new initramfs script is introduced that waits for udev to settle with a timeout of about 2 minutes (121 seconds), users may notice a boot delay of up to two minutes if devices take that long or longer to be brought up or detected by udev.

  ---

  == Comment: #0 - Manjunatha H R <manjuhr1 at in.ibm.com> - 2015-09-25 11:05:36 ==
  Booting the Ubuntu 15.10 LPAR fails and control drops to the initramfs shell.

  uname -a
  --------------
  Linux (none) 4.2.0-10-generic #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 ppc64le GNU/Linux

  Boot log:
  -------------
    Booting a command list

  Loading Linux 4.2.0-10-generic ...
  Loading initial ramdisk ...
  OF stdout device is: /vdevice/vty@30000000
  Preparing to boot Linux version 4.2.0-10-generic (buildd@fisher04) (gcc version 5.2.1 20150911 (Ubuntu 5.2.1-17ubuntu4) ) #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 (Ubuntu 4.2.0-10.12-generic 4.2.0)
  Detected machine type: 0000000000000101
  Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
  Calling ibm,client-architecture-support... done
  command line: BOOT_IMAGE=/boot/vmlinux-4.2.0-10-generic root=UUID=822dd709-5b69-45a9-aba5-63cb55768ffb ro splash quiet topology_updates=off
  memory layout at init:
    memory_limit : 0000000000000000 (16 MB aligned)
    alloc_bottom : 000000000bf80000
    alloc_top    : 0000000010000000
    alloc_top_hi : 0000000010000000
    rmo_top      : 0000000010000000
    ram_top      : 0000000010000000
  found display   : /pci@80000002000002c/display@0, opening... done
  instantiating rtas at 0x000000000eb60000... done
  prom_hold_cpus: skipped
  copying OF device tree...
  Building dt strings...
  Building dt structure...
  Device tree strings 0x000000000bf90000 -> 0x000000000bf91965
  Device tree struct  0x000000000bfa0000 -> 0x000000000bfe0000
  Quiescing Open Firmware ...
  Booting Linux via __start() ...
   -> smp_release_cpus()
  spinning_secondaries = 199
   <- smp_release_cpus()
   <- setup_system()
  [    2.868103] [drm:radeon_device_init [radeon]] *ERROR* Unable to find PCI I/O BAR
  [    3.074553] [drm:radeon_atombios_init [radeon]] *ERROR* Unable to find PCI I/O BAR; using MMIO for ATOM IIO
  [    5.060785] lpfc 0002:90:00.0: 0:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0
  Scanning for Btrfs filesystems
  fsck from util-linux 2.26.2
  /dev/sdn2 is in use.
  e2fsck: Cannot continue, aborting.

  fsck exited with status code 8
  [   36.233086]  rport-0:0-9: blocked FC remote port time out: removing rport
  mount: mounting /dev/sdn2 on /root failed: Device or resource busy
  Target filesystem doesn't have requested /sbin/init.
  mount: mounting /dev on /root/dev failed: No such file or directory
  No init found. Try passing init= bootarg.

  BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash)
  Enter 'help' for a list of built-in commands.

  (initramfs)
  -------------------------

  This LPAR has multipath disks, and the boot disk is on a multipath disk.
  Boot succeeds only when fsck checks the boot disk via /dev/dm-* or /dev/mapper/mpath*.

  Boot Pass scenarios:
  ----------------------------
  1. Boot passed when fsck tried scanning "/dev/mapper/mpathb"
  fsck from util-linux 2.26.2
  /dev/mapper/mpathb-part2: clean, 81802/3139584 files, 1040598/12558080 blocks

  2. Boot passed when fsck tried scanning  "/dev/dm-3"
  Scanning for Btrfs filesystems
  fsck from util-linux 2.26.2
  /dev/dm-3: clean, 81802/3139584 files, 1040605/12558080 blocks

  Boot fails whenever fsck is called on a /dev/sd* device.

  Boot fail scenario: Boot failed when fsck is called on "/dev/sdn"
  -------------------------
  Scanning for Btrfs filesystems
  fsck from util-linux 2.26.2
  /dev/sdn2 is in use.
  e2fsck: Cannot continue, aborting.

  fsck exited with status code 8
  [   36.108653]  rport-0:0-9: blocked FC remote port time out: removing rport
  mount: mounting /dev/sdn2 on /root failed: Device or resource busy
  Target filesystem doesn't have requested /sbin/init.

  mount: mounting /dev on /root/dev failed: No such file or directory
  No init found. Try passing init= bootarg.

  BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash)
  Enter 'help' for a list of built-in commands.

  (initramfs)
  -------------------------

  Contact info:
  ----------------
  Manju (manjuhr1 at in.ibm.com)      A.P. (apundt at us.ibm.com)

  == Comment: #12 - Mauricio Faria De Oliveira <mauricfo at br.ibm.com> - 2015-10-02 13:51:29 ==
  Hi Manju and Alton,

  I could not reproduce this bug in 2 attempts.
  The LPAR booted successfully, using the root=UUID= parameter.

  By looking at this message from the description:
  > mount: mounting /dev/sdn2 on /root failed: Device or resource busy

  This most likely happened because the multipath udev rules failed to update
  the /dev/disk/by-uuid/<uuid> symlink from /dev/sdn to /dev/dm-X, while
  multipathing the path itself was successful (so it got locked/in-use).

  If you can reproduce it again, please leave the LPAR in the failing state (in the initramfs), reopen this bug and ping me.
  I'd be happy to debug it.

  Thanks!

  == Comment: #15 - Mauricio Faria De Oliveira <mauricfo at br.ibm.com> - 2015-10-05 19:59:56 ==
  This is probably a race between the resolve_device() call in mountroot() and the multipath discovery triggered by udev rules.

  If resolve_device() runs before the root device is multipathed, $ROOT is set to an individual path (e.g., /dev/sdf2) rather than its multipath device (e.g., /dev/mapper/mpathb-part2), because the /dev/disk/by-uuid/<UUID> symlink is not updated yet.
  The multipath discovery finishes after $ROOT is set, so the individual path becomes locked, and afterwards the root mount is attempted on it -- this fails.
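
  As a rough illustration of the symptom (this is not code from initramfs-tools), the sketch below classifies a resolved root device by its path. The function name classify_root_device is hypothetical. It assumes that multipath devices show up as /dev/dm-N or /dev/mapper/* nodes and individual paths as /dev/sd* nodes, as they do in the logs of this bug; other device-mapper targets would be reported the same way.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: report whether a resolved
# root device path looks like a device-mapper node or a single SCSI path.
classify_root_device() {
    case "$1" in
        /dev/dm-[0-9]*|/dev/mapper/*) echo "device-mapper" ;;
        /dev/sd[a-z]*)                echo "single-path" ;;
        *)                            echo "unknown" ;;
    esac
}

# Device names taken from this bug report.
classify_root_device /dev/sdf2                  # failing case
classify_root_device /dev/mapper/mpathb-part2   # expected case
```

  On a system that boots from multipath, a "single-path" result for $ROOT would suggest that resolve_device() ran before the by-uuid symlink was updated.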

  The LPAR is now patched w/ a test fix that is supposed to ensure
  resolve_device() only starts after udev rules are finished.

  Can you try to recreate the issue, please? Thanks!

  == Comment: #16 - Mauricio Faria De Oliveira <mauricfo at br.ibm.com> - 2015-10-05 20:05:37 ==
  Console messages

   (!) Note: the local-premount messages (Running ... & done.) occur
  around SCSI device scan/discovery time.

   ...
   Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Loading multipath modules ... [    5.113397] device-mapper: multipath: version 1.9.0 loaded
   Success: loaded module dm-multipath.
   Failure: failed to load module dm-emc.
   done.
   Begin: Discovering multipaths ... done.
   done.
   Begin: Running /scripts/local-premount ... [    5.187071] scsi 0:0:0:0: Direct-Access     IBM      2107900          .850 PQ: 0 ANSI: 5
   [    5.195814] sd 0:0:0:0: Attached scsi generic sg0 type 0
   [    5.203289] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
   ...
   [    5.878046] sd 0:0:3:3: [sdp] Attached SCSI disk
   ...
   <10-20 SCSI disks via FC>
   ...
   [    5.923577] device-mapper: multipath round-robin: version 1.0.0 loaded
   ...
   done.
   ...

  If resolve_device() runs before the multipath udev rules
  (the rules multipath the root device and update the /dev/disk/by-uuid symlink of $ROOT)
  this happens:

   Begin: Checking root file system ... fsck from util-linux 2.26.2
   /dev/sdf2 is in use.
   e2fsck: Cannot continue, aborting.

   fsck exited with status code 8
   done.
   Warning: File system check failed but did not detect errors

   mount: mounting /dev/sdf2 on /root failed: Device or resource busy
   done.
   Target filesystem doesn't have requested /sbin/init.
   Begin: Running /scripts/local-bottom ... done.
   Begin: Running /scripts/init-bottom ...
   ...
   mount: mounting /dev on /root/dev failed: No such file or directory
   done.
   No init found. Try passing init= bootarg.
   ...
   (initramfs)

  So, /dev/sdf2 is in use ... and hits Device or resource busy.
  This comes from $ROOT.
  However, the root=UUID= symlink points to the multipath device:

   (initramfs) echo $ROOT
   /dev/sdf2

   (initramfs) cat /proc/cmdline
   BOOT_IMAGE=/boot/vmlinux-4.2.0-12-generic root=UUID=44bd8a6e-8613-431a-9335-879d8cf5d0e4 ro

   (initramfs) ls -l /dev/disk/by-uuid/44bd8a6e-8613-431a-9335-879d8cf5d0e4
   lrwxrwxrwx    1        11 /dev/disk/by-uuid/44bd8a6e-8613-431a-9335-879d8cf5d0e4 -> ../../dm-19

   (initramfs) ls -l /dev/sdf
   brw-------    1    8,  80 /dev/sdf

   (initramfs) dmsetup table | grep 8:80
   mpathb: 0 104857600 multipath 0 0 1 1 round-robin 0 4 1 8:80 1 8:16 1 8:144 1 8:208 1

  It's probably because resolve_device() was racing w/ the multipath discoveries from udev rules.
  resolve_device() finished before the /dev/disk/by-uuid/ symlink was updated by multipath discovery.
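
  The dmsetup check above can be sketched as a small filter. This is only an illustration: the function name maps_holding_path is hypothetical, and the sample line is copied from the dmsetup output quoted above. It compares whole fields, so that a device number such as 8:8 does not match 8:80 the way a plain grep would.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: read `dmsetup table` output
# on stdin and print the name of each map that lists the given major:minor
# device number as one of its fields.
maps_holding_path() {
    devnum="$1"
    while read -r name rest; do
        case " $rest " in
            *" $devnum "*) echo "${name%:}" ;;   # strip the trailing colon
        esac
    done
}

# Sample line copied from the dmsetup output in this bug report.
sample='mpathb: 0 104857600 multipath 0 0 1 1 round-robin 0 4 1 8:80 1 8:16 1 8:144 1 8:208 1'
printf '%s\n' "$sample" | maps_holding_path 8:80
```

  At the (initramfs) prompt the same function could be fed from "dmsetup table" directly, assuming the shell there supports the constructs used.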

  Code:

  initramfs :: /init

   log_begin_msg "Mounting root file system"
   ...
   mountroot
   log_end_msg

   (!) Note: message "Mounting root fs" and call to mountroot()

  initramfs :: /scripts/local

   mountroot()
   {
    local_mount_root
   }

   local_mount_root()
   {
    ...
    local_premount

    ROOT=$(resolve_device "$ROOT")
    ...
   }

   (!) Note: local_premount() is the last call made before resolve_device()

   local_premount()
   {
    ...
     [ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-premount"
     run_scripts /scripts/local-premount
     [ "$quiet" != "y" ] && log_end_msg
    ...
   }

  So we're testing a call to 'udevadm settle' in the
  /scripts/local-premount/multipath script.

  == Comment: #17 - Mauricio Faria De Oliveira <mauricfo at br.ibm.com> - 2015-10-05 20:08:26 ==

  == Comment: #18 - Manjunatha H R <manjuhr1 at in.ibm.com> - 2015-10-06 06:05:30 ==
  Thank you, Mauricio, for the quick fix!
  The LPAR is booting properly, without any "Device or resource busy" errors.

  == Comment: #20 - Mauricio Faria De Oliveira <mauricfo at br.ibm.com> - 2015-10-06 09:04:50 ==
  Confirmed w/ Manju the # of tests.

  10:00:41 AM: Manjunatha H R: Hi Mauricio, I tried around 10 boots..
  10:01:07 AM: Manjunatha H R: all times it booted up..

  Sounds good.

  I'll be sending a patch/mirroring.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1503286/+subscriptions


