[Bug 1503286] Re: ISST-LTE: Boot of Ubuntu15.10 lpar fails: "mounting /dev/sdn2 on /root failed: Device or resource busy" [multipath]

Launchpad Bug Tracker 1503286 at bugs.launchpad.net
Mon Feb 15 23:15:28 UTC 2016


This bug was fixed in the package multipath-tools - 0.4.9-3ubuntu7.9

---------------
multipath-tools (0.4.9-3ubuntu7.9) trusty; urgency=medium

  * debian/patches/kpartx-support-device-names-with-spaces.patch: fix loopback
    files unmapping. (LP: #1543430)

multipath-tools (0.4.9-3ubuntu7.8) trusty; urgency=medium

  * debian/patches/kpartx-support-device-names-with-spaces.patch: deal with
    spaces in device names in kpartx too (LP: #1432062)
  * debian/initramfs/local-premount: wait for udev to settle before mounting
    so the by-uuid/ symlinks have a chance to be updated by udev rules.
    (LP: #1503286)
  * Allow device detection all through the initramfs: run multipathd instead
    of only scanning once for devices, so those that come up slower can still
    be used as a root device (LP: #1526984):
    - debian/patches/0050-readonly-bindings_prefix.patch,
      debian/patches/0051-readonly-bindings_multipath.patch,
      debian/patches/0052-readonly-bindings_multipathd.patch,
      debian/patches/0053-readonly-bindings_multipathd_prod.patch: support -B
      to allow multipathd to handle cases where the bindings file is read-only.
    - debian/initramfs/hooks: install multipathd and required directories.
    - debian/initramfs/local-premount: also reload all maps to make sure
      they're ready before we mount.
    - debian/initramfs/local-top: run multipathd rather than a one-off call to
      multipath so that new paths can be correctly added as detected while
      we're still in the initramfs.
    - debian/initramfs/local-bottom: remember to stop multipathd.
    - debian/initramfs/local-bottom, debian/rules: install local-bottom for
      initramfs.
  * debian/patches/lp1496210_add_IBM_XIV_defaults.patch: add support (default
    config values) for the IBM 2810XIV storage system. (LP: #1496210)
  * debian/patches/0054-kpartx-update-option.patch: run kpartx -u rather than
    kpartx -a, so as to remove old partition entries if the partition table
    has changed. (LP: #1473903)
  * debian/patches/multipath_enable_sync_support_1b8082c8.patch,
    debian/patches/kpartx_rely_on_udev_dev_creation_9a632fff.patch: synchronize
    udev, device-mapper and multipath, and let udev deal with creating device
    nodes and symlinks. (LP: #1486370)
  * debian/initramfs/local-top: drop scsi_wait_scan stanza, that module is no
    longer available. (LP: #1538775)

 -- Mathieu Trudel-Lapierre <mathieu-tl at ubuntu.com>  Tue, 09 Feb 2016
16:03:10 -0500

** Changed in: multipath-tools (Ubuntu Trusty)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1503286

Title:
  ISST-LTE: Boot of Ubuntu15.10 lpar fails: "mounting /dev/sdn2 on /root
  failed: Device or resource busy" [multipath]

Status in multipath-tools package in Ubuntu:
  Fix Released
Status in multipath-tools source package in Trusty:
  Fix Released
Status in multipath-tools source package in Vivid:
  Fix Committed
Status in multipath-tools source package in Wily:
  Fix Released

Bug description:
  [Impact]
  Systems with disks that have long spin-up times or otherwise take a while to be detected may be affected by a failure to boot due to the drives underlying multipath devices not being available.

  [Test case]
  This issue is difficult to reproduce.
   - Boot a system with the boot device on multipath.
  This may be limited to POWER LPARs. See description below.

  [Regression Potential]
  Given that a new initramfs script is introduced to add a udev trigger with a timeout of 2 minutes (121 seconds), users may notice a delay of up to two minutes in booting if devices take 2 minutes or more to be brought up or detected by udev.

  ---

  == Comment: #0 - Manjunatha H R <manjuhr1 at in.ibm.com> - 2015-09-25 11:05:36 ==
  Booting of Ubuntu15.10 lpar fails and control falls to initramfs.

  uname -a
  --------------
  Linux (none) 4.2.0-10-generic #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 ppc64le GNU/Linux

  Boot log:
  -------------
    Booting a command list

  Loading Linux 4.2.0-10-generic ...
  Loading initial ramdisk ...
  OF stdout device is: /vdevice/vty at 30000000
  Preparing to boot Linux version 4.2.0-10-generic (buildd at fisher04) (gcc version 5.2.1 20150911 (Ubuntu 5.2.1-17ubuntu4) ) #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 (Ubuntu 4.2.0-10.12-generic 4.2.0)
  Detected machine type: 0000000000000101
  Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
  Calling ibm,client-architecture-support... done
  command line: BOOT_IMAGE=/boot/vmlinux-4.2.0-10-generic root=UUID=822dd709-5b69-45a9-aba5-63cb55768ffb ro splash quiet topology_updates=off
  memory layout at init:
    memory_limit : 0000000000000000 (16 MB aligned)
    alloc_bottom : 000000000bf80000
    alloc_top    : 0000000010000000
    alloc_top_hi : 0000000010000000
    rmo_top      : 0000000010000000
    ram_top      : 0000000010000000
  found display   : /pci at 80000002000002c/display at 0, opening... done
  instantiating rtas at 0x000000000eb60000... done
  prom_hold_cpus: skipped
  copying OF device tree...
  Building dt strings...
  Building dt structure...
  Device tree strings 0x000000000bf90000 -> 0x000000000bf91965
  Device tree struct  0x000000000bfa0000 -> 0x000000000bfe0000
  Quiescing Open Firmware ...
  Booting Linux via __start() ...
   -> smp_release_cpus()
  spinning_secondaries = 199
   <- smp_release_cpus()
   <- setup_system()
  [    2.868103] [drm:radeon_device_init [radeon]] *ERROR* Unable to find PCI I/O BAR
  [    3.074553] [drm:radeon_atombios_init [radeon]] *ERROR* Unable to find PCI I/O BAR; using MMIO for ATOM IIO
  [    5.060785] lpfc 0002:90:00.0: 0:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0
  Scanning for Btrfs filesystems
  fsck from util-linux 2.26.2
  /dev/sdn2 is in use.
  e2fsck: Cannot continue, aborting.

  fsck exited with status code 8
  [   36.233086]  rport-0:0-9: blocked FC remote port time out: removing rport
  mount: mounting /dev/sdn2 on /root failed: Device or resource busy
  Target filesystem doesn't have requested /sbin/init.
  mount: mounting /dev on /root/dev failed: No such file or directory
  No init found. Try passing init= bootarg.

  BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash)
  Enter 'help' for a list of built-in commands.

  (initramfs)
  -------------------------

  This lpar is having multipath disks and boot disk is on a multipath disk.
  Boot passes only whenever fsck tries to scan boot disk via :  /dev/dm OR /dev/mapper/mpath

  Boot Pass scenarios:
  ----------------------------
  1. Boot passed when fsck tried scanning "/dev/mapper/mpathb"
  fsck from util-linux 2.26.2
  /dev/mapper/mpathb-part2: clean, 81802/3139584 files, 1040598/12558080 blocks

  2. Boot passed when fsck tried scanning  "/dev/dm-3"
  Scanning for Btrfs filesystems
  fsck from util-linux 2.26.2
  /dev/dm-3: clean, 81802/3139584 files, 1040605/12558080 blocks

  Boot fails, whenever fsck is called on /dev/sd

  Boot fail scenario: Boot failed when fsck is called on "/dev/sdn"
  -------------------------
  Scanning for Btrfs filesystems
  fsck from util-linux 2.26.2
  /dev/sdn2 is in use.
  e2fsck: Cannot continue, aborting.

  fsck exited with status code 8
  [   36.108653]  rport-0:0-9: blocked FC remote port time out: removing rport
  mount: mounting /dev/sdn2 on /root failed: Device or resource busy
  Target filesystem doesn't have requested /sbin/init.

  mount: mounting /dev on /root/dev failed: No such file or directory
  No init found. Try passing init= bootarg.

  BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash)
  Enter 'help' for a list of built-in commands.

  (initramfs)
  -------------------------

  Contact info:
  ----------------
  Manju (manjuhr1 at in.ibm.com)      A.P. (apundt at us.ibm.com)

  == Comment: #12 - Mauricio Faria De Oliveira <mauricfo at br.ibm.com> - 2015-10-02 13:51:29 ==
  Hi Manju and Alton,

  I could not reproduce this bug in 2 attempts.
  The LPAR booted successfully, using the root=UUID= parameter.

  By looking at this message from the description:
  > mount: mounting /dev/sdn2 on /root failed: Device or resource busy

  It should have happened because multipath udev rules failed to update
  the /dev/disk/by-id/<uuid> symlink from /dev/sdn to /dev/dm-X, but
  multipathing the path was successful (so it got locked/in-use).

  If you can reproduce it again, please leave the LPAR in the failing state (in the initramfs), reopen this bug and ping me.
  I'd be happy to debug it.

  Thanks!

  == Comment: #15 - Mauricio Faria De Oliveira <mauricfo at br.ibm.com> - 2015-10-05 19:59:56 ==
  This is probably a race between the resolve_device() call in mountroot() and the multipath discovery triggered by udev rules.

  If resolve_device() runs before the root device is multipathed, $ROOT is set to an individual path (eg, /dev/sdf2) rather than its multipah device (eg, /dev/mapper/mpathb-part2), because the /dev/disk/by-uuid/<UUID> symlink is not updated yet.
  The multipath discovery finishes after $ROOT is set, so the individual path becomes locked, and afterwards the root mount will be attempted on it -- this fails.

  The LPAR is now patched w/ a test fix that is supposed to ensure
  resolve_device() only starts after udev rules are finished.

  Can you try to recreate the issue, please? Thanks!

  == Comment: #16 - Mauricio Faria De Oliveira <mauricfo at br.ibm.com> - 2015-10-05 20:05:37 ==
  Console messages

   (!) Note: the local-premount messages (Running ... & done.) occur
  around SCSI device scan/discovery time.

   ...
   Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Loading multipath modules ... [    5.113397] device-mapper: multipath: version 1.9.0 loaded
   Success: loaded module dm-multipath.
   Failure: failed to load module dm-emc.
   done.
   Begin: Discovering multipaths ... done.
   done.
   Begin: Running /scripts/local-premount ... [    5.187071] scsi 0:0:0:0: Direct-Access     IBM      2107900          .850 PQ: 0 ANSI: 5
   [    5.195814] sd 0:0:0:0: Attached scsi generic sg0 type 0
   [    5.203289] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
   ...
   [    5.878046] sd 0:0:3:3: [sdp] Attached SCSI disk
   ...
   <10-20 SCSI disks via FC>
   ...
   [    5.923577] device-mapper: multipath round-robin: version 1.0.0 loaded
   ...
   done.
   ...

  If resolve_device() runs before the multipath udev rules
  (the rules multipath the root device and update the /dev/disk/by-uuid symlink of $ROOT)
  this happens:

   Begin: Checking root file system ... fsck from util-linux 2.26.2
   /dev/sdf2 is in use.
   e2fsck: Cannot continue, aborting.

   fsck exited with status code 8
   done.
   Warning: File system check failed but did not detect errors

   mount: mounting /dev/sdf2 on /root failed: Device or resource busy
   done.
   Target filesystem doesn't have requested /sbin/init.
   Begin: Running /scripts/local-bottom ... done.
   Begin: Running /scripts/init-bottom ...
   ...
   mount: mounting /dev on /root/dev failed: No such file or directory
   done.
   No init found. Try passing init= bootarg.
   ...
   (initramfs)

  So, /dev/sdf2 is in use ... and hits Device or resource busy.
  This comes from $ROOT.
  However, the root=UUID= symlink points to the multipath device:

   (initramfs) echo $ROOT
   /dev/sdf2

   (initramfs) cat /proc/cmdline
   BOOT_IMAGE=/boot/vmlinux-4.2.0-12-generic root=UUID=44bd8a6e-8613-431a-9335-879d8cf5d0e4 ro

   (initramfs) ls -l /dev/disk/by-uuid/44bd8a6e-8613-431a-9335-879d8cf5d0e4
   lrwxrwxrwx    1        11 /dev/disk/by-uuid/44bd8a6e-8613-431a-9335-879d8cf5d0e4 -> ../../dm-19

   (initramfs) ls -l /dev/sdf
   brw-------    1    8,  80 /dev/sdf

   (initramfs) dmsetup table | grep 8:80
   mpathb: 0 104857600 multipath 0 0 1 1 round-robin 0 4 1 8:80 1 8:16 1 8:144 1 8:208 1

  It's probably because resolve_device() was racing w/ the multipath discoveries from udev rules.
  resolve_device() finished before the /dev/disk/by-uuid/ symlink was updated by multipath discovery.

  Code:

  initramfs :: /init

   log_begin_msg "Mounting root file system"
   ...
   mountroot
   log_end_msg

   (!) Note: message "Mounting root fs" and call to mountroot()

  initramfs :: /scripts/local

   mountroot()
   {
    local_mount_root
   }

   local_mount_root()
   {
    ...
    local_premount

    ROOT=$(resolve_device "$ROOT")
    ...
   }

   (!) Note: local_premount() is the last call before resolve_device()
  is called

   local_premount()
   {
    ...
     [ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-premount"
     run_scripts /scripts/local-premount
     [ "$quiet" != "y" ] && log_end_msg
    ...
   }

  So we're testing a call to 'udevadm settle' in /scripts/local-
  premount/multipath script.

  == Comment: #17 - Mauricio Faria De Oliveira <mauricfo at br.ibm.com> -
  2015-10-05 20:08:26 ==

  == Comment: #18 - Manjunatha H R <manjuhr1 at in.ibm.com> - 2015-10-06 06:05:30 ==
  Thank you Mauricio for a quick fix!!
  Lpar is booting up properly without seeing device or resource busy errors.

  == Comment: #20 - Mauricio Faria De Oliveira <mauricfo at br.ibm.com> - 2015-10-06 09:04:50 ==
  Confirmed w/ Manju the # of tests.

  10:00:41 AM: Manjunatha H R: Hi Mauricio, I tried around 10 boots..
  10:01:07 AM: Manjunatha H R: all times it booted up..

  Sounds good.

  I'll be sending a patch/mirroring.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1503286/+subscriptions



More information about the foundations-bugs mailing list