[Bug 1431650] Re: Multipath devices take long to initialize during initramfs

Mauricio Faria de Oliveira mauricfo at linux.vnet.ibm.com
Fri Mar 20 19:03:44 UTC 2015


> Well, not necessarily. The swap issue is the same as the rootfs one

Ah, yes, you're right.

I was thinking with the assumption that the workaround/patch for the timeout was in place.
Indeed, if it's not in place, there'll be no swap nor anything multipath'ed at all.

In the particular case the workaround/patch is applied, what I've seen
is a job waiting/timing out for the swap device (now) incorrectly
specified in /etc/fstab.

Thanks for pointing that out.

> The debdiff looks quite good; I'll add in the removal of
scsi_wait_device which is just useless (that module no longer exists).

Cool :) thanks!

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1431650

Title:
  Multipath devices take long to initialize during initramfs

Status in multipath-tools package in Ubuntu:
  In Progress

Bug description:
  Problem: during initramfs, multipath devices takes long to initialize
  (~30 seconds?)

  Symptom:
  	<...>
  	Loading, please wait...
  	<...>
  	[    4.492303] device-mapper: multipath: version 1.7.0 loaded
  	[    4.643053] random: nonblocking pool is initialized
  	worker [116] /devices/vio/71000002/host0/target0:0:0/0:0:0:0/block/sda is taking a long time
  	worker [117] /devices/vio/71000002/host0/target0:0:2/0:0:2:0/block/sdc is taking a long time
  	worker [121] /devices/vio/71000002/host0/target0:0:1/0:0:1:0/block/sdb is taking a long time
  	worker [123] /devices/vio/71000002/host0/target0:0:3/0:0:3:0/block/sdd is taking a long time
  	<... here's the delay ...>
  	timeout '/sbin/multipath -v0 /dev/sda'
  	slow: '/sbin/multipath -v0 /dev/sda' [181]
  	timeout '/sbin/multipath -v0 /dev/sdb'
  	slow: '/sbin/multipath -v0 /dev/sdb' [178]
  	timeout '/sbin/multipath -v0 /dev/sdc'
  	slow: '/sbin/multipath -v0 /dev/sdc' [180]
  	timeout '/sbin/multipath -v0 /dev/sdd'
  	slow: '/sbin/multipath -v0 /dev/sdd' [182]
  	worker [116] /devices/vio/71000002/host0/target0:0:0/0:0:0:0/block/sda timeout; kill it
  	seq 1059 '/devices/vio/71000002/host0/target0:0:0/0:0:0:0/block/sda' killed
  	worker [117] /devices/vio/71000002/host0/target0:0:2/0:0:2:0/block/sdc timeout; kill it
  	seq 1079 '/devices/vio/71000002/host0/target0:0:2/0:0:2:0/block/sdc' killed
  	worker [121] /devices/vio/71000002/host0/target0:0:1/0:0:1:0/block/sdb timeout; kill it
  	seq 1069 '/devices/vio/71000002/host0/target0:0:1/0:0:1:0/block/sdb' killed
  	worker [123] /devices/vio/71000002/host0/target0:0:3/0:0:3:0/block/sdd timeout; kill it
  	seq 1089 '/devices/vio/71000002/host0/target0:0:3/0:0:3:0/block/sdd' killed
  	worker [116] terminated by signal 9 (Killed)
  	worker [117] terminated by signal 9 (Killed)
  	worker [121] terminated by signal 9 (Killed)
  	worker [123] terminated by signal 9 (Killed)
  	<...>
  	<login prompt>

  Reason:
  	When udev starts to handle the block device, it acquires a lock on it, then processes the rules.
  	Therefore, by the time 'multipath' runs (called in udev rules), the block device is already locked.

  	For 'multipath' to finish, it must acquire a *exclusive* lock on the block device.
  	Thus it fails (exclusive mode doesn't succeed with another lock in place).

  	So it loops, trying again until the lock is acquired.
  	(messages: "failed to create map (in use)" / "restart multipath configuration process").

          Sure enough, it will keep looping.. until killed by time-out
  (message: "timeout '/sbin/multipath -v0 /dev/sda'").

          Only then (after rule-processing) udev releases the lock.

  	The next call to the 'multipath' command succeeds (out of udev rules, in /scripts/local-top/multipath).
  	This happens because, by that time, udev has finished handling all devices (i.e., 'udev settle' in /scripts/init-bottom/udev, which is run earlier),
  	so there's no more the udev worker lock to the underlying block device(s).

  Steps to reproduce:
  	Do a multipath install (disk-detect/multipath/enable=true), and boot.

  Solution:

  	left as an exercise for the reader :)
  	
  	I guess the udev 95-multipath.rules shouldn't be in the initramfs.
  	(i.e., add_udev_rules() in /usr/share/initramfs-tools/hooks from 
  	 multipath-tools-boot)

  	Booting on multipath devices would still be possible because there's
  	another call to 'multipath' on initramfs/scripts/local-top/multipath,
  	before mounting root.

  	Not sure about the behavior during system operation; e.g., hotplug
  	a new disk.. the multipath command would be called from the udev rule
  	in the root filesystem now (i.e., /lib/udev/rules.d/ from rootfs, not
  	initramfs).  I'm now sure of how multipathd is related to that now.
  	
  		

  Analysis:

          Checking out why '/sbin/multipath -v0 /dev/sda' is timing out.

  	Boot with: 
  		break=top,pre-multipath,post-multipath

  		<...>
  		Spawning shell within the initramfs
  		<...>
  		(initramfs)

  	Increase verbosity on multipath and udevd, and show udevd
  	And, to save time/output, reduce the udev timeout, and restrict that command 
  	to only a single disk (e.g., sda).

  		udevd:
  	
  		(initramfs) grep udevd /scripts/init-top/udev 
  		/lib/systemd/systemd-udevd --daemon --resolve-names=never

                  (initramfs) sed '/systemd-udevd/ { s:--daemon:--debug
  --event-timeout=5: ; s:$: 2>\&1 \&: }' -i /scripts/init-top/udev

  		(initramfs) grep udevd /scripts/init-top/udev 
  		/lib/systemd/systemd-udevd --debug --event-timeout=5 --resolve-names=never 2>&1 &

                  multipath:

  		(initramfs) grep -F 'multipath -v' /lib/udev/rules.d/95-multipath.rules 
  		ACTION=="add|change", SUBSYSTEM=="block", RUN+="/sbin/multipath -v0 /dev/$name"

                  (initramfs) sed '/multipath -v/ { s:-v0:-v3: ;
  s:^:KERNEL=="sda", : }' -i /lib/udev/rules.d/95-multipath.rules

  		(initramfs) grep -F 'multipath -v' /lib/udev/rules.d/95-multipath.rules 
  		KERNEL=="sda", ACTION=="add|change", SUBSYSTEM=="block", RUN+="/sbin/multipath -v3 /dev/$name"
  	
  	Let initramfs continue:

                  (initramfs) exit

  	The udevd output rolls out. 
  	The output of the multipath command is looping.  Looking closer, it's always
  	restarting the multipath configuration process.  At the end of each iteration
  	(it's looping..), there is this block:

  		'/sbin/multipath -v3 /dev/sda'(out) 'Mar 12 13:44:48 | mpath0: set ACT_CREATE (map does not exist)'
  		'/sbin/multipath -v3 /dev/sda'(out) 'Mar 12 13:44:48 | mpath0: failed to create map (in use)'
  		'/sbin/multipath -v3 /dev/sda'(out) 'Mar 12 13:44:48 | mpath0: domap (4294967295) failure for create/reload map'
  		'/sbin/multipath -v3 /dev/sda'(out) 'Mar 12 13:44:48 | restart multipath configuration process'

          Searching multipath-tools source package for what causes that
  error message "failed to create map (in use)".

  		# apt-get source multipath-tools
  		# cd multipath-tools-*
  		# grep -r 'failed to create map (in use)' .
  		./libmultipath/configure.c:                     condlog(3, "%s: failed to create map (in use)",

          And there it is, in domap() (also mentioned in the error):

  		extern int
  		domap (struct multipath * mpp)
  		{
  		...
  			case ACT_CREATE:
  				if (lock_multipath(mpp, 1)) {
  				        condlog(3, "%s: failed to create map (in use)",
  				                mpp->alias);
  				        return DOMAP_RETRY;
  				}

          The error thus comes from lock_multipath(), in the same file:

  		static int
  		lock_multipath (struct multipath * mpp, int lock)
  		{
  		...
  				        if (lock && flock(pp->fd, LOCK_EX | LOCK_NB) &&
  				            errno == EWOULDBLOCK)
  				                return 1;

          So, basically, it's trying to lock a file w/ exclusive access
  flock(fd, LOCK_EX).

          To check what file is that, on another system, which is booted
  already, let's use strace:

  		# strace -f strace.multipath multipath -v0 /dev/sda
  		<...>
  		26362 open("/dev/sda", O_RDONLY)        = 5
  		<...>
  		26362 flock(5, LOCK_EX|LOCK_NB)         = 0
  		<...>

          OK; it tries to lock the same device that is passed as
  argument.

  	In order to discover the owner of the lock in the initramfs... hm..
  	there's no lsof or something.  
  	
  	Let's use a script to check every process's open files
  	(if a file is locked, it must be open; e.g., flock operates on an open file descriptor).
  	
  	Reboot to that point again, in the initramfs (i.e., break=..., sed ...; NOT exit).

  		break=top,pre-multipath,post-multipath
  		(initramfs) sed '/systemd-udevd/ { s:--daemon:--debug --event-timeout=5: ; s:$: 2>\&1 \&: }' -i /scripts/init-top/udev
  		(initramfs) sed '/multipath -v/ { s:-v0:-v3: ;  s:^:KERNEL=="sda", : }' -i /lib/udev/rules.d/95-multipath.rules
  	
  	But now *REPLACE* the multipath command to a script which searches for an 
  	open fd for the device specified (expect sda, per the restriction inserted),
  	and list some details of the owning process.
  	
  	note: we replace (not append) just to avoid the flock from multipath,
  	      for not to confuse us.

  		(initramfs) sed 's:/sbin/multipath -v. /dev/:/find.sh :' -i /lib/udev/rules.d/95-multipath.rules
  		(initramfs) grep -F 'find.sh' /lib/udev/rules.d/95-multipath.rules 
  		KERNEL=="sda", ACTION=="add|change", SUBSYSTEM=="block", RUN+="/find.sh $name"

  		(initramfs) cat <<EOF >/find.sh
  		#!/bin/sh
  		#set -x
  		for dir in /proc/*
  		do
  		    [ -d "\$dir/fd" ] || continue
  		    ls -l \$dir/fd | grep -q \$1 || continue

  		    echo "FD FOUND"
  		    echo "process: \$dir"
  		    ls -l \$dir/exe
  		    echo "cmdline: \$(cat \$dir/cmdline | tr '\0' ' ')"
  		    echo "file descriptors:"
  		    ls -l \$dir/fd
  		done
  		EOF

  	OK; let's go for udev.
  	
  	(initramfs) exit
  	
  	Lots of udev output..
  	Searching back for the scripts' output (find.sh).. Tip: use 'screen'.

  	Got it.
  	The lock is from udevd.  
  	See fd 6.

  		<...>
  		'/find.sh sda'(out) 'FD FOUND'
  		'/find.sh sda'(out) 'process: /proc/369'
  		'/find.sh sda'(out) 'lrwxrwxrwx    1         0 /proc/369/exe -> /lib/systemd/systemd-udevd'
  		'/find.sh sda'(out) 'cmdline: /lib/systemd/systemd-udevd --debug --event-timeout=5 --resolve-names=never '
  		'/find.sh sda'(out) 'file descriptors:'
  		'/find.sh sda'(out) 'lr-x------    1        64 11 -> pipe:[8569]'
  		'/find.sh sda'(out) 'lrwx------    1        64 10 -> socket:[8352]'
  		'/find.sh sda'(out) 'lrwx------    1        64 9 -> anon_inode:[eventpoll]'
  		'/find.sh sda'(out) 'lrwx------    1        64 8 -> socket:[8351]'
  		'/find.sh sda'(out) 'lr-x------    1        64 7 -> pipe:[8568]'
  		'/find.sh sda'(out) 'lr-x------    1        64 6 -> /dev/sda'
  		'/find.sh sda'(out) 'lr-x------    1        64 5 -> anon_inode:inotify'
  		'/find.sh sda'(out) 'lrwx------    1        64 4 -> anon_inode:[eventpoll]'
  		'/find.sh sda'(out) 'lrwx------    1        64 3 -> anon_inode:[signalfd]'
  		'/find.sh sda'(out) 'lrwx------    1        64 2 -> /dev/console'
  		'/find.sh sda'(out) 'lrwx------    1        64 1 -> /dev/console'
  		'/find.sh sda'(out) 'lr-x------    1        64 0 -> /dev/null'
  		'/find.sh sda' [422] exit with return code 0

          And if you search backwards for that PID, it reaches the fork
  of a udev worker:

                  seq 873 forked new worker [369]

          Looking at udevd code for a worker:

  		# apt-get source systemd
  		# cd systemd-*
  		# grep -r worker .
  		  hmm.. udevd.c.. ah, "worker_new".. looking.
  		  
  		static void worker_new(struct event *event) {
  			<...>
  				fd_lock = open(udev_device_get_devnode(d), O_RDONLY|O_CLOEXEC|O_NOFOLLOW|O_NONBLOCK);
  				if (fd_lock >= 0 && flock(fd_lock, LOCK_SH|LOCK_NB) < 0) {
  			<...>
  	
  			        /* apply rules, create node, symlinks */
  			        udev_event_execute_rules(udev_event,		
  			<...>
  			        safe_close(fd_lock);
  		}
  	
  	And there's a very meaningful comment above it:

                  /*
                   * Take a shared lock on the device node; this establishes
                   * a concept of device "ownership" to serialize device
                   * access. External processes holding an exclusive lock will
                   * cause udev to skip the event handling; in the case udev
                   * acquired the lock, the external process can block until
                   * udev has finished its event handling.
                   */

  	Specially:
  	
  		in the case udev
                   * acquired the lock, the external process can block until
                   * udev has finished its event handling.

  	AH.
  	So, that explains it all.

  	See:
  	1) udevd/worker acquires a lock of the block device (shared access),
  	2) process the udev rules (e.g., 95-multipath.rules)
  	3) *afterwards*, release the lock.

  	So, the 'multipath' command that is run inside the udev rule
  	that handles the add event of a block device (e.g., sda),
  	will never finish (will keep looping until killed by timeout),
  	because it requires an exclusive lock of that device..

  	however, that device already has a lock in place (locked by
  	udev worker before rule processing), which will only be
  	released after the rule processing (after multipath ends.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1431650/+subscriptions



More information about the Ubuntu-sponsors mailing list