[Bug 1017715] Re: Serialization problem of udev events with DM_COOKIE set
Launchpad Bug Tracker
1017715 at bugs.launchpad.net
Mon Jul 16 05:23:36 UTC 2012
** Branch linked: lp:~ubuntu-core-dev/ubuntu/oneiric/udev/ubuntu
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to udev in Ubuntu.
https://bugs.launchpad.net/bugs/1017715
Title:
Serialization problem of udev events with DM_COOKIE set
Status in “udev” package in Ubuntu:
Fix Released
Status in “udev” source package in Oneiric:
Fix Committed
Status in “udev” source package in Precise:
Fix Committed
Bug description:
Alex Lyakas reports:
Hi Herton,
we are doing initial testing of our software on stock ubuntu-precise
3.2.0-23 - 36. It looks like we have identified a problem with the udevd
code. Here are some details:
We believe the problem was introduced by a fix to
https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/802626. The fix
introduced code like this:
in event_queue_insert():
/* run all events with a timeout set immediately, or in the case
* it's a dm_cookie event being processed */
if (udev_device_get_timeout(dev) > 0 ||
udev_device_get_dm_cookie_set(dev)) {
event_run(event, true);
return 0;
}
Basically, what this fix does, if the udev event has a DM_COOKIE, then
this event is dispatched immediately. Previously, this event was going
through the event_queue_start(), which was checking that there are no
in-flight events for the same device (or parent, or child etc.) by
calling is_devpath_busy(). With the fix, this doesn’t happen and event
is dispatched immediately.
What we see that when a new device-mapper device is created in the
system, an “add” event is fired, then “change” event comes in (that has
a DM_COOKIE). As a result of the fix, the “change” processing is not
delayed until the “add” processing completes. As a result, both events
are handled concurrently by two different udev-workers. This causes
several different unwanted effects that we observed.
One such effect is in the attached log: dm-16 device is being created.
“add” and “change” events are fired:
May 24 20:26:37 vc-00-00-A-dev udevd[5859]: seq 100141 queued, 'add'
'block'
May 24 20:26:37 vc-00-00-A-dev udevd[5859]: seq 100142 queued, 'change'
'block'
May 24 20:26:37 vc-00-00-A-dev udevd[26633]: seq 100142 running
The run in parallel and “change” processing starts first. Then “add”
processing starts:
May 24 20:26:37 vc-00-00-A-dev udevd[26544]: seq 100141 running
“change” processing creates symbolic links
May 24 20:26:37 vc-00-00-A-dev udevd[26633]: creating link '/dev/disk
/by-id/dm-name-vpart-5555' to '/dev/dm-16'
May 24 20:26:37 vc-00-00-A-dev udevd[26633]: creating symlink '/dev/disk
/by-id/dm-name-vpart-5555' to '../../dm-16'
May 24 20:26:37 vc-00-00-A-dev udevd[26633]: creating link
'/dev/mapper/vpart-5555' to '/dev/dm-16'
May 24 20:26:37 vc-00-00-A-dev udevd[26633]: creating symlink
'/dev/mapper/vpart-5555' to '../dm-16'
but then “add” processing removes them:
May 24 20:26:37 vc-00-00-A-dev udevd[26544]: update old name, '/dev/disk
/by-id/dm-name-vpart-5555' no longer belonging to
'/devices/virtual/block/dm-16'
May 24 20:26:37 vc-00-00-A-dev udevd[26544]: no reference left, remove
'/dev/disk/by-id/dm-name-vpart-5555'
May 24 20:26:37 vc-00-00-A-dev udevd[26544]: update old name,
'/dev/mapper/vpart-5555' no longer belonging to
'/devices/virtual/block/dm-16'
May 24 20:26:37 vc-00-00-A-dev udevd[26544]: no reference left, remove
'/dev/mapper/vpart-5555'
As a result, there is no /dev/mapper/XXX symbolic link.
We have also seen some other bad effects of this parallel processing
like:
# bad symbolic link is created in /dev/disk/by-path (this happens in
“add” processing, when there is no DM_NAME property):
ll /dev/disk/by-path:
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name- -> ../../dm-4
lrwxrwxrwx 1 root root 10 May 23 21:45 dm-name-ioerror -> ../../dm-0
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name-ppart-100 -> ../../dm-5
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name-ppart-103 -> ../../dm-3
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name-ppart-105 -> ../../dm-2
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name-ppart-108 -> ../../dm-1
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name-ppart-109 -> ../../dm-6
# devnode is created in /dev/mapper, instead of symbolic link:
root at vc-00-00-A-dev:~# ll /dev/mapper/
total 0
drwxr-xr-x 2 root root 140 May 23 22:01 ./
drwxr-xr-x 14 root root 9900 May 23 22:01 ../
lrwxrwxrwx 1 root root 7 May 23 22:01 blabla -> ../dm-1
crw------- 1 root root 10, 236 May 23 19:09 control
lrwxrwxrwx 1 root root 7 May 23 21:53 ioerror -> ../dm-0
lrwxrwxrwx 1 root root 7 May 23 22:01 ppart-4927362 -> ../dm-2
brw-rw---- 1 root disk 252, 3 May 23 22:01 ppart-4927363 // This is a
devnode, not symlink!
and some others.
We see this issue only when doing stress testing of the udev system,
like creating and deleting devices, and then creating them again and
deleting again etc.
To resolve the issue, we disabled the code that handles the DM_COOKIE in
udevd. Then we stopped seeing issues.
All in all, our understanding is that two udev events for the same
devpath should not be executing concurrently, which this fix violates
for Device-Mapper devices. Also, this problem cannot be easily addressed
by playing with udev rules, because the flaw is in the udevd code
itself.
Can you pls share your view on the problems we see?
Thanks,
Alex.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/udev/+bug/1017715/+subscriptions
More information about the foundations-bugs
mailing list