[Bug 1032550] Re: [multipath] failed to get sysfs information
Peter Petrakis
peter.petrakis at canonical.com
Fri Dec 21 16:14:18 UTC 2012
@Ronald
First, please attach http://www.rmoesbergen.nl/vmcore-crash.tgz to the bug; Launchpad
can handle it just fine. Also, this is going to take a while. We're off all next week, so
don't expect any movement on this until early to mid January. Feel free to ping me if I forget.
Also, at what time did your testing start? I'm seeing this everywhere, almost immediately:
emc: ALUA failover mode detected
Could you also describe what the steady-state target distribution
should be?
I see targets like this:
sd 3:0:0:0: [sdb] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
in the minority compared to targets like this:
sd 3:0:0:1: [sdc] 419430400 512-byte logical blocks: (214 GB/200 GiB)
I'm wondering whether your SAN is misreporting READ CAPACITY.
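As a quick sanity check on the two sizes quoted above, the block counts can be converted
by hand. This is only a sketch of the arithmetic; capacity() is a helper name made up for
this example, and the 512-byte block size is taken from the log lines themselves.

```python
# Convert an sd "N 512-byte logical blocks" count into decimal GB and binary GiB.
# capacity() is a hypothetical helper, not part of any tool mentioned in this bug.

def capacity(blocks, block_size=512):
    size_bytes = blocks * block_size
    return size_bytes / 10**9, size_bytes / 2**30

for name, blocks in (("sdb", 41943040), ("sdc", 419430400)):
    gb, gib = capacity(blocks)
    print(f"{name}: {gb:.2f} GB / {gib:.2f} GiB")
```

Both counts come out as whole GiB values (20 and 200), exactly a factor of ten apart, so
the numbers are at least self-consistent. Whether they match what the SAN is configured
to export is still the open question.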
The dump looks good. I can tell you right away that all the SCSI hosts
are still RUNNING and not in error handling. It looks like I'll have to examine
the SCSI target states and the dm tables.
So there are these stuck processes:
crash> ps | grep UN
1530 2 0 ffff880415ef9700 UN 0.0 0 0 [jbd2/dm-1-8]
2180 2 1 ffff88040613ae00 UN 0.0 0 0 [flush-252:1]
4739 1 2 ffff880418e70000 UN 5.8 16426520 1029488 mysqld
That adds up: the jbd2 journal thread and the flush thread are both in uninterruptible
sleep, so you can't write back.
This also looks really suspicious:
[62856.457650] end_request: I/O error, dev sdf, sector 21272960
[62856.457966] device-mapper: multipath: Failing path 8:80.
[62856.462495] scsi 3:0:0:0: emc: Detached
[62856.462730] device-mapper: multipath: Failing path 8:80.
[62856.462798] sd 4:0:0:0: emc: ALUA failover mode detected
[62856.462806] sd 4:0:0:0: emc: at SP A Port 0 (owned, default SP A)
# sketchy
[62856.462814] device-mapper: multipath: Could not failover the device: Handler scsi_dh_emc Error 15.
# it looks like it's retrying
[63122.241178] sd 3:0:1:0: [sdf] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[63122.241185] sd 3:0:1:0: [sdf] CDB: Write(10): 2a 00 01 44 b4 d8 00 00 20 00
[63122.241198] end_request: I/O error, dev sdf, sector 21279960
[63122.241513] device-mapper: multipath: Failing path 8:80.
[63122.244865] scsi 3:0:0:0: emc: Detached
[63122.245045] sd 4:0:0:0: emc: ALUA failover mode detected
[63122.245053] sd 4:0:0:0: emc: at SP A Port 0 (owned, default SP A)
# sketchy
[63122.245062] device-mapper: multipath: Could not failover the device: Handler scsi_dh_emc Error 15.
...
That error comes from drivers/md/dm-mpath.c:
	case SCSI_DH_NOSYS:
		if (!m->hw_handler_name) {
			errors = 0;
			break;
		}
		DMERR("Could not failover the device: Handler scsi_dh_%s "
		      "Error %d.", m->hw_handler_name, errors);
		/*
		 * Fail path for now, so we do not ping pong
		 */
		fail_path(pgpath);
		break;
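Going back to the failed write in the log: the CDB can be decoded by hand to confirm that
it matches the sector in the end_request line that follows it. A minimal sketch, assuming
the standard WRITE(10) layout (opcode 0x2a, big-endian 4-byte LBA in bytes 2-5, 2-byte
transfer length in bytes 7-8); decode_write10() is a name made up for this example.

```python
import struct

def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes (hypothetical helper)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    # Layout: opcode, flags, LBA (4 bytes), group number, transfer length (2 bytes), control.
    _opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb)
    return lba, length

lba, length = decode_write10("2a 00 01 44 b4 d8 00 00 20 00")
print(lba, length)  # 21279960 32
```

The LBA 0x0144b4d8 is 21279960, the same sector reported by end_request for sdf, and the
transfer length is 32 blocks. So the I/O error is for exactly the write that came back
with DID_NO_CONNECT.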
Hey, was this intentional?
[ 0.018792] Hardware name: ProLiant DL380p Gen8
[ 0.018794] Your BIOS is broken and requested that x2apic be disabled
[ 0.018795] This will leave your machine vulnerable to irq-injection attacks
[ 0.018796] Use 'intremap=no_x2apic_optout' to override BIOS request
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1032550
Title:
[multipath] failed to get sysfs information
Status in “multipath-tools” package in Ubuntu:
In Progress
Bug description:
When the switch port of the host HBA is shut down, multipath-tools can't get
correct information about the subpaths. Checking the "multipath" output,
some storage device type info disappears, and the failed paths always
stay in the path group and are never cleared out.
mpath2 (3600601601c102900944737e4a73fe011) dm-51 ,
size=6.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- #:#:#:# - #:# failed faulty running
| `- 5:0:2:5 sdcu 70:32 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
|- 5:0:3:5 sdfa 129:192 active ready running
`- #:#:#:# - #:# failed faulty running
mpath38 (3600601601c1029008eb6dbe8ae3fe011) dm-59 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 5:0:2:13 sddf 70:208 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
`- 5:0:3:13 sdfk 130:96 active ready running
mpath63 (360000970000198700131533030303932) dm-13 EMC,SYMMETRIX
size=5.6G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 5:0:0:8 sdl 8:176 active ready running
`- 5:0:1:8 sdbd 67:112 active ready running
mpath95 (360000970000198700131533030323445) dm-43 ,
size=898M features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- #:#:#:# - #:# failed faulty running
|- #:#:#:# - #:# failed faulty running
|- 5:0:0:38 sdas 66:192 active ready running
`- 5:0:1:38 sdck 69:128 active ready running
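For reference, the major:minor column in the listing above maps to the sd name by fixed
arithmetic, which helps when a log line only gives the device number. A minimal sketch,
assuming the SCSI disk majors from the kernel's allocated-devices list (8, 65-71, 128-135)
and 16 minors per disk; sd_name() is a helper name made up for this example.

```python
# SCSI disk majors as listed in the kernel's allocated-devices documentation.
SD_MAJORS = [8] + list(range(65, 72)) + list(range(128, 136))

def sd_name(major, minor):
    """Return the whole-disk sd name for a major:minor pair (hypothetical helper)."""
    # Each major covers 16 disks, and each disk owns 16 minors (disk + partitions).
    index = SD_MAJORS.index(major) * 16 + minor // 16
    # Disk names count in bijective base 26: 0 -> a, 25 -> z, 26 -> aa, ...
    name = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        name = chr(ord("a") + rem) + name
    return "sd" + name

for major, minor in ((70, 32), (129, 192), (8, 176), (67, 112)):
    print(f"{major}:{minor} -> {sd_name(major, minor)}")
```

The pairs used here are taken from the listing and come out as sdcu, sdfa, sdl and sdbd,
matching the names multipath prints next to them. The "#:#:#:# - #:#" placeholder lines
carry no device number at all, so there is nothing to decode for the failed paths.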
At the same time, the syslog shows many messages like these:
---------------
Aug 2 18:25:16 Linux51 multipathd: sdht: failed to get sysfs information
Aug 2 18:25:16 Linux51 multipathd: sdht: unusable path
... ...
---------------
After the path recovers, all failed paths come back without problems. No
I/O was blocked and no errors happened during the fail/recover period.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+subscriptions