[Bug 1020436] Re: Cannot read superblock after FC multipath failover

Peter Petrakis peter.petrakis at canonical.com
Mon Jul 23 14:40:04 UTC 2012


Hmm, we've got some counter indicators here.

lvs claims that the volumes are active. But the probe itself
is showing problems reading the volumes.

XFS is telling us that it cannot write it's journal to disk.
[1039812.311433] Filesystem "dm-4": Log I/O Error Detected. Shutting down filesystem: dm-4

"fs/xfs/xfs_rw.c"

 90 /*
 91  * Force a shutdown of the filesystem instantly while keeping
 92  * the filesystem consistent. We don't do an unmount here; just shutdown
 93  * the shop, make sure that absolutely nothing persistent happens to
 94  * this filesystem after this point.
 95  */
 96 void
 97 xfs_do_force_shutdown(
...
127         if (flags & SHUTDOWN_CORRUPT_INCORE) {
128                 xfs_cmn_err(XFS_PTAG_SHUTDOWN_CORRUPT, CE_ALERT, mp,
129     "Corruption of in-memory data detected.  Shutting down filesystem: %s",
130                         mp->m_fsname);
131                 if (XFS_ERRLEVEL_HIGH <= xfs_error_level) {
132                         xfs_stack_trace();
133                 }
134         } else if (!(flags & SHUTDOWN_FORCE_UMOUNT)) {
135                 if (logerror) {
136                         xfs_cmn_err(XFS_PTAG_SHUTDOWN_LOGERROR, CE_ALERT, mp,
137                 "Log I/O Error Detected.  Shutting down filesystem: %s",
138                                 mp->m_fsname);

The logs conveniently tell us where it was called from too.
1009 void
1010 xlog_iodone(xfs_buf_t *bp)
1011 {

...

1036         /*
1037          * Race to shutdown the filesystem if we see an error.
1038          */
1039         if (XFS_TEST_ERROR((XFS_BUF_GETERROR(bp)), l->l_mp,
1040                         XFS_ERRTAG_IODONE_IOERR, XFS_RANDOM_IODONE_IOERR)) {
1041                 xfs_ioerror_alert("xlog_iodone", l->l_mp, bp, XFS_BUF_ADDR(bp));
1042                 XFS_BUF_STALE(bp);
1043                 xfs_force_shutdown(l->l_mp, SHUTDOWN_LOG_IO_ERROR);
1044                 /*
1045                  * This flag will be propagated to the trans-committed
1046                  * callback routines to let them know that the log-commit
1047                  * didn't succeed.
1048                  */
1049                 aborted = XFS_LI_ABORTED;

I assume dm-4 is the LV that XFS is mounted on, did you run the dd test
on that?

I'm starting to wonder if the LVM device filter is lying to us, after failover, something
changes which misrepresents the LV and then XFS bails out.

If you can perform that DD for every PV that backs dm-4 successfully then there's
something wrong with the DM map for those LVs after failover occurs.

OK, what I need from you now is a before and after  (same fault injection method) of:
0) ls -lR /dev/ > dev_major_minor.log
1) lvs -o lv_attr
2) pvdisplay -vvv
3) lvdisplay -vvv
4) dmsetup table -v
5) "dd test" on all block devices: lv, mp, sd
6) dmesg output

Please attach this as a single tarball, that has a timestamp in the filename
and has a directory structure of:

foo.tgz
  before/
  after/

If this all checks out, then what's probably happening is the when
multipath begins the failover process, there's enough of a delay that
XFS simply bails out early before IO is ready to be sent down the
remaining paths. group by priority may perform better here and is
something you can test.

I looked at the XFS mount arguments and didn't find anything that would
make it more lenient in these situations. 

If you can manage it, a LUN formatted with ext3 under these circumstances
would help in ruling out whether the filesystem is part of the problem.

Thanks.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

Status in “multipath-tools” package in Ubuntu:
  Incomplete

Bug description:
  This might not be multipath-tools, but it's the likeliest candidate.

  If fibre channel connectivity is disrupted on an active path, I can
  see the multipath daemon rejuggling active paths as expected.  However
  instead of utilising a new path, the server continues trying active
  i/o down the same path until it fails, and the superblock on the lun
  is marked unreadable.

  The luns simply cannot be remounted even after the paths are
  recovered.  However, after a simple reboot, the luns are again
  readable _regardless_ of the visible path.  (Eg I shut the active FC
  port then rebooted, and the storage was mountable per normal over a
  different path).

  I've replicated this behaviour using both xfs and ext4 filesystems, on
  multiple different luns presented to the server.

  The luns are visible over four FC paths, and the /dev/mapper/mpathX
  identity is used as the physical volume for an LVM2 volume group, into
  which the logical volume(s) and filesystem are created.

  root@<host>:~# lsb_release -rd
  Description:	Ubuntu 10.04.3 LTS
  Release:	10.04

  As you can see here, the /dev/test and /dev/test2 volume groups are
  both active, yet showing i/o errors after a pathing fail.

  root@<host>:~# lvscan
    /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
    /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
    ACTIVE            '/dev/test2/lvol0' [4.00 GiB] inherit
    ACTIVE            '/dev/test/lvol0' [4.00 GiB] inherit
    ACTIVE            '/dev/mvsan/san' [500.00 GiB] inherit

  root@<host>:~# pvscan
    /dev/test2/lvol0: read failed after 0 of 4096 at 4290707456: Input/output error
    /dev/test2/lvol0: read failed after 0 of 4096 at 4290764800: Input/output error
    /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
    /dev/test2/lvol0: read failed after 0 of 4096 at 4096: Input/output error
    /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
    /dev/test/lvol0: read failed after 0 of 4096 at 4290707456: Input/output error
    /dev/test/lvol0: read failed after 0 of 4096 at 4290764800: Input/output error
    /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
    /dev/test/lvol0: read failed after 0 of 4096 at 4096: Input/output error
    /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
    PV /dev/mapper/mpath2   VG test2            lvm2 [4.00 GiB / 0    free]
    PV /dev/mapper/mpath1   VG test             lvm2 [4.00 GiB / 0    free]
    PV /dev/mapper/mpath0   VG mvsan            lvm2 [500.00 GiB / 0    free]

  root@<host>:~# multipath -ll
  mpath2 (360060e80058a070000008a0700000023) dm-7 HITACHI ,OPEN-V
  [size=4.0G][features=1 queue_if_no_path][hwhandler=0]
  \_ round-robin 0 [prio=4][active]
   \_ 6:0:0:2 sdd 8:48  [active][ready]
   \_ 6:0:1:2 sdg 8:96  [active][ready]
   \_ 7:0:0:2 sdj 8:144 [active][ready]
   \_ 7:0:1:2 sdp 8:240 [active][ready]
  mpath1 (360060e80058a070000008a070000000e) dm-6 HITACHI ,OPEN-V
  [size=4.0G][features=1 queue_if_no_path][hwhandler=0]
  \_ round-robin 0 [prio=4][active]
   \_ 6:0:0:1 sdc 8:32  [active][ready]
   \_ 6:0:1:1 sdf 8:80  [active][ready]
   \_ 7:0:0:1 sdi 8:128 [active][ready]
   \_ 7:0:1:1 sdo 8:224 [active][ready]
  mpath0 (360060e80058a070000008a0700002417) dm-5 HITACHI ,OPEN-V
  [size=500G][features=1 queue_if_no_path][hwhandler=0]
  \_ round-robin 0 [prio=4][active]
   \_ 6:0:0:0 sdb 8:16  [active][ready]
   \_ 6:0:1:0 sde 8:64  [active][ready]
   \_ 7:0:0:0 sdh 8:112 [active][ready]
   \_ 7:0:1:0 sdk 8:160 [active][ready]

  The dmesg output is very instructive, but still doesn't help with the
  root cause per se.  I set the scsi-timeouts nice and high (90 seconds)
  to accommodate any potential delays in path failover.  The path checks
  (test unit ready/TUR) occur about every 20 seconds.

  See below:

  [ 2192.891962] qla2xxx 0000:24:00.0: LOOP DOWN detected (2 3 0).
  [ 2222.820850]  rport-7:0-0: blocked FC remote port time out: removing target and saving binding
  [ 2222.821297]  rport-7:0-1: blocked FC remote port time out: removing target and saving binding
  [ 2222.827286] device-mapper: multipath: Failing path 8:112.
  [ 2222.827369] device-mapper: multipath: Failing path 8:160.

  So this is all as expected.

  Then:

  [ 2222.924665] sd 7:0:0:0: [sdh] Synchronizing SCSI cache
  [ 2222.924696] sd 7:0:0:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2222.926071] device-mapper: multipath: Failing path 8:160.
  [ 2222.976444] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2222.976446] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2222.976449] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 00 00 00 01 00
  [ 2222.976455] end_request: I/O error, dev sdm, sector 8380672
  [ 2222.980258] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2222.980260] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2222.980262] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 70 00 00 01 00
  [ 2222.980267] end_request: I/O error, dev sdm, sector 8380784
  [ 2222.984139] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2222.984141] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2222.984142] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 01 00
  [ 2222.984147] end_request: I/O error, dev sdm, sector 384
  [ 2222.988137] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2222.988139] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2222.988141] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 88 00 00 01 00
  [ 2222.988145] end_request: I/O error, dev sdm, sector 392
  [ 2222.992274] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2222.992276] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2222.992278] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 04 00
  [ 2222.992282] end_request: I/O error, dev sdm, sector 384
  [ 2222.996583] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2222.996585] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2222.996587] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 7f e1 00 00 00 08 00
  [ 2222.996592] end_request: I/O error, dev sdl, sector 8380672
  [ 2223.000980] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.000983] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.000985] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 7f e1 70 00 00 08 00
  [ 2223.000992] end_request: I/O error, dev sdl, sector 8380784
  [ 2223.005571] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.005573] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.005576] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 01 80 00 00 08 00
  [ 2223.005583] end_request: I/O error, dev sdl, sector 384
  [ 2223.010258] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.010260] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.010263] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 01 88 00 00 08 00
  [ 2223.010269] end_request: I/O error, dev sdl, sector 392
  [ 2223.015062] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.015065] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.015068] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 01 80 00 00 08 00
  [ 2223.015072] end_request: I/O error, dev sdl, sector 384
  [ 2223.046324] sd 7:0:0:2: [sdj] Unhandled error code
  [ 2223.046325] sd 7:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.046328] sd 7:0:0:2: [sdj] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
  [ 2223.046333] end_request: I/O error, dev sdj, sector 8388480
  [ 2223.051339] sd 7:0:0:2: [sdj] Unhandled error code
  [ 2223.051341] sd 7:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.051343] sd 7:0:0:2: [sdj] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
  [ 2223.051347] end_request: I/O error, dev sdj, sector 8388592
  [ 2223.051841] sd 7:0:0:1: [sdi] Synchronizing SCSI cache
  [ 2223.051868] sd 7:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.056574] sd 7:0:1:0: [sdk] Unhandled error code
  [ 2223.056576] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.056578] sd 7:0:1:0: [sdk] CDB: Read(10): 28 00 3e 7f ff 80 00 00 08 00
  [ 2223.056583] end_request: I/O error, dev sdk, sector 1048575872
  [ 2223.061961] sd 7:0:1:0: [sdk] Unhandled error code
  [ 2223.061962] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.061964] sd 7:0:1:0: [sdk] CDB: Read(10): 28 00 3e 7f ff f0 00 00 08 00
  [ 2223.061969] end_request: I/O error, dev sdk, sector 1048575984
  [ 2223.067450] sd 7:0:1:0: [sdk] Unhandled error code
  [ 2223.067453] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.067456] sd 7:0:1:0: [sdk] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
  [ 2223.067462] end_request: I/O error, dev sdk, sector 0
  [ 2223.073081] sd 7:0:1:0: [sdk] Unhandled error code
  [ 2223.073083] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.073085] sd 7:0:1:0: [sdk] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
  [ 2223.073089] end_request: I/O error, dev sdk, sector 8
  [ 2223.078824] sd 7:0:1:0: [sdk] Unhandled error code
  [ 2223.078825] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.078827] sd 7:0:1:0: [sdk] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
  [ 2223.078832] end_request: I/O error, dev sdk, sector 0
  [ 2223.084675] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.084677] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.084679] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
  [ 2223.084683] end_request: I/O error, dev sdl, sector 8388480
  [ 2223.090671] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.090672] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.090674] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
  [ 2223.090679] end_request: I/O error, dev sdl, sector 8388592
  [ 2223.096830] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.096832] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.096834] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
  [ 2223.096838] end_request: I/O error, dev sdl, sector 0
  [ 2223.103075] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.103077] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.103079] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
  [ 2223.103083] end_request: I/O error, dev sdl, sector 8
  [ 2223.109441] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.109444] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.109446] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
  [ 2223.109453] end_request: I/O error, dev sdl, sector 0
  [ 2223.115922] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.115924] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.115926] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
  [ 2223.115930] end_request: I/O error, dev sdm, sector 8388480
  [ 2223.122543] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.122545] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.122547] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
  [ 2223.122551] end_request: I/O error, dev sdm, sector 8388592
  [ 2223.128957] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.128959] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.128961] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
  [ 2223.128965] end_request: I/O error, dev sdm, sector 0
  [ 2223.135176] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.135178] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.135180] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
  [ 2223.135185] end_request: I/O error, dev sdm, sector 8
  [ 2223.140962] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.140964] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.140965] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
  [ 2223.140970] end_request: I/O error, dev sdm, sector 0
  [ 2223.164023] sd 7:0:0:2: [sdj] Synchronizing SCSI cache
  [ 2223.164050] sd 7:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.178664] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.178666] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.178668] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 04 00
  [ 2223.178673] end_request: I/O error, dev sdm, sector 384
  [ 2223.183803] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.183805] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.183807] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 01 80 00 00 08 00
  [ 2223.183812] end_request: I/O error, dev sdl, sector 384
  [ 2223.191271] sd 7:0:1:1: [sdl] Unhandled error code
  [ 2223.191273] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.191275] sd 7:0:1:1: [sdl] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
  [ 2223.191280] end_request: I/O error, dev sdl, sector 0
  [ 2223.195433] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.195435] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.195438] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
  [ 2223.195444] end_request: I/O error, dev sdm, sector 0
  [ 2223.236558] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.236560] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.236563] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 00 00 00 01 00
  [ 2223.236568] end_request: I/O error, dev sdm, sector 8380672
  [ 2223.240754] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.240756] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.240758] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 70 00 00 01 00
  [ 2223.240763] end_request: I/O error, dev sdm, sector 8380784
  [ 2223.241440] sd 7:0:1:0: [sdk] Synchronizing SCSI cache
  [ 2223.241531] sd 7:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.244982] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.244984] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.244986] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 01 00
  [ 2223.244991] end_request: I/O error, dev sdm, sector 384
  [ 2223.249170] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.249173] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.249176] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 88 00 00 01 00
  [ 2223.249183] end_request: I/O error, dev sdm, sector 392
  [ 2223.253313] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.253315] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.253318] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 04 00
  [ 2223.253325] end_request: I/O error, dev sdm, sector 384
  [ 2223.270352] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.270355] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.270359] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
  [ 2223.270366] end_request: I/O error, dev sdm, sector 8388480
  [ 2223.274548] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.274550] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.274552] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
  [ 2223.274557] end_request: I/O error, dev sdm, sector 8388592
  [ 2223.278653] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.278654] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.278656] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
  [ 2223.278661] end_request: I/O error, dev sdm, sector 0
  [ 2223.282759] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.282760] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.282762] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
  [ 2223.282767] end_request: I/O error, dev sdm, sector 8
  [ 2223.286863] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.286865] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.286867] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
  [ 2223.286872] end_request: I/O error, dev sdm, sector 0
  [ 2223.321219] sd 7:0:1:2: [sdm] Unhandled error code
  [ 2223.321221] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.321223] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 00 01 80 00 00 04 00
  [ 2223.321229] end_request: I/O error, dev sdm, sector 384
  [ 2223.321231] sd 7:0:1:1: [sdl] Synchronizing SCSI cache
  [ 2223.321261] sd 7:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [ 2223.443319] sd 7:0:1:2: [sdm] Synchronizing SCSI cache
  [ 2223.443351] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK

  And then:

  [ 2241.094148] Aborting journal on device dm-3-8.
  [ 2241.098327] __ratelimit: 24 callbacks suppressed
  [ 2241.098330] Buffer I/O error on device dm-3, logical block 458752
  [ 2241.102730] lost page write due to I/O error on dm-3
  [ 2241.102743] JBD2: I/O error detected when updating journal superblock for dm-3-8.
  [ 2265.843946] EXT4-fs error (device dm-3): ext4_journal_start_sb: Detected aborted journal
  [ 2265.855469] EXT4-fs (dm-3): Remounting filesystem read-only
  [ 2280.676139] I/O error in filesystem ("dm-2") meta-data dev dm-2 block 0x18       ("xfs_trans_read_buf") error 5 buf count 4096

  This is very disturbing as it means that the multipathing is not in
  fact working at all - You can see that the I/O errors are pointing at
  'dm-3' rather than the mpath associated device, but thats normal as
  the logical volume appears in that place.

  The filesystems are healthy, when you reboot the server and check them
  there are zero errors.  There's something wrong with the way that the
  pathing failover is being handled, but I'm not sure where I'd find the
  ultimate root cause of this.

  This segfault occurs after the above issues have already manifested:

  [ 4057.146508] multipathd[3261]: segfault at a ip 00007f89bdce798a sp
  00007fffd25c2780 error 4 in libc-2.11.1.so[7f89bdc70000+17a000].

  root@<host>:~# apt-cache policy multipath-tools
  multipath-tools:
    Installed: 0.4.8-14ubuntu4.10.04.2
    Candidate: 0.4.8-14ubuntu4.10.04.2
    Version table:
   *** 0.4.8-14ubuntu4.10.04.2 0
          500 http://au.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
          100 /var/lib/dpkg/status
       0.4.8-14ubuntu4 0
          500 http://au.archive.ubuntu.com/ubuntu/ lucid/main Packages

  Let me know if you need further details.

  ProblemType: Bug
  DistroRelease: Ubuntu 10.04
  Package: multipath-tools 0.4.8-14ubuntu4.10.04.2
  ProcVersionSignature: Ubuntu 2.6.32-41.91-server 2.6.32.59+drm33.24
  Uname: Linux 2.6.32-41-server x86_64
  Architecture: amd64
  Date: Tue Jul  3 15:13:23 2012
  InstallationMedia: Ubuntu-Server 10.04.3 LTS "Lucid Lynx" - Release amd64 (20110719.2)
  ProcEnviron:
   LANG=en_AU.UTF8
   SHELL=/bin/bash
  SourcePackage: multipath-tools

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions




More information about the foundations-bugs mailing list