[Bug 247148] [NEW] Xen dom0 kernel corrupts software raid (2.6.24-19)

Thu Nov 13 12:48:36 UTC 2008

You have been subscribed to a public bug:

Binary package hint: linux-xen

Hardy Heron (fully patched), installed with this kickstart file:
http://shell.cse.ucdavis.edu/~sbeards/xen.cfg

I create four RAID 5 arrays, then make and mount a filesystem on each:
mdadm -C /dev/md0 -l 5 -n 4 /dev/sd[abcd]3
mdadm -C /dev/md1 -l 5 -n 4 /dev/sd[efgh]3
mdadm -C /dev/md2 -l 5 -n 4 /dev/sd[ijkl]3
mdadm -C /dev/md3 -l 5 -n 4 /dev/sd[mnop]3

for i in $(seq 0 3); do
    mkfs.ext3 /dev/md$i
    # Ensure the mount point exists (harmless if it is already there).
    mkdir -p /disk/$i
    mount /dev/md$i /disk/$i
    touch /disk/$i/f
done

Then run:
 iozone -s 16g -r 1024 -t 4 -F /disk/[0123]/f

When I run iozone, disks drop out of the arrays and the RAID sets end up corrupted:

dmesg reports:
[ 2435.753683] 3w-9xxx: scsi8: ERROR: (0x06:0x001C): Failed to map scatter gather list.
[ 2435.753745] PCI-DMA: Out of SW-IOMMU space for 32768 bytes at device 0000:06:00.0
[ 2435.753811] 3w-9xxx: scsi8: ERROR: (0x06:0x001C): Failed to map scatter gather list.
[ 2435.753864] PCI-DMA: Out of SW-IOMMU space for 32768 bytes at device 0000:06:00.0
[ 2435.753913] 3w-9xxx: scsi8: ERROR: (0x06:0x001C): Failed to map scatter gather list.
[ 2435.754003] PCI-DMA: Out of SW-IOMMU space for 32768 bytes at device 0000:06:00.0
[ 2435.754068] 3w-9xxx: scsi8: ERROR: (0x06:0x001C): Failed to map scatter gather list.
[ 2435.754121] PCI-DMA: Out of SW-IOMMU space for 32768 bytes at device 0000:06:00.0
[ 2435.754170] 3w-9xxx: scsi8: ERROR: (0x06:0x001C): Failed to map scatter gather list.
[ 2435.754176] sd 8:0:3:0: [sdd] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
[ 2435.754181] end_request: I/O error, dev sdd, sector 37085198
[ 2435.754191] sd 8:0:3:0: [sdd] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
[ 2435.754194] end_request: I/O error, dev sdd, sector 37085582
[ 2435.854168] raid5:md0: read error corrected (8 sectors at 1902976 on sdd3)
[ 2435.854544] raid5:md0: read error corrected (8 sectors at 1902984 on sdd3)
[ 2435.854552] raid5:md0: read error corrected (8 sectors at 1902992 on sdd3)
[ 2435.854556] raid5:md0: read error corrected (8 sectors at 1903000 on sdd3)
[ 2435.854559] raid5:md0: read error corrected (8 sectors at 1903008 on sdd3)
[ 2435.854566] raid5:md0: read error corrected (8 sectors at 1903016 on sdd3)
...
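
A log like the one above can be condensed when comparing runs between the two kernels. The following is only a sketch, not part of the original report: the function name summarize_dma_errors is hypothetical, and it assumes the message formats match the excerpt shown here.

```shell
# Summarize SW-IOMMU exhaustion and block I/O errors in kernel log text.
# Reads the log on stdin. The function name is hypothetical.
summarize_dma_errors() {
    awk '
        /Out of SW-IOMMU space/        { iommu++ }
        /Failed to map scatter gather/ { sg++ }
        /end_request: I\/O error, dev/ {
            # The device name is the field after "dev", with a trailing comma.
            for (i = 1; i < NF; i++)
                if ($i == "dev") { d = $(i + 1); sub(/,$/, "", d); dev[d]++ }
        }
        END {
            printf "sw-iommu-exhausted=%d sg-map-failed=%d\n", iommu, sg
            for (d in dev) printf "io-errors %s=%d\n", d, dev[d]
        }
    '
}
```

Assumed usage: dmesg | summarize_dma_errors. On the generic kernel, where no errors are logged, both counters should be 0 and no io-errors lines should appear.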

/proc/mdstat reports:
md0 : active raid5 sda3[0] sdd3[4](F) sdc3[5](F) sdb3[6](F)
      187526400 blocks level 5, 64k chunk, algorithm 2 [4/1] [U___] 
md1 : active raid5 sde3[0] sdh3[3] sdg3[2] sdf3[1]
      187526400 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md2 : active raid5 sdi3[4](F) sdl3[5](F) sdk3[2] sdj3[1]
      187526400 blocks level 5, 64k chunk, algorithm 2 [4/2] [_UU_]
md3 : active raid5 sdm3[0] sdp3[3] sdo3[2] sdn3[1]
      187526400 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
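
To check for dropped members after each run without reading the whole file, the member status string at the end of each array entry ([UUUU] when healthy, with _ for a missing member) can be scanned. This is a minimal sketch, not from the original report: the function name degraded_arrays is hypothetical, and it assumes the mdstat layout shown above.

```shell
# Print every md array whose status string shows a missing member.
# Reads /proc/mdstat-style text on stdin. The function name is hypothetical.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md0 : active raid5 ...".
        /^md[0-9]+ :/ { name = $1 }
        # The following line carries the status string, e.g. [UUUU] or [_UU_].
        name != "" && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name, status
            name = ""
        }
    '
}
```

Assumed usage: degraded_arrays < /proc/mdstat. For the output above it would list md0 and md2 and print nothing for md1 and md3.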

When I replace the xen dom0 kernel:
Linux 43-246-120-128 2.6.24-19-xen #1 SMP Wed Jun 18 16:08:38 UTC 2008 x86_64 GNU/Linux

With:
Linux 43-246-120-128 2.6.24-19-generic #1 SMP Wed Jun 18 14:15:37 UTC 2008 x86_64 GNU/Linux

Everything works.  I've reproduced this several times: every time, the
Xen kernel causes multiple disks to drop out of the RAID arrays, while the
generic kernel works perfectly (no drops, no dmesg messages, no errors).

** Affects: linux-meta (Ubuntu)
     Importance: Undecided
     Assignee: Ubuntu Kernel Team (ubuntu-kernel-team)
         Status: Fix Committed

-- 
Xen dom0 kernel corrupts software raid (2.6.24-19)
https://bugs.launchpad.net/bugs/247148