[Bug 909563] Re: mdadm device hung, problem with kernel/EBS
David Taylor
909563 at bugs.launchpad.net
Thu Dec 29 03:27:26 UTC 2011
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/909563
Title:
mdadm device hung, problem with kernel/EBS
Status in “mdadm” package in Ubuntu:
New
Bug description:
I'm using Ubuntu 10.10, ami-af7e2eea in us-west-1 on c1.xlarge.
I have 8 x 128GB EBS volumes in a RAID10 array using mdadm.
After a while /dev/md0 freezes and the load average shoots up from
below 5 to above 300-400. Any attempt to access the mounted
filesystem hangs and cannot be interrupted.
I ran "mdadm --examine" on each of the devices. All except one
returned "state: clean". On the remaining device the command never
returned, and I had to press Ctrl-C to interrupt it.
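The per-device check described above could be scripted so that one
unresponsive device does not stall the whole run. The following is only a
minimal sketch, not part of the original report: the helper names probe and
examine_all and the 10-second default timeout are assumptions, and the device
paths have to be supplied by the caller. The only mdadm invocation used is
"mdadm --examine <device>", as in the report.

```python
import subprocess
import sys


def probe(cmd, timeout_s=10.0):
    """Run cmd and return 'ok', 'failed', or 'timeout' (hypothetical helper)."""
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # The command could not be started at all (e.g. binary not installed).
        return "failed"
    try:
        return "ok" if proc.wait(timeout=timeout_s) == 0 else "failed"
    except subprocess.TimeoutExpired:
        # Signal the child but do not wait for it: a process stuck in
        # uninterruptible sleep (state D) may never act on the signal.
        proc.kill()
        return "timeout"


def examine_all(devices, timeout_s=10.0):
    """Map each device path to the outcome of `mdadm --examine <device>`."""
    return {dev: probe(["mdadm", "--examine", dev], timeout_s) for dev in devices}


if __name__ == "__main__":
    # Device paths come from the command line; none are assumed here.
    for dev, outcome in examine_all(sys.argv[1:]).items():
        print(f"{dev}: {outcome}")
```

A device reported as "timeout" would correspond to the one on which the
command never returned. The sketch deliberately does not wait for the killed
child, so a truly stuck process may linger until the underlying I/O completes.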
In /var/log/syslog:
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230054] INFO: task md0_raid10:625 blocked for more than 120 seconds.
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230082] md0_raid10 D ffff880003f579c0 0 625 2 0x00000000
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230089] ffff8801b7ea1ca0 0000000000000246 0000000000000000 00000000000159c0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230097] ffff8801b7ea1fd8 00000000000159c0 ffff8801b7ea1fd8 ffff8801b61616e0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230105] 00000000000159c0 00000000000159c0 ffff8801b7ea1fd8 00000000000159c0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230113] Call Trace:
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230126] [<ffffffff814643e1>] md_super_wait+0xd1/0xf0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230133] [<ffffffff8107fa10>] ? autoremove_wake_function+0x0/0x40
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230138] [<ffffffff814649b8>] md_update_sb+0x268/0x3e0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230144] [<ffffffff815a6dce>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230155] [<ffffffff8146a1a2>] md_check_recovery+0x212/0x540
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230163] [<ffffffffa0061fbf>] raid10d+0x3f/0x400 [raid10]
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230170] [<ffffffff810072df>] ? xen_restore_fl_direct_end+0x0/0x1
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230173] [<ffffffff815a6dce>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230177] [<ffffffff81464109>] md_thread+0x119/0x150
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230181] [<ffffffff8107fa10>] ? autoremove_wake_function+0x0/0x40
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230185] [<ffffffff81463ff0>] ? md_thread+0x0/0x150
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230189] [<ffffffff8107f4b6>] kthread+0x96/0xa0
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230194] [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230199] [<ffffffff8100a313>] ? int_ret_from_sys_call+0x7/0x1b
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230203] [<ffffffff815a735d>] ? retint_restore_args+0x5/0x6
Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230207] [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10
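For reading a trace like the one above, it can help to pull out the hung task
and the call chain mechanically. The sketch below is an illustration only:
the function name summarize and both regular expressions are assumptions
written against the log format shown in this report. Frames prefixed with "?"
are ones the kernel's stack unwinder could not confirm, so they are skipped.

```python
import re

# Matches e.g. "INFO: task md0_raid10:625 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Matches e.g. "[<ffffffff814643e1>] md_super_wait+0xd1/0xf0"; a "? " before
# the symbol marks a frame the unwinder was not sure about.
FRAME = re.compile(r"\[<[0-9a-f]+>\] (?P<unsure>\? )?(?P<symbol>[\w.]+)\+0x")


def summarize(lines):
    """Return (tasks, frames) extracted from kernel log lines.

    tasks  -- list of (task_name, pid, blocked_seconds)
    frames -- confirmed call-trace symbols, innermost first
    """
    tasks, frames = [], []
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            tasks.append((m["name"], int(m["pid"]), int(m["secs"])))
            continue
        m = FRAME.search(line)
        if m and not m["unsure"]:
            frames.append(m["symbol"])
    return tasks, frames


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < /var/log/syslog
    tasks, frames = summarize(sys.stdin)
    print(tasks)
    print(" <- ".join(frames))
```

Applied to the excerpt above, the confirmed frames begin with md_super_wait,
called from md_update_sb and md_check_recovery in raid10d. That ordering
suggests the md0_raid10 thread is blocked while waiting for a superblock
write to complete, which would fit one member device having stopped
responding.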
Is this a problem with the AMI? The AKI? Any suggestions?
Thanks.
Cheers,
David.
ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: linux-image-2.6.35-31-virtual 2.6.35-31.63
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.35-31.63-virtual 2.6.35.13
Uname: Linux 2.6.35-31-virtual x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
Date: Thu Dec 29 03:20:28 2011
Ec2AMI: ami-af7e2eea
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-1a
Ec2InstanceType: c1.xlarge
Ec2Kernel: aki-9ba0f1de
Ec2Ramdisk: unavailable
Lspci:
Lsusb: Error: command ['lsusb'] failed with exit code 1:
ProcCmdLine: root=LABEL=uec-rootfs ro console=hvc0
ProcEnviron:
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: linux
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/909563/+subscriptions