[Bug 909563] Re: mdadm device hung, problem with kernel/EBS

David Taylor 909563 at bugs.launchpad.net
Thu Dec 29 03:27:26 UTC 2011


-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/909563

Title:
  mdadm device hung, problem with kernel/EBS

Status in “mdadm” package in Ubuntu:
  New

Bug description:
  I'm using Ubuntu 10.10 (ami-af7e2eea) in us-west-1 on a c1.xlarge
  instance.

  I have 8 x 128GB EBS volumes in a RAID10 array using mdadm.

  After a while, /dev/md0 freezes and the load average climbs from under
  5 to 300-400 or more.

  Any attempt to access the mounted filesystem hangs and cannot be
  interrupted.

  I ran "mdadm --examine" on each of the member devices.  All except one
  returned "state: clean".  On the remaining device the command never
  returned, and I had to interrupt it with Ctrl-C.
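
  The per-device check described above could be scripted roughly as
  follows, so that one unresponsive device does not stall the whole
  pass.  This is only a sketch: the function name, the timeout value
  and the device paths in the usage note are hypothetical, and the
  parsing assumes "mdadm --examine" prints a "State : ..." line.  A
  process stuck in uninterruptible sleep cannot be killed, so the
  timeout itself may block in that case; in this report the command
  did respond to Ctrl-C.

```python
import subprocess


def examine_devices(devices, timeout_s=10.0, command=("mdadm", "--examine")):
    """Run `command <device>` per device and collect the reported state.

    Returns a dict mapping each device to its "State" value, or to
    "timeout" when the command did not finish within timeout_s seconds.
    """
    results = {}
    for dev in devices:
        try:
            proc = subprocess.run(
                [*command, dev],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                timeout=timeout_s,  # the child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            results[dev] = "timeout"
            continue
        except FileNotFoundError:
            results[dev] = "command not found"
            continue

        # Assumption: the output contains a line of the form "State : clean".
        state = "unknown"
        for line in proc.stdout.splitlines():
            key, _, value = line.partition(":")
            if key.strip().lower() == "state":
                state = value.strip()
                break
        results[dev] = state
    return results
```

  Example call, with made-up device names (needs root for mdadm):
  examine_devices(["/dev/sdf", "/dev/sdg"], timeout_s=15)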

  In /var/log/syslog:

  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230054] INFO: task md0_raid10:625 blocked for more than 120 seconds.
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230082] md0_raid10    D ffff880003f579c0     0   625      2 0x00000000
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230089]  ffff8801b7ea1ca0 0000000000000246 0000000000000000 00000000000159c0
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230097]  ffff8801b7ea1fd8 00000000000159c0 ffff8801b7ea1fd8 ffff8801b61616e0
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230105]  00000000000159c0 00000000000159c0 ffff8801b7ea1fd8 00000000000159c0
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230113] Call Trace:
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230126]  [<ffffffff814643e1>] md_super_wait+0xd1/0xf0
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230133]  [<ffffffff8107fa10>] ? autoremove_wake_function+0x0/0x40
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230138]  [<ffffffff814649b8>] md_update_sb+0x268/0x3e0
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230144]  [<ffffffff815a6dce>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230155]  [<ffffffff8146a1a2>] md_check_recovery+0x212/0x540
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230163]  [<ffffffffa0061fbf>] raid10d+0x3f/0x400 [raid10]
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230170]  [<ffffffff810072df>] ? xen_restore_fl_direct_end+0x0/0x1
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230173]  [<ffffffff815a6dce>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230177]  [<ffffffff81464109>] md_thread+0x119/0x150
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230181]  [<ffffffff8107fa10>] ? autoremove_wake_function+0x0/0x40
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230185]  [<ffffffff81463ff0>] ? md_thread+0x0/0x150
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230189]  [<ffffffff8107f4b6>] kthread+0x96/0xa0
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230194]  [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230199]  [<ffffffff8100a313>] ? int_ret_from_sys_call+0x7/0x1b
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230203]  [<ffffffff815a735d>] ? retint_restore_args+0x5/0x6
  Dec 24 07:24:57 ip-10-162-9-13 kernel: [183240.230207]  [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10
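
  To see how often such hung-task reports occur, the "blocked for more
  than N seconds" lines can be pulled out of syslog.  A minimal sketch,
  assuming the message format shown in the log above; the names
  HungTask and find_hung_tasks are made up for illustration.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches kernel hung-task reports such as:
#   kernel: [183240.230054] INFO: task md0_raid10:625 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO: task (?P<task>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


class HungTask(NamedTuple):
    uptime: float      # kernel timestamp in seconds since boot
    task: str          # task name, e.g. "md0_raid10"
    pid: int
    blocked_secs: int  # threshold reported by the kernel


def find_hung_tasks(lines: Iterable[str]) -> List[HungTask]:
    """Return one HungTask per hung-task report found in the given lines."""
    hits = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append(
                HungTask(float(m["uptime"]), m["task"], int(m["pid"]), int(m["secs"]))
            )
    return hits
```

  Usage, for example: with open("/var/log/syslog") as f:
  print(find_hung_tasks(f))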

  Is this a problem with the AMI?  The AKI?  Any suggestions?

  Thanks.

  Cheers,
  David.

  ProblemType: Bug
  DistroRelease: Ubuntu 10.10
  Package: linux-image-2.6.35-31-virtual 2.6.35-31.63
  Regression: Yes
  Reproducible: Yes
  ProcVersionSignature: Ubuntu 2.6.35-31.63-virtual 2.6.35.13
  Uname: Linux 2.6.35-31-virtual x86_64
  AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
  AplayDevices: Error: [Errno 2] No such file or directory
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  Date: Thu Dec 29 03:20:28 2011
  Ec2AMI: ami-af7e2eea
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-west-1a
  Ec2InstanceType: c1.xlarge
  Ec2Kernel: aki-9ba0f1de
  Ec2Ramdisk: unavailable
  Lspci:
   
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  ProcCmdLine: root=LABEL=uec-rootfs ro console=hvc0
  ProcEnviron:
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/909563/+subscriptions



