[Bug 652812] [NEW] task blocked for more than 120 seconds on server kernel

Lars 652812 at bugs.launchpad.net
Fri Oct 1 08:52:17 UTC 2010


Public bug reported:

Hi,

this is about an Ubuntu server installation.
The server consists mainly of fast HDDs and 2 externally attached LTO-3 tape drives in a changer.
Its purpose is to sync with other servers and then write everything onto both tape drives in parallel overnight.

The following is our main problem:
[ 1081.590063] INFO: task mbuffer1:2589 blocked for more than 120 seconds.
[ 1081.590577] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.591151] mbuffer1      D 0000000000000000     0  2589   2560 0x00000000
[ 1081.591162]  ffff88080cee9c18 0000000000000082 0000000000015bc0 0000000000015bc0
[ 1081.591173]  ffff8803f87ac890 ffff88080cee9fd8 0000000000015bc0 ffff8803f87ac4d0
[ 1081.591181]  0000000000015bc0 ffff88080cee9fd8 0000000000015bc0 ffff8803f87ac890
[ 1081.591189] Call Trace:
[ 1081.591208]  [<ffffffff815583ad>] schedule_timeout+0x22d/0x300
[ 1081.591220]  [<ffffffff812b4567>] ? kobject_put+0x27/0x60
[ 1081.591228]  [<ffffffff81559f45>] ? _spin_lock_irq+0x15/0x20
[ 1081.591238]  [<ffffffff8138a90a>] ? scsi_request_fn+0xda/0x5e0
[ 1081.591246]  [<ffffffff81557656>] wait_for_common+0xd6/0x180
[ 1081.591256]  [<ffffffff8129de33>] ? __generic_unplug_device+0x33/0x40
[ 1081.591266]  [<ffffffff8105a350>] ? default_wake_function+0x0/0x20
[ 1081.591286]  [<ffffffffa015c4d8>] ? T.945+0x158/0x170 [st]
[ 1081.591294]  [<ffffffff815577bd>] wait_for_completion+0x1d/0x20
[ 1081.591305]  [<ffffffffa015c637>] T.944+0x127/0x270 [st]
[ 1081.591315]  [<ffffffffa0162092>] st_write+0x5a2/0xc70 [st]
[ 1081.591324]  [<ffffffff8105a380>] ? wake_up_state+0x10/0x20
[ 1081.591334]  [<ffffffff81143aa8>] vfs_write+0xb8/0x1a0
[ 1081.591342]  [<ffffffff81144311>] sys_write+0x51/0x80
[ 1081.591351]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
[ 1081.591358] INFO: task mbuffer2:2608 blocked for more than 120 seconds.
[ 1081.591800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.592374] mbuffer2      D 0000000000000000     0  2608   2591 0x00000000
[ 1081.592383]  ffff8800df895c18 0000000000000082 0000000000015bc0 0000000000015bc0
[ 1081.592392]  ffff8803f87a9ab0 ffff8800df895fd8 0000000000015bc0 ffff8803f87a96f0
[ 1081.592400]  0000000000015bc0 ffff8800df895fd8 0000000000015bc0 ffff8803f87a9ab0
[ 1081.592408] Call Trace:
[ 1081.592417]  [<ffffffff815583ad>] schedule_timeout+0x22d/0x300
[ 1081.592425]  [<ffffffff812b4567>] ? kobject_put+0x27/0x60
[ 1081.592432]  [<ffffffff81559f45>] ? _spin_lock_irq+0x15/0x20
[ 1081.592439]  [<ffffffff8138a90a>] ? scsi_request_fn+0xda/0x5e0
[ 1081.592448]  [<ffffffff81557656>] wait_for_common+0xd6/0x180
[ 1081.592456]  [<ffffffff8129de33>] ? __generic_unplug_device+0x33/0x40
[ 1081.592464]  [<ffffffff8105a350>] ? default_wake_function+0x0/0x20
[ 1081.592474]  [<ffffffffa015c4d8>] ? T.945+0x158/0x170 [st]
[ 1081.592482]  [<ffffffff815577bd>] wait_for_completion+0x1d/0x20
[ 1081.592492]  [<ffffffffa015c637>] T.944+0x127/0x270 [st]
[ 1081.592502]  [<ffffffffa0162092>] st_write+0x5a2/0xc70 [st]
[ 1081.592510]  [<ffffffff8105a380>] ? wake_up_state+0x10/0x20
[ 1081.592518]  [<ffffffff81143aa8>] vfs_write+0xb8/0x1a0
[ 1081.592525]  [<ffffffff81144311>] sys_write+0x51/0x80
[ 1081.592533]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
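Both traces show the same picture: each mbuffer process is inside st_write and sleeping in wait_for_completion, so both tape writers hang at the same moment. For sorting a long dmesg, a small filter that pulls out just the hung-task reports can help. The sketch below is illustrative only. `blocked_tasks` is a hypothetical helper, and the regular expression assumes the exact "INFO: task NAME:PID blocked for more than N seconds" wording shown above.

```python
import re
from typing import List, Tuple

# Assumed format, taken from the log above, e.g.:
# [ 1081.590063] INFO: task mbuffer1:2589 blocked for more than 120 seconds.
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def blocked_tasks(dmesg_text: str) -> List[Tuple[float, str, int, int]]:
    """Return (timestamp, command, pid, timeout_secs) for each hung-task report."""
    reports = []
    for line in dmesg_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            reports.append(
                (float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
            )
    return reports


if __name__ == "__main__":
    sample = (
        "[ 1081.590063] INFO: task mbuffer1:2589 blocked for more than 120 seconds.\n"
        "[ 1081.591358] INFO: task mbuffer2:2608 blocked for more than 120 seconds.\n"
    )
    for report in blocked_tasks(sample):
        print(report)
```

Feeding it the output of `dmesg` lists every hung-task report with its timestamp, which makes it easy to confirm that both writers stall together rather than one after the other.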


After the fifth 120-second hung-task warning, the following messages appear and the backup is aborted:
[ 1818.980059] mptscsih: ioc1: attempting task abort! (sc=ffff880057bb7000)
[ 1818.980067] st 6:0:4:0: CDB: Write(6): 0a 00 04 00 00 00
[ 1829.300042] mptscsih: ioc1: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!!
[ 1831.280030] mptscsih: ioc1: task abort: SUCCESS (sc=ffff880057bb7000)
[ 1831.282296] mptscsih: ioc1: attempting task abort! (sc=ffff880057bb6a00)
[ 1831.282302] st 6:0:5:0: CDB: Write(6): 0a 00 04 00 00 00
[ 1831.282321] mptscsih: ioc1: task abort: SUCCESS (sc=ffff880057bb6a00)
[ 1831.284945] st0: Error 80000 (driver bt 0x0, host bt 0x8).
[ 1831.285106] st1: Error 80000 (driver bt 0x0, host bt 0x8).
[ 1831.490044] scsi target6:0:4: Beginning Domain Validation
[ 1831.637097] scsi target6:0:4: Ending Domain Validation
[ 1831.637208] scsi target6:0:4: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 64)
[ 1834.150032] scsi target6:0:5: Beginning Domain Validation
[ 1834.297533] scsi target6:0:5: Ending Domain Validation
[ 1834.297649] scsi target6:0:5: FAST-160 WIDE SCSI 320.0 MB/s DT IU RTI PCOMP (6.25 ns, offset 64)
[ 1910.340056] scsi target6:0:5: Beginning Domain Validation
[ 1910.729074] scsi target6:0:5: Ending Domain Validation
[ 1910.729194] scsi target6:0:5: FAST-160 WIDE SCSI 320.0 MB/s DT IU RTI PCOMP (6.25 ns, offset 64)
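Some notes for decoding the abort messages. On Linux the SCSI result word printed by st packs the driver byte into bits 24-31 and the host byte into bits 16-23. "Error 80000" therefore means driver byte 0x0 and host byte 0x8, which the kernel's SCSI headers call DID_RESET. That fits the reset mptscsih issues at 1829.30. The CDB `0a 00 04 00 00 00` is a WRITE(6) with the FIXED bit clear and a transfer length of 0x040000, meaning one 262144-byte (256 KiB) variable-length block per write.

The sketch below is a hypothetical decoder for these values. `split_scsi_result` and `decode_write6` are illustrative names, not kernel APIs. The DID_* table is only a partial copy of the kernel constants, and the WRITE(6) layout assumes the SSC (tape) command set.

```python
# Partial copy of the kernel's DID_* host-byte constants (assumed subset).
DID_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def split_scsi_result(result: int) -> dict:
    """Split a Linux SCSI result word into its four byte fields."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def decode_write6(cdb_hex: str) -> dict:
    """Decode a tape (SSC) WRITE(6) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 6 or cdb[0] != 0x0A:
        raise ValueError("not a WRITE(6) CDB")
    return {
        # Byte 1, bit 0: FIXED (0 = one variable-length block)
        "fixed": bool(cdb[1] & 0x01),
        # Bytes 2-4: big-endian transfer length (bytes when FIXED is 0)
        "transfer_length": int.from_bytes(cdb[2:5], "big"),
    }


if __name__ == "__main__":
    fields = split_scsi_result(0x80000)  # from "st0: Error 80000"
    print(fields, DID_NAMES.get(fields["host"], "unknown"))
    print(decode_write6("0a 00 04 00 00 00"))  # from "CDB: Write(6)"
```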


This is with the LSI SAS (Fusion MPT) driver manually updated to version:
# cat /sys/module/mptbase/version 
4.24.00.00

because with the driver shipped with the kernel I lose connections to the SATA drives
(observed with 2.6.32-23).

This is a really serious bug for this server: it prevents the server from completing its backups.
Please also see bug 494476.


regards
Lars

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-25-server 2.6.32-25.44 [modified: lib/modules/2.6.32-25-server/kernel/drivers/message/fusion/mptbase.ko lib/modules/2.6.32-25-server/kernel/drivers/message/fusion/mptctl.ko lib/modules/2.6.32-25-server/kernel/drivers/message/fusion/mptfc.ko lib/modules/2.6.32-25-server/kernel/drivers/message/fusion/mptlan.ko lib/modules/2.6.32-25-server/kernel/drivers/message/fusion/mptsas.ko lib/modules/2.6.32-25-server/kernel/drivers/message/fusion/mptscsih.ko lib/modules/2.6.32-25-server/kernel/drivers/message/fusion/mptspi.ko]
Regression: No
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-25.44-server 2.6.32.21+drm33.7
Uname: Linux 2.6.32-25-server x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
CurrentDmesg:
 
Date: Fri Oct  1 10:20:57 2010
MachineType: Supermicro H8DI3+
PciMultimedia:
 
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-25-server root=LABEL=WURZEL ro elevator=noop quiet splash
ProcEnviron:
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
SourcePackage: linux
dmi.bios.date: 12/07/2009
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1.0b
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: H8DI3+
dmi.board.vendor: Supermicro
dmi.board.version: 1234567890
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 1234567890
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1.0b:bd12/07/2009:svnSupermicro:pnH8DI3+:pvr1234567890:rvnSupermicro:rnH8DI3+:rvr1234567890:cvnSupermicro:ct3:cvr1234567890:
dmi.product.name: H8DI3+
dmi.product.version: 1234567890
dmi.sys.vendor: Supermicro

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug lucid needs-upstream-testing

-- 
task blocked for more than 120 seconds on server kernel
https://bugs.launchpad.net/bugs/652812