[Bug 606523] Re: INFO: task nfsd:5047 blocked for more than 120 seconds.

Mon Aug 9 12:02:10 UTC 2010

I believe we're seeing similar problems with our setup. We have a
24-disk RAID10 array in our box, with a 22TB XFS filesystem
exported over NFS (v3) to our VMware cluster. During initial load testing
with iometer and dd, we triggered strange behaviour in
nfsd, which made common operations (such as readdir()) on the mounted
export excruciatingly slow (we're talking more than an hour for a simple
`ls` to complete in an empty directory). Changing from the
stock Lucid 2.6.32 kernel to later releases (seemingly) made the problem go
away during load testing, but it popped up again later on, when the
system was moved into semi-production as the backup storage for the
aforementioned cluster. Other hardware involved: Intel Corporation
82598EB 10-Gigabit Ethernet adapters driven by ixgbe and two Adaptec
AAC-RAID controllers driven by aacraid, on an Intel 5520-based dual-socket,
four-core (HT disabled) Nehalem machine.

We see backtraces like the following:
[150122.133802] INFO: task nfsd:2145 blocked for more than 120 seconds.
[150122.133853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[150122.133934] nfsd          D ffff880001e15880     0  2145      2 0x00000000
[150122.133937]  ffff8806614e1cd0 0000000000000046 ffff8806614e1cd0 ffff8806614e1fd8
[150122.133940]  ffff88065b7dc4a0 0000000000015880 0000000000015880 ffff8806614e1fd8
[150122.133942]  0000000000015880 ffff8806614e1fd8 0000000000015880 ffff88065b7dc4a0
[150122.133945] Call Trace:
[150122.133947]  [<ffffffff81576838>] __mutex_lock_slowpath+0xe8/0x170
[150122.133949]  [<ffffffff8157647b>] mutex_lock+0x2b/0x50
[150122.133954]  [<ffffffffa03e62ff>] nfsd_unlink+0xaf/0x240 [nfsd]
[150122.133960]  [<ffffffffa03edd54>] nfsd3_proc_remove+0x84/0x100 [nfsd]
[150122.133964]  [<ffffffffa03df3fb>] nfsd_dispatch+0xbb/0x210 [nfsd]
[150122.133972]  [<ffffffffa021d625>] svc_process_common+0x325/0x650 [sunrpc]
[150122.133977]  [<ffffffffa03dfa60>] ? nfsd+0x0/0x150 [nfsd]
[150122.133984]  [<ffffffffa021da83>] svc_process+0x133/0x150 [sunrpc]
[150122.133988]  [<ffffffffa03dfb1d>] nfsd+0xbd/0x150 [nfsd]
[150122.133990]  [<ffffffff8107f8d6>] kthread+0x96/0xa0
[150122.133993]  [<ffffffff8100be64>] kernel_thread_helper+0x4/0x10
[150122.133995]  [<ffffffff8107f840>] ? kthread+0x0/0xa0
[150122.133997]  [<ffffffff8100be60>] ? kernel_thread_helper+0x0/0x10
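
For anyone wanting to compare traces across machines, here is a rough sketch (not something we run in production) that pulls hung-task reports like the one above out of saved kernel log text. The function name parse_hung_tasks and both regular expressions are my own assumptions, modelled only on the line format shown in this trace; other kernel versions may print frames differently.

```python
import re

# Header line of a hung-task report, e.g.
# "[150122.133802] INFO: task nfsd:2145 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)

# Call-trace frame, e.g. "[<ffffffff8157647b>] mutex_lock+0x2b/0x50".
# Frames marked with "? " are unreliable guesses by the unwinder.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?P<maybe>\? )?(?P<sym>[\w.]+)\+0x")


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report found in log_text (hypothetical helper)."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_TASK_RE.search(line)
        if header:
            current = {
                "timestamp": float(header["ts"]),
                "task": header["comm"],
                "pid": int(header["pid"]),
                "timeout": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Keep only reliable frames that belong to an already-seen report.
        if frame and current is not None and not frame["maybe"]:
            current["frames"].append(frame["sym"])
    return reports
```

Feeding it the saved output of dmesg gives a list of reports; the first entry in each report's "frames" list is the innermost function the task was blocked in, which makes it easy to see whether all blocked nfsd threads are waiting on the same mutex path.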

This happened after copying over a few VM images from our primary to our
backup storage over NFS(v3). The machine doesn't crash, but NFS
performance is rather unimpressive during and after these operations.

I'll investigate whether Thag's suggested workaround is applicable in our
situation and, if it is, whether it helps get things working normally.
However, as we're not using multiple IPv4 addresses with our NICs afaik,
I'm on the lookout for alternative solutions to the problem, or theories
about what may cause it.

-- 
INFO: task nfsd:5047 blocked for more than 120 seconds.
https://bugs.launchpad.net/bugs/606523