[Bug 879334] Re: nfsd from nfs-kernel-server very slow and system load from 25%-100% from nfsd

Karsten Suehring 879334 at bugs.launchpad.net
Thu Nov 15 14:37:02 UTC 2012


I'm adding some more test data here:

As a workaround I tried installing an old Ubuntu 2.6 kernel
(linux-image-2.6.35-31-generic_2.6.35-31.63_amd64.deb) on 12.04.1.

I saw a number of locking issues reported and thought these might be
caused by running that kernel in the wrong environment. But even after
downgrading the servers back to 10.10 and keeping the clients at
12.04.1, I still see kernel messages like the following on the server:

[ 5474.132324] ------------[ cut here ]------------
[ 5474.132346] WARNING: at /build/buildd/linux-2.6.35/net/sunrpc/sched.c:597 rpc_exit_task+0x5c/0x60 [sunrpc]()
[ 5474.132349] Hardware name: PowerEdge R710
[ 5474.132351] Modules linked in: ipmi_si mpt2sas raid_class mptctl ipmi_devintf ipmi_msghandler dell_rbu nfsd autofs4 xfs exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc joydev ftdi_sio usbhid hid bnx2 usbserial shpchp psmouse i7core_edac serio_raw edac_core hed lp power_meter parport dcdbas ses enclosure mptsas mptscsih mptbase usb_storage scsi_transport_sas megaraid_sas [last unloaded: ipmi_si]
[ 5474.132386] Pid: 1746, comm: rpciod/16 Tainted: G        W   2.6.35-32-server #67-Ubuntu
[ 5474.132388] Call Trace:
[ 5474.132399]  [<ffffffff810616df>] warn_slowpath_common+0x7f/0xc0
[ 5474.132403]  [<ffffffff8106173a>] warn_slowpath_null+0x1a/0x20
[ 5474.132414]  [<ffffffffa016bd4c>] rpc_exit_task+0x5c/0x60 [sunrpc]
[ 5474.132426]  [<ffffffffa016c52e>] __rpc_execute+0x5e/0x280 [sunrpc]
[ 5474.132437]  [<ffffffffa016c7f0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[ 5474.132448]  [<ffffffffa016c805>] rpc_async_schedule+0x15/0x20 [sunrpc]
[ 5474.132455]  [<ffffffff8107b395>] run_workqueue+0xc5/0x1a0
[ 5474.132460]  [<ffffffff8107b513>] worker_thread+0xa3/0x110
[ 5474.132464]  [<ffffffff810801a0>] ? autoremove_wake_function+0x0/0x40
[ 5474.132468]  [<ffffffff8107b470>] ? worker_thread+0x0/0x110
[ 5474.132472]  [<ffffffff8107fc26>] kthread+0x96/0xa0
[ 5474.132477]  [<ffffffff8100aea4>] kernel_thread_helper+0x4/0x10
[ 5474.132481]  [<ffffffff8107fb90>] ? kthread+0x0/0xa0
[ 5474.132484]  [<ffffffff8100aea0>] ? kernel_thread_helper+0x0/0x10
[ 5474.132487] ---[ end trace 5a3838b115992a79 ]---
[ 6091.800511] ------------[ cut here ]------------
[ 6091.800532] WARNING: at /build/buildd/linux-2.6.35/net/sunrpc/sched.c:597 rpc_exit_task+0x5c/0x60 [sunrpc]()
[ 6091.800536] Hardware name: PowerEdge R710
[ 6091.800537] Modules linked in: ipmi_si mpt2sas raid_class mptctl ipmi_devintf ipmi_msghandler dell_rbu nfsd autofs4 xfs exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc joydev ftdi_sio usbhid hid bnx2 usbserial shpchp psmouse i7core_edac serio_raw edac_core hed lp power_meter parport dcdbas ses enclosure mptsas mptscsih mptbase usb_storage scsi_transport_sas megaraid_sas [last unloaded: ipmi_si]
[ 6091.800572] Pid: 1744, comm: rpciod/14 Tainted: G        W   2.6.35-32-server #67-Ubuntu
[ 6091.800575] Call Trace:
[ 6091.800585]  [<ffffffff810616df>] warn_slowpath_common+0x7f/0xc0
[ 6091.800590]  [<ffffffff8106173a>] warn_slowpath_null+0x1a/0x20
[ 6091.800601]  [<ffffffffa016bd4c>] rpc_exit_task+0x5c/0x60 [sunrpc]
[ 6091.800612]  [<ffffffffa016c52e>] __rpc_execute+0x5e/0x280 [sunrpc]
[ 6091.800623]  [<ffffffffa016c7f0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[ 6091.800634]  [<ffffffffa016c805>] rpc_async_schedule+0x15/0x20 [sunrpc]
[ 6091.800642]  [<ffffffff8107b395>] run_workqueue+0xc5/0x1a0
[ 6091.800646]  [<ffffffff8107b513>] worker_thread+0xa3/0x110
[ 6091.800650]  [<ffffffff810801a0>] ? autoremove_wake_function+0x0/0x40
[ 6091.800654]  [<ffffffff8107b470>] ? worker_thread+0x0/0x110
[ 6091.800658]  [<ffffffff8107fc26>] kthread+0x96/0xa0
[ 6091.800663]  [<ffffffff8100aea4>] kernel_thread_helper+0x4/0x10
[ 6091.800667]  [<ffffffff8107fb90>] ? kthread+0x0/0xa0
[ 6091.800671]  [<ffffffff8100aea0>] ? kernel_thread_helper+0x0/0x10
[ 6091.800673] ---[ end trace 5a3838b115992a7a ]---

On the client I see:

[ 7061.756411] INFO: task unzip:8081 blocked for more than 120 seconds.
[ 7061.767633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7061.790039] unzip           D 0000000000000007     0  8081   8041 0x00000000
[ 7061.790044]  ffff8805ec807b48 0000000000000086 ffff880500000000 ffffffff00000007
[ 7061.790051]  ffff8805ec807fd8 ffff8805ec807fd8 ffff8805ec807fd8 00000000000137c0
[ 7061.790063]  ffff880608a02e00 ffff8805fb9f1700 ffff8805ec807b28 ffff880617c74080
[ 7061.790075] Call Trace:
[ 7061.790082]  [<ffffffff81117130>] ? __lock_page+0x70/0x70
[ 7061.790090]  [<ffffffff816590ff>] schedule+0x3f/0x60
[ 7061.790097]  [<ffffffff816591af>] io_schedule+0x8f/0xd0
[ 7061.790105]  [<ffffffff8111713e>] sleep_on_page+0xe/0x20
[ 7061.790112]  [<ffffffff816599cf>] __wait_on_bit+0x5f/0x90
[ 7061.790119]  [<ffffffff811172a8>] wait_on_page_bit+0x78/0x80
[ 7061.790127]  [<ffffffff8108acc0>] ? autoremove_wake_function+0x40/0x40
[ 7061.790135]  [<ffffffff811173bc>] filemap_fdatawait_range+0x10c/0x1a0
[ 7061.790144]  [<ffffffff8111747b>] filemap_fdatawait+0x2b/0x30
[ 7061.790151]  [<ffffffff811a17b9>] writeback_single_inode+0x399/0x430
[ 7061.790159]  [<ffffffff811a18ca>] sync_inode+0x7a/0xc0
[ 7061.790169]  [<ffffffffa01a20b3>] nfs_wb_all+0x43/0x50 [nfs]
[ 7061.790177]  [<ffffffffa01937f8>] nfs_setattr+0x138/0x140 [nfs]
[ 7061.790181]  [<ffffffff8119402b>] notify_change+0x1bb/0x360
[ 7061.790185]  [<ffffffff8117617b>] chmod_common+0xbb/0xc0
[ 7061.790189]  [<ffffffff8117d0ba>] ? sys_newstat+0x2a/0x40
[ 7061.790193]  [<ffffffff811770bf>] sys_fchmod+0x4f/0x80
[ 7061.790197]  [<ffffffff81663602>] system_call_fastpath+0x16/0x1b

and the NFS mount hangs. Sometimes the clients are able to recover, but
often they hang completely.
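
To compare how often these messages occur across kernels, the logs can be
summarized with a small script. The following is only a minimal sketch: the
function name summarize_kernel_log and the two regular expressions are my own
assumptions, written to match the message formats quoted above, and may need
adjusting for other kernel versions.

```python
import re

# Patterns written against the two message types quoted in this report
# (assumed formats; other kernels may word these lines differently).
WARNING_RE = re.compile(r"^\[\s*(\d+\.\d+)\] WARNING: at (\S+) (\S+)")
HUNG_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)


def summarize_kernel_log(lines):
    """Collect WARNING and hung-task entries from an iterable of dmesg lines.

    Returns (warnings, hung_tasks), where
      warnings   is a list of (timestamp, source_location, symbol) and
      hung_tasks is a list of (timestamp, command, pid, seconds).
    """
    warnings, hung_tasks = [], []
    for line in lines:
        m = WARNING_RE.match(line)
        if m:
            warnings.append((float(m.group(1)), m.group(2), m.group(3)))
            continue
        m = HUNG_RE.match(line)
        if m:
            hung_tasks.append(
                (float(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4)))
            )
    return warnings, hung_tasks
```

Feeding it the saved output of dmesg from the server and from a client (for
example the lines of a log file) gives one list per message type, which makes
it easier to count occurrences before and after a kernel change.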

It seems that my initial test on Debian was wrong: the Debian testing
kernels do at least put less load on the server. I cannot comment on the
other issues yet. However, the linked Debian bug report mentions that
the patch referred to above has been removed from their kernels, which
seems to have at least some positive effect.

Is any Ubuntu kernel developer following this? Could you provide a test
kernel with the patch removed?

I'm currently trying to set up a test environment, but fixing my
production environment has priority :-(

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/879334

Title:
  nfsd from nfs-kernel-server very slow and system load from 25%-100%
  from nfsd

Status in “linux” package in Ubuntu:
  Incomplete
Status in “nfs-utils” package in Ubuntu:
  Confirmed
Status in “linux” package in Debian:
  Unknown

Bug description:
  I have a diskless ubuntu 10.10 machine which I boot regularly using
  pxe-boot from another ubuntu machine where I have the root filesystem
  of the diskless machine exported over nfs.

  I set it up about a year ago using 10.10. In the meantime the server
  machine was upgraded to 11.04 and, as of yesterday, to 11.10.

  After the upgrade to 11.10 the diskless machine is dead slow (most of
  the time it won't even boot completely) and the load on the server
  machine is high (25%-100% as shown by top). If I restart the nfs
  server while the diskless computer is booting, the client computer
  proceeds with the boot a bit further and then gets stuck again. I
  have to restart the nfs server 3-4 times in order to get the gdm
  login screen on the client machine.

  ProblemType: Bug
  DistroRelease: Ubuntu 11.10
  Package: nfs-kernel-server 1:1.2.4-1ubuntu2
  ProcVersionSignature: Ubuntu 3.0.0-12.20-generic 3.0.4
  Uname: Linux 3.0.0-12-generic i686
  ApportVersion: 1.23-0ubuntu3
  Architecture: i386
  Date: Fri Oct 21 12:53:02 2011
  ProcEnviron:
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  SourcePackage: nfs-utils
  UpgradeStatus: Upgraded to oneiric on 2011-10-20 (1 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/879334/+subscriptions



