[Bug 1315955] Re: nfsd hangs
Matt
matts at surveymonkey.com
Sat Jan 23 21:28:31 UTC 2016
We've just run into this after about 600 days of continual NFS use. It
does appear to be load related. I can say anecdotally though that when
swapping over from Ubuntu 14.04 to 14.04.3 we found it appearing more
often, which lead me to this bug report. I've just disabled NFSv4 as was
suggested.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1315955
Title:
nfsd hangs
Status in NFS-Utils:
New
Status in nfs-utils package in Ubuntu:
Incomplete
Status in nfs-utils source package in Trusty:
Incomplete
Bug description:
On a relatively busy NFS server, the system hang on us with the
following messages:
May 4 07:53:36 wol-nfs kernel: [487678.715589] INFO: task nfsd:2793 blocked for more than 120 seconds.
May 4 07:53:36 wol-nfs kernel: [487678.715653] Not tainted 3.13.0-24-generic #46-Ubuntu
May 4 07:53:36 wol-nfs kernel: [487678.715695] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 4 07:53:36 wol-nfs kernel: [487678.715790] nfsd D ffff88023fc14440 0 2793 2 0x00000000
May 4 07:53:36 wol-nfs kernel: [487678.715800] ffff88023317fca0 0000000000000002 ffff880233268000 ffff88023317ffd8
May 4 07:53:36 wol-nfs kernel: [487678.715807] 0000000000014440 0000000000014440 ffff880233268000 ffffffffa03520a0
May 4 07:53:36 wol-nfs kernel: [487678.715811] ffffffffa03520a4 ffff880233268000 00000000ffffffff ffffffffa03520a8
May 4 07:53:36 wol-nfs kernel: [487678.715818] Call Trace:
May 4 07:53:36 wol-nfs kernel: [487678.715860] [<ffffffff8171a3a9>] schedule_preempt_disabled+0x29/0x70
May 4 07:53:36 wol-nfs kernel: [487678.715865] [<ffffffff8171c215>] __mutex_lock_slowpath+0x135/0x1b0
May 4 07:53:36 wol-nfs kernel: [487678.715870] [<ffffffff8171c2af>] mutex_lock+0x1f/0x2f
May 4 07:53:36 wol-nfs kernel: [487678.715905] [<ffffffffa033be55>] nfs4_lock_state+0x15/0x20 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.715917] [<ffffffffa032e858>] nfsd4_open+0xd8/0x8f0 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.715928] [<ffffffffa032f5da>] nfsd4_proc_compound+0x56a/0x7b0 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.715937] [<ffffffffa031bd2b>] nfsd_dispatch+0xbb/0x200 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.715961] [<ffffffffa026a63d>] svc_process_common+0x46d/0x6d0 [sunrpc]
May 4 07:53:36 wol-nfs kernel: [487678.715977] [<ffffffffa026a9a7>] svc_process+0x107/0x170 [sunrpc]
May 4 07:53:36 wol-nfs kernel: [487678.715986] [<ffffffffa031b71f>] nfsd+0xbf/0x130 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.715995] [<ffffffffa031b660>] ? nfsd_destroy+0x80/0x80 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.716004] [<ffffffff8108b312>] kthread+0xd2/0xf0
May 4 07:53:36 wol-nfs kernel: [487678.716009] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
May 4 07:53:36 wol-nfs kernel: [487678.716016] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
May 4 07:53:36 wol-nfs kernel: [487678.716020] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
And many more with the exact same stack trace:
May 4 07:53:36 wol-nfs kernel: [487678.716025] INFO: task nfsd:2794 blocked for more than 120 seconds.
May 4 07:53:36 wol-nfs kernel: [487678.716500] INFO: task nfsd:2795 blocked for more than 120 seconds.
May 4 07:53:36 wol-nfs kernel: [487678.717166] INFO: task nfsd:2796 blocked for more than 120 seconds.
May 4 07:53:36 wol-nfs kernel: [487678.717657] INFO: task nfsd:2797 blocked for more than 120 seconds.
May 4 07:53:36 wol-nfs kernel: [487678.718150] INFO: task nfsd:2798 blocked for more than 120 seconds.
May 4 07:53:36 wol-nfs kernel: [487678.718743] INFO: task nfsd:2799 blocked for more than 120 seconds.
Except this one
May 4 07:53:36 wol-nfs kernel: [487678.719229] INFO: task nfsd:2800 blocked for more than 120 seconds.
May 4 07:53:36 wol-nfs kernel: [487678.719347] Not tainted 3.13.0-24-generic #46-Ubuntu
May 4 07:53:36 wol-nfs kernel: [487678.719605] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 4 07:53:36 wol-nfs kernel: [487678.719741] nfsd D ffff88023fd94440 0 2800 2 0x00000000
May 4 07:53:36 wol-nfs kernel: [487678.719746] ffff8800b81f1b40 0000000000000002 ffff88022f96c7d0 ffff8800b81f1fd8
May 4 07:53:36 wol-nfs kernel: [487678.719751] 0000000000014440 0000000000014440 ffff88022f96c7d0 ffff8800b81f1ca8
May 4 07:53:36 wol-nfs kernel: [487678.719755] ffff8800b81f1cb0 7fffffffffffffff ffff88022f96c7d0 ffff8800b81f1c90
May 4 07:53:36 wol-nfs kernel: [487678.719760] Call Trace:
May 4 07:53:36 wol-nfs kernel: [487678.719766] [<ffffffff81719e89>] schedule+0x29/0x70
May 4 07:53:36 wol-nfs kernel: [487678.719770] [<ffffffff817190d9>] schedule_timeout+0x239/0x2d0
May 4 07:53:36 wol-nfs kernel: [487678.719775] [<ffffffff81719a11>] ? __schedule+0x381/0x7d0
May 4 07:53:36 wol-nfs kernel: [487678.719781] [<ffffffff8101b763>] ? native_sched_clock+0x13/0x80
May 4 07:53:36 wol-nfs kernel: [487678.719786] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
May 4 07:53:36 wol-nfs kernel: [487678.719791] [<ffffffff8171a9a6>] wait_for_completion+0xa6/0x160
May 4 07:53:36 wol-nfs kernel: [487678.719798] [<ffffffff8109a790>] ? wake_up_state+0x20/0x20
May 4 07:53:36 wol-nfs kernel: [487678.719804] [<ffffffff810824ca>] flush_workqueue+0x11a/0x5a0
May 4 07:53:36 wol-nfs kernel: [487678.719818] [<ffffffffa0346683>] nfsd4_shutdown_callback+0x73/0x80 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.719829] [<ffffffffa033d37d>] destroy_client+0x18d/0x430 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.719840] [<ffffffffa033e9d6>] nfsd4_setclientid_confirm+0x1e6/0x210 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.719849] [<ffffffffa032f5da>] nfsd4_proc_compound+0x56a/0x7b0 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.719857] [<ffffffffa031bd2b>] nfsd_dispatch+0xbb/0x200 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.719872] [<ffffffffa026a63d>] svc_process_common+0x46d/0x6d0 [sunrpc]
May 4 07:53:36 wol-nfs kernel: [487678.719885] [<ffffffffa026a9a7>] svc_process+0x107/0x170 [sunrpc]
May 4 07:53:36 wol-nfs kernel: [487678.719893] [<ffffffffa031b71f>] nfsd+0xbf/0x130 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.719901] [<ffffffffa031b660>] ? nfsd_destroy+0x80/0x80 [nfsd]
May 4 07:53:36 wol-nfs kernel: [487678.719905] [<ffffffff8108b312>] kthread+0xd2/0xf0
May 4 07:53:36 wol-nfs kernel: [487678.719909] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
May 4 07:53:36 wol-nfs kernel: [487678.719914] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
May 4 07:53:36 wol-nfs kernel: [487678.719918] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
It looks like the last thread just hung, keeping a lock and blocking out every single other thread/process of nfsd.
Preceding the crash, there were a few suspicious messages about a CPU soft lockup, with the following stack trace. This may or may not be related. It's days ago though, so it's probably nothing.
Apr 30 12:45:41 wol-nfs kernel: [159283.910727] BUG: soft lockup - CPU#2 stuck for 22s! [chown:6108]
Apr 30 12:45:41 wol-nfs kernel: [159283.910928] Call Trace:
Apr 30 12:45:41 wol-nfs kernel: [159283.910934] [<ffffffff812085e0>] ? locks_delete_block+0x70/0x80
Apr 30 12:45:41 wol-nfs kernel: [159283.910937] [<ffffffff81209f40>] __break_lease+0x350/0x3d0
Apr 30 12:45:41 wol-nfs kernel: [159283.910940] [<ffffffff811d5b48>] ? notify_change+0x1a8/0x390
Apr 30 12:45:41 wol-nfs kernel: [159283.910943] [<ffffffff811b6767>] chown_common+0x117/0x180
Apr 30 12:45:41 wol-nfs kernel: [159283.910945] [<ffffffff811b826f>] SyS_fchownat+0xaf/0x110
Apr 30 12:45:41 wol-nfs kernel: [159283.910948] [<ffffffff8172663f>] tracesys+0xe1/0xe6
Apr 30 12:45:41 wol-nfs kernel: [159283.910949] Code: 39 d0 75 ea b8 01 00 00 00 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 e9 06 00 00 00 66 83 07 02 c3 90 8b 37 f0 66 83 07 02 <f6> 47 02 01 74 f1 55 48 89 e5 e8 31 1b ff ff 5d c3 0f 1f 84 00
The relevant sections of kern.log are in an separate attachment.
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-generic 3.13.0.24.29
ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 May 4 23:41 seq
crw-rw---- 1 root audio 116, 33 May 4 23:41 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
CurrentDmesg:
[ 5.274819] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 5.279871] NFSD: starting 90-second grace period (net ffffffff81cd9b00)
[ 5.518836] init: plymouth-upstart-bridge main process ended, respawning
[ 12.233348] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:91:fc:20:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2
Date: Mon May 5 00:29:12 2014
HibernationDevice: RESUME=/dev/mapper/wolnfs--vg-swap_1
InstallationDate: Installed on 2014-04-20 (14 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
IwConfig:
eth0 no wireless extensions.
lo no wireless extensions.
Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
MachineType: VMware, Inc. VMware Virtual Platform
PciMultimedia:
ProcFB: 0 svgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-24-generic root=/dev/mapper/wolnfs--vg-root ro
RelatedPackageVersions:
linux-restricted-modules-3.13.0-24-generic N/A
linux-backports-modules-3.13.0-24-generic N/A
linux-firmware 1.127
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/30/2013
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd07/30/2013:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nfs-utils/+bug/1315955/+subscriptions
More information about the foundations-bugs
mailing list