[Bug 1315955] Re: nfsd hangs

Matt matts at surveymonkey.com
Sat Jan 23 21:28:31 UTC 2016


We've just run into this after about 600 days of continual NFS use. It
does appear to be load related. I can say anecdotally though that when
swapping over from Ubuntu 14.04 to 14.04.3 we found it appearing more
often, which lead me to this bug report. I've just disabled NFSv4 as was
suggested.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1315955

Title:
  nfsd hangs

Status in NFS-Utils:
  New
Status in nfs-utils package in Ubuntu:
  Incomplete
Status in nfs-utils source package in Trusty:
  Incomplete

Bug description:
  On a relatively busy NFS server, the system hang on us with the
  following messages:

  May  4 07:53:36 wol-nfs kernel: [487678.715589] INFO: task nfsd:2793 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.715653]       Not tainted 3.13.0-24-generic #46-Ubuntu
  May  4 07:53:36 wol-nfs kernel: [487678.715695] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  May  4 07:53:36 wol-nfs kernel: [487678.715790] nfsd            D ffff88023fc14440     0  2793      2 0x00000000
  May  4 07:53:36 wol-nfs kernel: [487678.715800]  ffff88023317fca0 0000000000000002 ffff880233268000 ffff88023317ffd8
  May  4 07:53:36 wol-nfs kernel: [487678.715807]  0000000000014440 0000000000014440 ffff880233268000 ffffffffa03520a0
  May  4 07:53:36 wol-nfs kernel: [487678.715811]  ffffffffa03520a4 ffff880233268000 00000000ffffffff ffffffffa03520a8
  May  4 07:53:36 wol-nfs kernel: [487678.715818] Call Trace:
  May  4 07:53:36 wol-nfs kernel: [487678.715860]  [<ffffffff8171a3a9>] schedule_preempt_disabled+0x29/0x70
  May  4 07:53:36 wol-nfs kernel: [487678.715865]  [<ffffffff8171c215>] __mutex_lock_slowpath+0x135/0x1b0
  May  4 07:53:36 wol-nfs kernel: [487678.715870]  [<ffffffff8171c2af>] mutex_lock+0x1f/0x2f
  May  4 07:53:36 wol-nfs kernel: [487678.715905]  [<ffffffffa033be55>] nfs4_lock_state+0x15/0x20 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.715917]  [<ffffffffa032e858>] nfsd4_open+0xd8/0x8f0 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.715928]  [<ffffffffa032f5da>] nfsd4_proc_compound+0x56a/0x7b0 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.715937]  [<ffffffffa031bd2b>] nfsd_dispatch+0xbb/0x200 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.715961]  [<ffffffffa026a63d>] svc_process_common+0x46d/0x6d0 [sunrpc]
  May  4 07:53:36 wol-nfs kernel: [487678.715977]  [<ffffffffa026a9a7>] svc_process+0x107/0x170 [sunrpc]
  May  4 07:53:36 wol-nfs kernel: [487678.715986]  [<ffffffffa031b71f>] nfsd+0xbf/0x130 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.715995]  [<ffffffffa031b660>] ? nfsd_destroy+0x80/0x80 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.716004]  [<ffffffff8108b312>] kthread+0xd2/0xf0
  May  4 07:53:36 wol-nfs kernel: [487678.716009]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
  May  4 07:53:36 wol-nfs kernel: [487678.716016]  [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
  May  4 07:53:36 wol-nfs kernel: [487678.716020]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0

  And many more with the exact same stack trace:

  May  4 07:53:36 wol-nfs kernel: [487678.716025] INFO: task nfsd:2794 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.716500] INFO: task nfsd:2795 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.717166] INFO: task nfsd:2796 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.717657] INFO: task nfsd:2797 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.718150] INFO: task nfsd:2798 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.718743] INFO: task nfsd:2799 blocked for more than 120 seconds.

  Except this one

  May  4 07:53:36 wol-nfs kernel: [487678.719229] INFO: task nfsd:2800 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.719347]       Not tainted 3.13.0-24-generic #46-Ubuntu
  May  4 07:53:36 wol-nfs kernel: [487678.719605] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  May  4 07:53:36 wol-nfs kernel: [487678.719741] nfsd            D ffff88023fd94440     0  2800      2 0x00000000
  May  4 07:53:36 wol-nfs kernel: [487678.719746]  ffff8800b81f1b40 0000000000000002 ffff88022f96c7d0 ffff8800b81f1fd8
  May  4 07:53:36 wol-nfs kernel: [487678.719751]  0000000000014440 0000000000014440 ffff88022f96c7d0 ffff8800b81f1ca8
  May  4 07:53:36 wol-nfs kernel: [487678.719755]  ffff8800b81f1cb0 7fffffffffffffff ffff88022f96c7d0 ffff8800b81f1c90
  May  4 07:53:36 wol-nfs kernel: [487678.719760] Call Trace:
  May  4 07:53:36 wol-nfs kernel: [487678.719766]  [<ffffffff81719e89>] schedule+0x29/0x70
  May  4 07:53:36 wol-nfs kernel: [487678.719770]  [<ffffffff817190d9>] schedule_timeout+0x239/0x2d0
  May  4 07:53:36 wol-nfs kernel: [487678.719775]  [<ffffffff81719a11>] ? __schedule+0x381/0x7d0
  May  4 07:53:36 wol-nfs kernel: [487678.719781]  [<ffffffff8101b763>] ? native_sched_clock+0x13/0x80
  May  4 07:53:36 wol-nfs kernel: [487678.719786]  [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
  May  4 07:53:36 wol-nfs kernel: [487678.719791]  [<ffffffff8171a9a6>] wait_for_completion+0xa6/0x160
  May  4 07:53:36 wol-nfs kernel: [487678.719798]  [<ffffffff8109a790>] ? wake_up_state+0x20/0x20
  May  4 07:53:36 wol-nfs kernel: [487678.719804]  [<ffffffff810824ca>] flush_workqueue+0x11a/0x5a0
  May  4 07:53:36 wol-nfs kernel: [487678.719818]  [<ffffffffa0346683>] nfsd4_shutdown_callback+0x73/0x80 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719829]  [<ffffffffa033d37d>] destroy_client+0x18d/0x430 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719840]  [<ffffffffa033e9d6>] nfsd4_setclientid_confirm+0x1e6/0x210 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719849]  [<ffffffffa032f5da>] nfsd4_proc_compound+0x56a/0x7b0 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719857]  [<ffffffffa031bd2b>] nfsd_dispatch+0xbb/0x200 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719872]  [<ffffffffa026a63d>] svc_process_common+0x46d/0x6d0 [sunrpc]
  May  4 07:53:36 wol-nfs kernel: [487678.719885]  [<ffffffffa026a9a7>] svc_process+0x107/0x170 [sunrpc]
  May  4 07:53:36 wol-nfs kernel: [487678.719893]  [<ffffffffa031b71f>] nfsd+0xbf/0x130 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719901]  [<ffffffffa031b660>] ? nfsd_destroy+0x80/0x80 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719905]  [<ffffffff8108b312>] kthread+0xd2/0xf0
  May  4 07:53:36 wol-nfs kernel: [487678.719909]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
  May  4 07:53:36 wol-nfs kernel: [487678.719914]  [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
  May  4 07:53:36 wol-nfs kernel: [487678.719918]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0

  
  It looks like the last thread just hung, keeping a lock and blocking out every single other thread/process of nfsd.

  
  Preceding the crash, there were a few suspicious messages about a CPU soft lockup, with the following stack trace. This may or may not be related. It's days ago though, so it's probably nothing.

  Apr 30 12:45:41 wol-nfs kernel: [159283.910727] BUG: soft lockup - CPU#2 stuck for 22s! [chown:6108]
  Apr 30 12:45:41 wol-nfs kernel: [159283.910928] Call Trace:
  Apr 30 12:45:41 wol-nfs kernel: [159283.910934]  [<ffffffff812085e0>] ? locks_delete_block+0x70/0x80
  Apr 30 12:45:41 wol-nfs kernel: [159283.910937]  [<ffffffff81209f40>] __break_lease+0x350/0x3d0
  Apr 30 12:45:41 wol-nfs kernel: [159283.910940]  [<ffffffff811d5b48>] ? notify_change+0x1a8/0x390
  Apr 30 12:45:41 wol-nfs kernel: [159283.910943]  [<ffffffff811b6767>] chown_common+0x117/0x180
  Apr 30 12:45:41 wol-nfs kernel: [159283.910945]  [<ffffffff811b826f>] SyS_fchownat+0xaf/0x110
  Apr 30 12:45:41 wol-nfs kernel: [159283.910948]  [<ffffffff8172663f>] tracesys+0xe1/0xe6
  Apr 30 12:45:41 wol-nfs kernel: [159283.910949] Code: 39 d0 75 ea b8 01 00 00 00 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 e9 06 00 00 00 66 83 07 02 c3 90 8b 37 f0 66 83 07 02 <f6> 47 02 01 74 f1 55 48 89 e5 e8 31 1b ff ff 5d c3 0f 1f 84 00

  
  The relevant sections of kern.log are in an separate attachment.

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: linux-generic 3.13.0.24.29
  ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
  Uname: Linux 3.13.0-24-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 May  4 23:41 seq
   crw-rw---- 1 root audio 116, 33 May  4 23:41 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory: 'iw'
  CurrentDmesg:
   [    5.274819] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
   [    5.279871] NFSD: starting 90-second grace period (net ffffffff81cd9b00)
   [    5.518836] init: plymouth-upstart-bridge main process ended, respawning
   [   12.233348] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:91:fc:20:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2
  Date: Mon May  5 00:29:12 2014
  HibernationDevice: RESUME=/dev/mapper/wolnfs--vg-swap_1
  InstallationDate: Installed on 2014-04-20 (14 days ago)
  InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
  IwConfig:
   eth0      no wireless extensions.
   
   lo        no wireless extensions.
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  MachineType: VMware, Inc. VMware Virtual Platform
  PciMultimedia:
   
  ProcFB: 0 svgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-24-generic root=/dev/mapper/wolnfs--vg-root ro
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic  N/A
   linux-firmware                             1.127
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 07/30/2013
  dmi.bios.vendor: Phoenix Technologies LTD
  dmi.bios.version: 6.00
  dmi.board.name: 440BX Desktop Reference Platform
  dmi.board.vendor: Intel Corporation
  dmi.board.version: None
  dmi.chassis.asset.tag: No Asset Tag
  dmi.chassis.type: 1
  dmi.chassis.vendor: No Enclosure
  dmi.chassis.version: N/A
  dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd07/30/2013:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
  dmi.product.name: VMware Virtual Platform
  dmi.product.version: None
  dmi.sys.vendor: VMware, Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nfs-utils/+bug/1315955/+subscriptions



More information about the foundations-bugs mailing list