[Bug 1315955] Re: nfsd hangs

Moritz Augustin pub at moritz-augustin.de
Tue Jun 17 11:59:18 UTC 2014


I can confirm this bug on Ubuntu 14.04 LTS and would appreciate any workaround, since it is hurting me a lot in my production environment (8 servers), which I updated to the current LTS on the assumption that core packages such as the NFS-related ones would be stable.
If you need more details, please let me know.

** Attachment added: "dmesg output"
   https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1315955/+attachment/4133217/+files/dmesg.txt

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1315955

Title:
  nfsd hangs

Status in “nfs-utils” package in Ubuntu:
  Confirmed

Bug description:
  On a relatively busy NFS server, the system hung on us with the
  following messages:

  May  4 07:53:36 wol-nfs kernel: [487678.715589] INFO: task nfsd:2793 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.715653]       Not tainted 3.13.0-24-generic #46-Ubuntu
  May  4 07:53:36 wol-nfs kernel: [487678.715695] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  May  4 07:53:36 wol-nfs kernel: [487678.715790] nfsd            D ffff88023fc14440     0  2793      2 0x00000000
  May  4 07:53:36 wol-nfs kernel: [487678.715800]  ffff88023317fca0 0000000000000002 ffff880233268000 ffff88023317ffd8
  May  4 07:53:36 wol-nfs kernel: [487678.715807]  0000000000014440 0000000000014440 ffff880233268000 ffffffffa03520a0
  May  4 07:53:36 wol-nfs kernel: [487678.715811]  ffffffffa03520a4 ffff880233268000 00000000ffffffff ffffffffa03520a8
  May  4 07:53:36 wol-nfs kernel: [487678.715818] Call Trace:
  May  4 07:53:36 wol-nfs kernel: [487678.715860]  [<ffffffff8171a3a9>] schedule_preempt_disabled+0x29/0x70
  May  4 07:53:36 wol-nfs kernel: [487678.715865]  [<ffffffff8171c215>] __mutex_lock_slowpath+0x135/0x1b0
  May  4 07:53:36 wol-nfs kernel: [487678.715870]  [<ffffffff8171c2af>] mutex_lock+0x1f/0x2f
  May  4 07:53:36 wol-nfs kernel: [487678.715905]  [<ffffffffa033be55>] nfs4_lock_state+0x15/0x20 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.715917]  [<ffffffffa032e858>] nfsd4_open+0xd8/0x8f0 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.715928]  [<ffffffffa032f5da>] nfsd4_proc_compound+0x56a/0x7b0 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.715937]  [<ffffffffa031bd2b>] nfsd_dispatch+0xbb/0x200 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.715961]  [<ffffffffa026a63d>] svc_process_common+0x46d/0x6d0 [sunrpc]
  May  4 07:53:36 wol-nfs kernel: [487678.715977]  [<ffffffffa026a9a7>] svc_process+0x107/0x170 [sunrpc]
  May  4 07:53:36 wol-nfs kernel: [487678.715986]  [<ffffffffa031b71f>] nfsd+0xbf/0x130 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.715995]  [<ffffffffa031b660>] ? nfsd_destroy+0x80/0x80 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.716004]  [<ffffffff8108b312>] kthread+0xd2/0xf0
  May  4 07:53:36 wol-nfs kernel: [487678.716009]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
  May  4 07:53:36 wol-nfs kernel: [487678.716016]  [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
  May  4 07:53:36 wol-nfs kernel: [487678.716020]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0

  And many more with the exact same stack trace:

  May  4 07:53:36 wol-nfs kernel: [487678.716025] INFO: task nfsd:2794 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.716500] INFO: task nfsd:2795 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.717166] INFO: task nfsd:2796 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.717657] INFO: task nfsd:2797 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.718150] INFO: task nfsd:2798 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.718743] INFO: task nfsd:2799 blocked for more than 120 seconds.

  Except this one:

  May  4 07:53:36 wol-nfs kernel: [487678.719229] INFO: task nfsd:2800 blocked for more than 120 seconds.
  May  4 07:53:36 wol-nfs kernel: [487678.719347]       Not tainted 3.13.0-24-generic #46-Ubuntu
  May  4 07:53:36 wol-nfs kernel: [487678.719605] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  May  4 07:53:36 wol-nfs kernel: [487678.719741] nfsd            D ffff88023fd94440     0  2800      2 0x00000000
  May  4 07:53:36 wol-nfs kernel: [487678.719746]  ffff8800b81f1b40 0000000000000002 ffff88022f96c7d0 ffff8800b81f1fd8
  May  4 07:53:36 wol-nfs kernel: [487678.719751]  0000000000014440 0000000000014440 ffff88022f96c7d0 ffff8800b81f1ca8
  May  4 07:53:36 wol-nfs kernel: [487678.719755]  ffff8800b81f1cb0 7fffffffffffffff ffff88022f96c7d0 ffff8800b81f1c90
  May  4 07:53:36 wol-nfs kernel: [487678.719760] Call Trace:
  May  4 07:53:36 wol-nfs kernel: [487678.719766]  [<ffffffff81719e89>] schedule+0x29/0x70
  May  4 07:53:36 wol-nfs kernel: [487678.719770]  [<ffffffff817190d9>] schedule_timeout+0x239/0x2d0
  May  4 07:53:36 wol-nfs kernel: [487678.719775]  [<ffffffff81719a11>] ? __schedule+0x381/0x7d0
  May  4 07:53:36 wol-nfs kernel: [487678.719781]  [<ffffffff8101b763>] ? native_sched_clock+0x13/0x80
  May  4 07:53:36 wol-nfs kernel: [487678.719786]  [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
  May  4 07:53:36 wol-nfs kernel: [487678.719791]  [<ffffffff8171a9a6>] wait_for_completion+0xa6/0x160
  May  4 07:53:36 wol-nfs kernel: [487678.719798]  [<ffffffff8109a790>] ? wake_up_state+0x20/0x20
  May  4 07:53:36 wol-nfs kernel: [487678.719804]  [<ffffffff810824ca>] flush_workqueue+0x11a/0x5a0
  May  4 07:53:36 wol-nfs kernel: [487678.719818]  [<ffffffffa0346683>] nfsd4_shutdown_callback+0x73/0x80 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719829]  [<ffffffffa033d37d>] destroy_client+0x18d/0x430 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719840]  [<ffffffffa033e9d6>] nfsd4_setclientid_confirm+0x1e6/0x210 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719849]  [<ffffffffa032f5da>] nfsd4_proc_compound+0x56a/0x7b0 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719857]  [<ffffffffa031bd2b>] nfsd_dispatch+0xbb/0x200 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719872]  [<ffffffffa026a63d>] svc_process_common+0x46d/0x6d0 [sunrpc]
  May  4 07:53:36 wol-nfs kernel: [487678.719885]  [<ffffffffa026a9a7>] svc_process+0x107/0x170 [sunrpc]
  May  4 07:53:36 wol-nfs kernel: [487678.719893]  [<ffffffffa031b71f>] nfsd+0xbf/0x130 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719901]  [<ffffffffa031b660>] ? nfsd_destroy+0x80/0x80 [nfsd]
  May  4 07:53:36 wol-nfs kernel: [487678.719905]  [<ffffffff8108b312>] kthread+0xd2/0xf0
  May  4 07:53:36 wol-nfs kernel: [487678.719909]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
  May  4 07:53:36 wol-nfs kernel: [487678.719914]  [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
  May  4 07:53:36 wol-nfs kernel: [487678.719918]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0

  
  It looks like the last thread (nfsd:2800) hung while holding a lock, blocking every other nfsd thread.
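
  One way to check this reading against a full kern.log is to group the hung-task reports by call trace, so that the one thread with a different trace stands out from the threads waiting on the mutex. The following is only a minimal sketch: the function name group_hung_tasks and the two regular expressions are illustrative assumptions based on the log lines quoted above, not part of any existing tool, and frames marked with "?" (unreliable) are skipped.

```python
import re
from collections import defaultdict

# Hung-task header, e.g.
#   "INFO: task nfsd:2793 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Stack frame, e.g. "[<ffffffff8171c2af>] mutex_lock+0x1f/0x2f".
# Group 1 is set when the frame carries the "? " (unreliable) marker.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def group_hung_tasks(lines):
    """Map each distinct call trace to the list of (task, pid) that reported it.

    Hypothetical helper: it assumes every report starts with an
    "INFO: task ..." line and that all frames up to the next such line
    belong to that report.
    """
    groups = defaultdict(list)
    current = None  # (task name, pid) of the report being read
    frames = []     # reliable frames collected for that report

    def flush():
        if current is not None:
            groups[tuple(frames)].append(current)

    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            flush()
            current = (header.group(1), int(header.group(2)))
            frames = []
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame.group(1):
            frames.append(frame.group(2))
    flush()
    return dict(groups)
```

  Called on the lines of kern.log, it returns a dictionary keyed by tuples of function names. If the reading above is right, almost all nfsd PIDs should share the key that contains nfs4_lock_state, and a single PID should sit under the key that contains flush_workqueue. Unrelated traces that follow a hung-task report (such as a soft lockup) would be attributed to that report, so the input is best limited to the relevant time window.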

  
  Before the hang, there were a few suspicious messages about a CPU soft lockup, with the following stack trace. This may or may not be related. They were logged days earlier, though, so they are probably nothing.

  Apr 30 12:45:41 wol-nfs kernel: [159283.910727] BUG: soft lockup - CPU#2 stuck for 22s! [chown:6108]
  Apr 30 12:45:41 wol-nfs kernel: [159283.910928] Call Trace:
  Apr 30 12:45:41 wol-nfs kernel: [159283.910934]  [<ffffffff812085e0>] ? locks_delete_block+0x70/0x80
  Apr 30 12:45:41 wol-nfs kernel: [159283.910937]  [<ffffffff81209f40>] __break_lease+0x350/0x3d0
  Apr 30 12:45:41 wol-nfs kernel: [159283.910940]  [<ffffffff811d5b48>] ? notify_change+0x1a8/0x390
  Apr 30 12:45:41 wol-nfs kernel: [159283.910943]  [<ffffffff811b6767>] chown_common+0x117/0x180
  Apr 30 12:45:41 wol-nfs kernel: [159283.910945]  [<ffffffff811b826f>] SyS_fchownat+0xaf/0x110
  Apr 30 12:45:41 wol-nfs kernel: [159283.910948]  [<ffffffff8172663f>] tracesys+0xe1/0xe6
  Apr 30 12:45:41 wol-nfs kernel: [159283.910949] Code: 39 d0 75 ea b8 01 00 00 00 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 e9 06 00 00 00 66 83 07 02 c3 90 8b 37 f0 66 83 07 02 <f6> 47 02 01 74 f1 55 48 89 e5 e8 31 1b ff ff 5d c3 0f 1f 84 00

  
  The relevant sections of kern.log are in a separate attachment.

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: linux-generic 3.13.0.24.29
  ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
  Uname: Linux 3.13.0-24-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 May  4 23:41 seq
   crw-rw---- 1 root audio 116, 33 May  4 23:41 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory: 'iw'
  CurrentDmesg:
   [    5.274819] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
   [    5.279871] NFSD: starting 90-second grace period (net ffffffff81cd9b00)
   [    5.518836] init: plymouth-upstart-bridge main process ended, respawning
   [   12.233348] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:91:fc:20:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2
  Date: Mon May  5 00:29:12 2014
  HibernationDevice: RESUME=/dev/mapper/wolnfs--vg-swap_1
  InstallationDate: Installed on 2014-04-20 (14 days ago)
  InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
  IwConfig:
   eth0      no wireless extensions.
   
   lo        no wireless extensions.
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  MachineType: VMware, Inc. VMware Virtual Platform
  PciMultimedia:
   
  ProcFB: 0 svgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-24-generic root=/dev/mapper/wolnfs--vg-root ro
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic  N/A
   linux-firmware                             1.127
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 07/30/2013
  dmi.bios.vendor: Phoenix Technologies LTD
  dmi.bios.version: 6.00
  dmi.board.name: 440BX Desktop Reference Platform
  dmi.board.vendor: Intel Corporation
  dmi.board.version: None
  dmi.chassis.asset.tag: No Asset Tag
  dmi.chassis.type: 1
  dmi.chassis.vendor: No Enclosure
  dmi.chassis.version: N/A
  dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd07/30/2013:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
  dmi.product.name: VMware Virtual Platform
  dmi.product.version: None
  dmi.sys.vendor: VMware, Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1315955/+subscriptions


