[Bug 964750] [NEW] nfs/rpc.statd becomes unresponsive

hewbert 964750 at bugs.launchpad.net
Sun Mar 25 20:03:13 UTC 2012


Public bug reported:

We've tested this on Ubuntu 10.04 LTS, Ubuntu 11.10, and Debian 6.0.4,
all x64, fully updated, and with a fairly vanilla installation.

Condition summary:
We have ~1600 users with "live" network homes, all on Mac clients.  There are typically around 150 simultaneous connections.  Under these conditions, we can reliably make NFS unresponsive within a couple of hours, just by logging in ~150 users and opening Word (for example).  There's no clear indication of what exactly causes the failures.  Because the Mac clients run 10.6 or earlier, they use NFSv3.

We've tested using one physical server, with a hardware RAID and an ext3
filesystem.  We've also tested on two separate VMs with ext4.  All
systems in question used LVM.

Here's what our server logs indicate when the failures happen:
Mar 23 15:40:47 debfs mountd[2365]: authenticated mount request from 172.30.109.132:1020 for /srv/homes (/srv/homes)
Mar 23 15:40:58 debfs mountd[2365]: authenticated mount request from 172.30.109.73:1020 for /srv/homes (/srv/homes)
Mar 23 15:41:06 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.106.249
Mar 23 15:41:06 debfs mountd[2365]: authenticated mount request from 172.30.109.27:1020 for /srv/homes (/srv/homes)
Mar 23 15:41:09 debfs mountd[2365]: authenticated mount request from 172.30.109.63:1020 for /srv/homes (/srv/homes)
Mar 23 15:41:14 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.106.249
** Mar 23 15:41:19 debfs kernel: [ 8395.736310] statd: server rpc.statd not responding, timed out
** Mar 23 15:41:19 debfs kernel: [ 8395.736331] lockd: cannot unmonitor hs13406s4354.dsdk12.schoollocal
Mar 23 15:41:38 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.137.223
Mar 23 15:41:52 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.137.223
Mar 23 15:41:54 debfs kernel: [ 8430.737038] statd: server rpc.statd not responding, timed out
Mar 23 15:41:54 debfs kernel: [ 8430.737054] lockd: cannot unmonitor hslib23s5174.dsdk12.schoollocal
Mar 23 15:42:10 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.110.25
Mar 23 15:42:15 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.110.25
Mar 23 15:42:29 debfs kernel: [ 8465.737071] statd: server rpc.statd not responding, timed out
Mar 23 15:42:29 debfs kernel: [ 8465.737090] lockd: cannot unmonitor MS20603S4451.dsdk12.schoollocal
Mar 23 15:42:31 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.110.20
Mar 23 15:42:40 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.110.20
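
As a triage aid, the pattern in the excerpt above can be tallied with a short script.  This is only a sketch: the regular expressions are copied from the log lines shown here and may not match the wording used by other nfs-utils or kernel versions, and the function name summarize() is made up for illustration.  The log file is whatever path is passed on the command line; nothing is assumed about where syslog lives.

```python
import re
import sys
from collections import Counter

# Patterns taken from the log excerpt in this report (assumption: other
# versions of rpc.statd or the kernel may phrase these messages differently).
STATD_TIMEOUT = re.compile(r"statd: server rpc\.statd not responding, timed out")
LOCKD_UNMON = re.compile(r"lockd: cannot unmonitor (\S+)")
BAD_UNMON = re.compile(r"Received erroneous SM_UNMON request from \S+ for (\S+)")


def summarize(lines):
    """Count statd timeouts and group the affected clients.

    Returns (timeouts, stuck_hosts, bad_unmon) where the last two are
    Counters keyed by hostname and by client IP respectively.
    """
    timeouts = 0
    stuck_hosts = Counter()
    bad_unmon = Counter()
    for line in lines:
        if STATD_TIMEOUT.search(line):
            timeouts += 1
        match = LOCKD_UNMON.search(line)
        if match:
            stuck_hosts[match.group(1)] += 1
        match = BAD_UNMON.search(line)
        if match:
            bad_unmon[match.group(1)] += 1
    return timeouts, stuck_hosts, bad_unmon


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        timeouts, stuck_hosts, bad_unmon = summarize(log)
    print("statd timeouts:", timeouts)
    print("hosts lockd could not unmonitor:", dict(stuck_hosts))
    print("erroneous SM_UNMON requests by client:", dict(bad_unmon))
```

Run against the syslog from a failure window, this reports how many statd timeouts occurred and which clients were involved, which makes it easier to compare one failure with the next.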

On closer examination, the [lockd] process is in the 'D' (uninterruptible
sleep) state while this is going on.  Usually, my only recourse is to
reboot the server.
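
For anyone trying to confirm the same symptom, the 'D' state can be checked by reading /proc directly.  The following is a minimal sketch, assuming a Linux /proc filesystem; the function names parse_stat() and d_state_processes() are hypothetical helpers, not part of nfs-utils.

```python
import os


def parse_stat(text):
    """Parse the start of a /proc/<pid>/stat line into (pid, comm, state).

    The format is "pid (comm) state ...".  comm may itself contain spaces
    or parentheses, so split on the first '(' and the last ')'.
    """
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state 'D'."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no /proc here (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat"),
                      errors="replace") as stat_file:
                pid, comm, state = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

If lockd appears in the output every time the statd timeouts are logged, that matches the behaviour described here.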

I've tried different values for RPCNFSDCOUNT in
/etc/default/nfs-kernel-server, and have NEED_STATD=yes in
/etc/default/nfs-common.  Otherwise, everything is pretty well stock.

Here's the /etc/exports:
/srv/homes	172.30.0.0/16(insecure_locks,insecure,rw,sync,no_root_squash,no_subtree_check)
I've tested with 'insecure_locks' and without.  The 'insecure' option is to make the Mac clients happy.

Unfortunately, modifying the NFS options on the clients would be rather
difficult in our environment.

More information can be provided as needed.

** Affects: nfs-utils (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/964750

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/964750/+subscriptions



