[Bug 964750] [NEW] nfs/rpc.statd becomes unresponsive
hewbert
964750 at bugs.launchpad.net
Sun Mar 25 20:03:13 UTC 2012
Public bug reported:
We've tested this on Ubuntu 10.04 LTS, Ubuntu 11.10, and Debian 6.0.4, all
x64, with current updates, and all pretty vanilla installations.
Condition summary:
We have ~1600 users with "live" network homes, all on Mac clients. There are typically around 150 simultaneous connections. Under these conditions, we can reliably get NFS to become unresponsive within a couple of hours, just by logging in ~150 users and opening Word (for example). There's no clear indication of what exactly causes the failures. Because the Mac clients run 10.6 and below, they use NFSv3.
We've tested using one physical server, with a hardware RAID and an ext3
filesystem. We've also tested on two separate VMs with ext4. All
systems in question used LVM.
Here's what our server logs indicate when the failures happen:
Mar 23 15:40:47 debfs mountd[2365]: authenticated mount request from 172.30.109.132:1020 for /srv/homes (/srv/homes)
Mar 23 15:40:58 debfs mountd[2365]: authenticated mount request from 172.30.109.73:1020 for /srv/homes (/srv/homes)
Mar 23 15:41:06 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.106.249
Mar 23 15:41:06 debfs mountd[2365]: authenticated mount request from 172.30.109.27:1020 for /srv/homes (/srv/homes)
Mar 23 15:41:09 debfs mountd[2365]: authenticated mount request from 172.30.109.63:1020 for /srv/homes (/srv/homes)
Mar 23 15:41:14 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.106.249
** Mar 23 15:41:19 debfs kernel: [ 8395.736310] statd: server rpc.statd not responding, timed out
** Mar 23 15:41:19 debfs kernel: [ 8395.736331] lockd: cannot unmonitor hs13406s4354.dsdk12.schoollocal
Mar 23 15:41:38 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.137.223
Mar 23 15:41:52 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.137.223
Mar 23 15:41:54 debfs kernel: [ 8430.737038] statd: server rpc.statd not responding, timed out
Mar 23 15:41:54 debfs kernel: [ 8430.737054] lockd: cannot unmonitor hslib23s5174.dsdk12.schoollocal
Mar 23 15:42:10 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.110.25
Mar 23 15:42:15 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.110.25
Mar 23 15:42:29 debfs kernel: [ 8465.737071] statd: server rpc.statd not responding, timed out
Mar 23 15:42:29 debfs kernel: [ 8465.737090] lockd: cannot unmonitor MS20603S4451.dsdk12.schoollocal
Mar 23 15:42:31 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.110.20
Mar 23 15:42:40 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.110.20
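To make the pattern in the excerpt above easier to see, here is a minimal Python sketch that tallies the three kinds of message per client. The regular expressions are written against the exact message text shown in this report; the function name `summarize` and the sample constant are made up for illustration, and other nfs-utils or kernel versions may word these messages differently.

```python
import re
from collections import Counter

# A few lines copied verbatim from the log excerpt in this report.
SAMPLE = """\
Mar 23 15:41:06 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.106.249
Mar 23 15:41:14 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.106.249
Mar 23 15:41:19 debfs kernel: [ 8395.736310] statd: server rpc.statd not responding, timed out
Mar 23 15:41:19 debfs kernel: [ 8395.736331] lockd: cannot unmonitor hs13406s4354.dsdk12.schoollocal
Mar 23 15:41:38 debfs rpc.statd[741]: Received erroneous SM_UNMON request from debfs for 172.30.137.223
"""

# Patterns match the message text as it appears in the report.
UNMON_RE = re.compile(r"Received erroneous SM_UNMON request from \S+ for (\S+)")
TIMEOUT_RE = re.compile(r"statd: server rpc\.statd not responding, timed out")
LOCKD_RE = re.compile(r"lockd: cannot unmonitor (\S+)")


def summarize(lines):
    """Return (SM_UNMON count per client, statd timeout count, hosts lockd could not unmonitor)."""
    unmon = Counter()
    timeouts = 0
    stuck_hosts = []
    for line in lines:
        match = UNMON_RE.search(line)
        if match:
            unmon[match.group(1)] += 1
            continue
        if TIMEOUT_RE.search(line):
            timeouts += 1
            continue
        match = LOCKD_RE.search(line)
        if match:
            stuck_hosts.append(match.group(1))
    return unmon, timeouts, stuck_hosts


if __name__ == "__main__":
    unmon, timeouts, stuck_hosts = summarize(SAMPLE.splitlines())
    for client, count in unmon.most_common():
        print(client, count)
    print("statd timeouts:", timeouts)
    print("hosts lockd could not unmonitor:", ", ".join(stuck_hosts))
```

Running the same function over the full syslog (for example `summarize(open(path))`, with `path` pointing at wherever the log lives on the server) would show whether the SM_UNMON errors cluster on a few clients before each timeout.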
Upon closer examination, the [lockd] process shows a 'D' state
(uninterruptible sleep) when this is going on. Usually, my only recourse
is to reboot the server.
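For anyone trying to confirm the 'D' state while the hang is in progress, the following Python sketch lists every process in uninterruptible sleep by reading /proc/<pid>/stat on Linux. It is only a rough equivalent of scanning ps output by eye; the function names are made up for illustration, and the scan returns an empty list on systems without /proc.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return a list of (pid, comm) for processes currently in state 'D'."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to scan
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan or entry was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

If lockd shows up in this list for more than a moment during a hang, that matches what is described above.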
I've tried different values for RPCNFSDCOUNT in
/etc/default/nfs-kernel-server, and have NEED_STATD=yes in
/etc/default/nfs-common. Otherwise, everything is pretty well stock.
Here's the /etc/exports:
/srv/homes 172.30.0.0/16(insecure_locks,insecure,rw,sync,no_root_squash,no_subtree_check)
I've tested with 'insecure_locks' and without. The 'insecure' option is to make the Mac clients happy.
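As a quick way to compare export variants (for example with and without 'insecure_locks'), here is a small Python sketch that splits the exports line above into its path, client, and option list. It is a simplified parser written for this one line, not a full implementation of the exports(5) syntax: it assumes no quoted paths, no line continuations, and no default-option entries, and the function name `parse_export` is made up for illustration.

```python
import re

# The export line exactly as given in this report.
EXPORT_LINE = (
    "/srv/homes 172.30.0.0/16"
    "(insecure_locks,insecure,rw,sync,no_root_squash,no_subtree_check)"
)

# A client spec is "host" optionally followed by "(opt1,opt2,...)".
CLIENT_RE = re.compile(r"([^()]*)(?:\(([^()]*)\))?")


def parse_export(line):
    """Return (path, {client: [options]}) for one simple exports line."""
    path, *client_specs = line.split()
    clients = {}
    for spec in client_specs:
        match = CLIENT_RE.fullmatch(spec)
        if match is None:
            raise ValueError("unrecognized client spec: " + spec)
        host, options = match.group(1), match.group(2)
        clients[host] = options.split(",") if options else []
    return path, clients


if __name__ == "__main__":
    path, clients = parse_export(EXPORT_LINE)
    for host, options in clients.items():
        print(path, host, options)
        # Show the same export with 'insecure_locks' dropped, for comparison.
        reduced = [opt for opt in options if opt != "insecure_locks"]
        print(path, host + "(" + ",".join(reduced) + ")")
```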
Unfortunately, modifying the NFS options on the clients would be rather
difficult in our environment.
More information can be provided as needed.
** Affects: nfs-utils (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/964750