Intermittent nfs/kerberos mount failures
Tor Martin Slåen
tormsl at sygard.no
Fri Apr 29 16:11:29 UTC 2011
Hi all!
I'm having some strange issues with my nfs setup: both poor speed and
mount failures.
The system:
I am running an ldap/kerberos/nfs4 environment with ubuntu 10.04 servers
and clients only.
- one ldap/kerberos(mit) host
- one nfs host
- one dhcp/dns host
- a number of other servers running ldap authentication and nfs4/krb mounts
- a number of clients running the same setup
All machines that are members of the ldap domain mount users' home
directories from the fileserver using kerberos nfs4 shares. The mounting
is done by autofs, which gets its mount definitions from the ldap directory.
Most of the time it all works flawlessly, but every now and then
machines (clients and servers) start to lock up when logging in over ssh.
When it happens, no users (except local ones) can access their home
directories, and they therefore cannot get a shell going. Users can type
in their password, and the MOTD is printed, but the session then locks up.
I've been searching the internet up and down while trying a heap of
different proposed solutions, but nothing seems to work.
What I've tried:
- disable firewalls on server and client
- checking that the portmap service is running on the clients and server
- doing portmap checks (rpcinfo -[tu] <server> <program> <version>),
which seems to be working fine both ways (server->client, client->server)
- restarting the nfs-kernel-server on the server and all services
installed by the nfs-common package on the clients
- changing rsize and wsize on the mounts, both are currently set to 4096
- async and sync, wdelay and no_wdelay, intr and nointr options
- checked network interfaces on server and client, neither seems to be
reporting any errors
- enabled debugging for nfs on the server and clients, and I cannot see
anything other than these:
* svc: failed to register lockdv1 RPC service (errno 97).
* NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery
directory
* NFSD: starting 90-second grace period
* /export/fileserver/homes and /export/fileserver/homes have same
filehandle for gss/krb5, using first
- one message I have a feeling is the source of some of the issues is this:
* nslcd[1556]: [a1da7b] nslcd_passwd_byname(nfs/sega.example.com):
invalid user name
- but it shouldn't prevent the shares from being mounted?
- probably a few other things which I have forgotten...
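The portmap checks in the list above can be scripted roughly like this.
It is a minimal sketch only: the function name check_rpc and the list of
RPC programs are assumptions, and on a pure nfs4 setup some of them (for
example mountd, or anything over udp) may legitimately fail.

```shell
#!/bin/sh
# Probe a host for RPC services over tcp (-t) and udp (-u) with rpcinfo.
# Prints one line per probe: "<host> <t|u> <program> ok|FAILED".
# The program list below is an assumption; adjust it to the setup.
check_rpc() {
    host=$1
    status=0
    for prog in portmapper nfs mountd nlockmgr; do
        for transport in t u; do
            if rpcinfo -"$transport" "$host" "$prog" >/dev/null 2>&1; then
                echo "$host $transport $prog ok"
            else
                echo "$host $transport $prog FAILED"
                status=1
            fi
        done
    done
    # Non-zero if at least one probe failed
    return $status
}

# Example (run it from the client against the server, and the other way):
#   check_rpc fileserver.example.com
```

Running it from cron on a client and logging the output might show
whether the RPC services disappear at the moment the logins start to hang.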
Some configurations:
# fileserver /etc/exports:
/export gss/krb5(rw,fsid=0,sync,subtree_check,no_root_squash,crossmnt)
/export/fileserver gss/krb5(rw,sync,subtree_check,no_root_squash)
/export/fileserver/homes gss/krb5(rw,no_wdelay,async,no_subtree_check,root_squash,crossmnt)
# It should be noted that the /export/fileserver directory is a bind mount
to /fileserver.
# client mount command:
rsize=4096,wsize=4096,hard,intr,noatime,tcp,async,timeo=70,retrans=2,fstype=nfs4,rw,sec=krb5
fileserver.example.com:/fileserver/homes/<username>
; could be a bit mis-formatted as it is copied from the ldap automount cn's
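To take autofs out of the picture, the same options can be turned into a
manual mount command. A rough sketch follows; the helper name
build_mount_cmd, the test user and the mount point /mnt/test are
hypothetical. It only prints the command, it does not run it.

```shell
#!/bin/sh
# Build a manual "mount" command from an automount-style option string.
# The fstype=... entry is moved to "-t"; everything else goes to "-o".
build_mount_cmd() {
    opts=${1#-}          # automount entries often start with a "-"
    location=$2
    mountpoint=$3
    fstype=nfs
    rest=
    old_ifs=$IFS
    IFS=,
    for o in $opts; do
        case $o in
            fstype=*) fstype=${o#fstype=} ;;
            *)        rest=${rest:+$rest,}$o ;;
        esac
    done
    IFS=$old_ifs
    echo "mount -t $fstype -o $rest $location $mountpoint"
}

# Example (hypothetical user and mount point):
#   build_mount_cmd "hard,intr,tcp,fstype=nfs4,rw,sec=krb5" \
#       "fileserver.example.com:/fileserver/homes/someuser" /mnt/test
```

If the printed command hangs in the same way when run by hand as root,
the problem is probably not in autofs or in the ldap automount entries.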
So my questions are:
Is there anything I should check which I haven't already?
Is there anyone who has had the same kind of issues and has figured out
how to fix them?
And just as a note, setting RPCGSSDOPTS in /etc/default/nfs-common does
not seem to work as advertised, as the rpc.gssd process is launched
without any parameters.
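One way to compare what is configured with what is actually running is
sketched below. The function names are made up for illustration, and it
assumes the defaults file is plain shell variable assignments and that ps
is the procps version.

```shell
#!/bin/sh
# Print the RPCGSSDOPTS value set in a defaults file.
# The file is sourced in a subshell so nothing leaks into this shell.
configured_gssd_opts() {
    file=${1:-/etc/default/nfs-common}
    ( . "$file" >/dev/null 2>&1; printf '%s\n' "${RPCGSSDOPTS:-}" )
}

# Print the full command line of the running rpc.gssd process, if any.
running_gssd_args() {
    ps -o args= -C rpc.gssd
}

# Example:
#   echo "configured: $(configured_gssd_opts)"
#   echo "running:    $(running_gssd_args)"
```

If the first line shows options and the second shows none, the init
script that starts rpc.gssd is probably not passing the variable on.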
Hope someone has some good ideas, because I'm running dry at this point...
--
Thanks,
Tor Martin Slåen