Intermittent nfs/kerberos mount failures

Tor Martin Slåen tormsl at sygard.no
Fri Apr 29 16:11:29 UTC 2011


Hi all!

I'm having some strange issues with my nfs setup: both slow speeds and
mount failures.

The system:
I am running an ldap/kerberos/nfs4 environment with ubuntu 10.04 servers  
and clients only.
- one ldap/kerberos (MIT) host
- one nfs host
- one dhcp/dns host
- a number of other servers running ldap authentication and nfs4/krb mounts
- a number of clients running the same setup

All machines that are members of the ldap domain mount users' home
directories from the fileserver using kerberos nfs4 shares. The mounting
is done by autofs, which gets its mount definitions from the ldap directory.


Most of the time, it all works flawlessly, but every now and then,
machines (clients and servers) start to lock up when logging in over ssh.
When it happens, all users (except local) cannot get access to their home  
directories, and therefore cannot get a shell going. Users can type in  
their password, and the MOTD is printed, but it then locks up.

I've been searching the internet up and down while trying a heap of  
different proposed solutions, but nothing seems to work.

What I've tried:
  - disable firewalls on server and client
  - checking that the portmap service is running on the clients and server
  - doing portmap checks (rpcinfo -[tu] <server> <program> <version>),
which seem to be working fine both ways (server->client, client->server)
  - restarting the nfs-kernel-server on the server and all services  
installed by the nfs-common package on the clients
  - changing rsize and wsize on the mounts, both are currently set to 4096
  - async and sync, wdelay and no_wdelay, intr and no intr exports
  - checked network interfaces on server and client, neither seems to be
reporting any errors
  - enabled debugging for nfs on the server and clients, and I cannot see
anything other than these:
    * svc: failed to register lockdv1 RPC service (errno 97).
    * NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery  
directory
    * NFSD: starting 90-second grace period
    * /export/fileserver/homes and /export/fileserver/homes have same  
filehandle for gss/krb5, using first
  - one message that I have a feeling is the source of some issues is this:
    * nslcd[1556]: [a1da7b] nslcd_passwd_byname(nfs/sega.example.com):  
invalid user name
       - but it shouldn't prevent the shares from being mounted?
  - probably a few other things which I have forgotten...
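
The rpcinfo checks in the list above could be scripted along these lines.
This is only a sketch: the function names are made up for illustration, and
the (program, version) pairs in CHECKS are assumptions rather than the values
actually used in this setup.

```python
import subprocess

# Hypothetical (program, version) pairs to probe; adjust to the local setup.
CHECKS = [("portmapper", 2), ("nfs", 4)]


def rpcinfo_command(host, program, version, transport="t"):
    """Build the argv for `rpcinfo -t|-u <host> <program> <version>`."""
    if transport not in ("t", "u"):
        raise ValueError("transport must be 't' (TCP) or 'u' (UDP)")
    return ["rpcinfo", "-" + transport, host, program, str(version)]


def probe(host, checks=CHECKS, transport="t", timeout=10):
    """Run each check and return {(program, version): True/False}."""
    results = {}
    for program, version in checks:
        cmd = rpcinfo_command(host, program, version, transport)
        try:
            done = subprocess.run(cmd, capture_output=True, timeout=timeout)
            results[(program, version)] = done.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # rpcinfo is missing or the call hung: count it as a failure.
            results[(program, version)] = False
    return results
```

Calling probe("fileserver.example.com") from a client, and probe() with a
client's host name from the server, would cover both directions.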

Some configurations:
# fileserver /etc/exports:

/export          gss/krb5(rw,fsid=0,sync,subtree_check,no_root_squash,crossmnt)
/export/fileserver    gss/krb5(rw,sync,subtree_check,no_root_squash)
/export/fileserver/homes	gss/krb5(rw,no_wdelay,async,no_subtree_check,root_squash,crossmnt)

# It should be noted that the /export/fileserver directory is a bind mount of
/fileserver.

# client mount command:
rsize=4096,wsize=4096,hard,intr,noatime,tcp,async,timeo=70,retrans=2,fstype=nfs4,rw,sec=krb5  
fileserver.example.com:/fileserver/homes/<username>
; could be a bit mis-formatted as it is copied from the ldap automount cn's
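
Since the entry above may be mis-formatted, one way to sanity-check it is to
split it into its individual options and the server location. A minimal
sketch, assuming the entry has the form "<options> <host>:<path>" with an
optional leading dash on the options; parse_automount_entry is a hypothetical
helper, not part of autofs.

```python
def parse_automount_entry(entry):
    """Split '<options> <host>:<path>' into (options dict, host, path)."""
    opts_part, location = entry.split()
    host, _, path = location.partition(":")
    options = {}
    for item in opts_part.lstrip("-").split(","):
        key, sep, value = item.partition("=")
        # Flags such as 'hard' carry no value; record them as True.
        options[key] = value if sep else True
    return options, host, path


entry = ("rsize=4096,wsize=4096,hard,intr,noatime,tcp,async,timeo=70,"
         "retrans=2,fstype=nfs4,rw,sec=krb5 "
         "fileserver.example.com:/fileserver/homes/<username>")
options, host, path = parse_automount_entry(entry)
```

Printing the resulting dictionary makes it easier to spot a duplicated or
misspelled option than reading the comma-separated string.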


So my questions are:
Is there anything I should check which I haven't already?
Is there anyone who has had the same kind of issues and has figured out
how to fix them?


And just as a side note, setting RPCGSSDOPTS in
/etc/default/nfs-common does not seem to work as advertised, as the
rpc.gssd process is launched without any parameters.
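
One way to confirm which parameters a running rpc.gssd actually received is
to read its command line from /proc. The following is a rough, Linux-only
sketch; split_cmdline and find_process_args are hypothetical helper names,
and the proc_root parameter exists only to make the function easy to test.

```python
import os


def split_cmdline(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/cmdline into a list."""
    return [part.decode(errors="replace") for part in raw.split(b"\0") if part]


def find_process_args(name, proc_root="/proc"):
    """Return {pid: argv} for processes whose executable basename is `name`."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                argv = split_cmdline(fh.read())
        except OSError:
            continue  # process exited or is not readable
        if argv and os.path.basename(argv[0]) == name:
            found[int(entry)] = argv
    return found
```

If find_process_args("rpc.gssd") returns an argv with only the executable
path in it, the options from /etc/default/nfs-common were not passed on.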


Hope someone has some good ideas, because I'm running dry at this point...


-- 
Thanks,
Tor Martin Slåen



