Ubuntu 6.06 LTS NFS + NetApp filer problem: NFS locks up

jaco engelbrecht bje at serendipity.org.za
Mon Oct 15 10:59:08 UTC 2007


Hi,

I have a cluster of Ubuntu Dapper 6.06 LTS servers acting as
toasters, serving POP3, IMAP, IMAP proxy, and webmail for a large
mail cluster.

During the last couple of weeks, two servers in the cluster have
unexpectedly stopped doing any NFS operations to one of our NetApp
filers.  Each server mounts three separate NetApp filers, and we have
over 30 servers in this cluster mounting the same filers and running
exactly the same configuration (we kickstart all our servers, so
they're identical; some just do other tasks).

I see these errors in my syslog, but I suspect (based on graphs of
when I/O wait went up, and when these entries appear in the logs)
that they only show up after the problem has already started (because
the client can't do a stat on the NFS fs), so I doubt they're of any
use in solving my problem.

Oct 11 12:33:34 toaster01-mail kernel: [863316.270699] nfs_statfs: statfs error = 512
Oct 11 12:33:37 toaster01-mail kernel: [863319.113889] nfs_statfs: statfs error = 512
Oct 11 12:34:32 toaster01-mail kernel: [863374.713564] nfs_statfs: statfs error = 512
Oct 11 12:34:35 toaster01-mail kernel: [863377.134229] nfs_statfs: statfs error = 512
Oct 11 12:35:36 toaster01-mail kernel: [863438.108434] nfs_statfs: statfs error = 512
Oct 11 12:35:45 toaster01-mail kernel: [863447.372073] nfs_statfs: statfs error = 512
Oct 11 12:46:44 toaster01-mail kernel: [864105.539485] nfs_statfs: statfs error = 512
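
To line these entries up against the I/O wait graphs, something like
the following could pull the timestamps out of syslog. It is only a
minimal sketch: the function name, the regular expression, and the
assumed year (syslog lines carry none) are illustrative and not part
of any existing tool.

```python
import re
from datetime import datetime

# Matches lines such as:
# Oct 11 12:33:34 toaster01-mail kernel: [863316.270699] nfs_statfs: statfs error = 512
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+nfs_statfs:\s+statfs error = (?P<code>\d+)"
)


def parse_statfs_errors(lines, year=2007):
    """Return (datetime, uptime_seconds, error_code) for each matching line.

    `year` is an assumption supplied by the caller, because classic
    syslog timestamps do not include one.
    """
    events = []
    for line in lines:
        m = PATTERN.match(line.strip())
        if not m:
            continue  # not an nfs_statfs line
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        events.append((when, float(m["uptime"]), int(m["code"])))
    return events
```

The earliest returned timestamp can then be compared with the point
at which I/O wait starts climbing on the graph.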

I've checked each network interface from eth1 (backend storage
interface) through to the fibre gig port that connects to the NetApp,
and there are no errors, collisions, etc.  I've also read the 32Mb
NetApp filer report, and can't spot anything unusual there.  I also
tried running strace on, say, a file listing of /mailspool/mail8, but
it just hangs at:

lstat("/mailspool/mail8",  <unfinished ...>
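
Since anything that touches the mount point hangs, a probe with a
timeout is one way to check each mount without tying up a shell. This
is a rough sketch only: `mount_responds` and the 5-second default are
made up for illustration, and it assumes the `stat` command is on the
PATH.

```python
import subprocess


def mount_responds(path, timeout_s=5.0):
    """Return True if `stat` on path finishes successfully within timeout_s."""
    proc = subprocess.Popen(
        ["stat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait for the child: on a hung hard
        # mount it may not exit until the server answers again.
        proc.kill()
        return False
    return proc.returncode == 0
```

For example, `mount_responds("/mailspool/mail8")` would be expected
to return False on the affected server while the mount is stuck.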

I've got one server currently out of service that's experiencing
this "condition", so I can get any stats/output from it if there's
anything anyone can think of.

Here are some details:

Kernel:

2.6.15-27-amd64-k8
linux-image-2.6.15-27-amd64-server  2.6.15-27.50  Linux kernel image for version 2.6.15 on Ser

NFS client:

nfs-common  1.0.7-3ubuntu2  NFS support files common to client and serve

Filer mount options:

10.1.25.212:/vol/vol0/mail8-export  /mailspool/mail8  nfs  rw,hard,intr,timeo=600,retrans=2,rsize=32768,wsize=32768  0 0
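
For reference, nfs(5) gives timeo in tenths of a second, so timeo=600
means the client waits 60 seconds before retrying a request. A small
sketch for checking the options on every server follows; the helper
names are hypothetical and the parsing assumes a plain
whitespace-separated fstab line like the one above.

```python
def parse_fstab_line(line):
    """Split an fstab entry into (device, mount_point, fs_type, options dict)."""
    fields = line.split()
    device, mount_point, fs_type, options = fields[:4]
    opts = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags such as "hard" or "intr" carry no value; store True for them.
        opts[key] = value if sep else True
    return device, mount_point, fs_type, opts


def timeo_seconds(opts):
    """Convert the timeo option (tenths of a second) to seconds."""
    return int(opts["timeo"]) / 10.0
```

Running this over /etc/fstab on each box would be a quick way to
confirm that the kickstarted servers really do share the same mount
options.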

eth1 interface output:

eth1: negotiated 100baseTx-FD, link ok

eth1      Link encap:Ethernet  HWaddr 00:17:08:50:78:41
           inet addr:10.1.25.141  Bcast:10.1.25.255  Mask:255.255.255.0
           inet6 addr: fe80::217:8ff:fe50:7841/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:1468254714 errors:0 dropped:0 overruns:0 frame:0
           TX packets:995375684 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:1677649220750 (1.5 TiB)  TX bytes:155688040628 (144.9 GiB)
           Interrupt:193

nfsstat output:

Client rpc stats:
calls      retrans    authrefrsh
460431933   104        0
Client nfs v2:
null       getattr    setattr    root       lookup     readlink
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
read       wrcache    write      create     remove     rename
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
link       symlink    mkdir      rmdir      readdir    fsstat
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%

Client nfs v3:
null       getattr    setattr    lookup     access     readlink
0       0% 114645897 24% 2066807  0% 78704007 17% 151704157 32% 0       0%
read       write      create     mkdir      symlink    mknod
53366969 11% 11421861  2% 6827536  1% 8280    0% 0       0% 0       0%
remove     rmdir      rename     link       readdir    readdirplus
10806338  2% 233     0% 2897780  0% 3458142  0% 5434425  1% 19085492  4%
fsstat     fsinfo     pathconf   commit
68      0% 4       0% 0       0% 3933    0%
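
To put the retrans counter in proportion, the RPC retransmission
ratio can be computed from the "Client rpc stats" block. This is a
sketch that assumes the three-line layout shown above (title, header
row, value row); `rpc_retrans_ratio` is an illustrative name, not
part of nfs-common.

```python
def rpc_retrans_ratio(nfsstat_text):
    """Return retrans / calls from the 'Client rpc stats' section."""
    lines = nfsstat_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().lower().startswith("client rpc stats"):
            header = lines[i + 1].split()   # calls retrans authrefrsh
            values = lines[i + 2].split()   # matching integer counters
            stats = dict(zip(header, (int(v) for v in values)))
            return stats["retrans"] / stats["calls"]
    raise ValueError("no 'Client rpc stats' section found")
```

With 104 retransmissions out of 460431933 calls the ratio is well
under one in a million, which fits with the clean interface counters.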

Any ideas, things to try?

Cheers,
Jaco

-- 
bje at serendipity.org.za
the faculty of making fortunate discoveries



