Ubuntu 6.06 LTS NFS + NetApp filer problem: NFS locks up
jaco engelbrecht
bje at serendipity.org.za
Mon Oct 15 10:59:08 UTC 2007
Hi,
I have a cluster of Ubuntu Dapper 6.06 LTS servers acting as
toasters, serving POP3, IMAP, IMAP proxy, webmail for a large mail
cluster.
During the last couple of weeks, I've had two servers in the cluster
unexpectedly stop doing any NFS operations to one of our NetApp
filers. Each server mounts three separate NetApp filers, and over 30
servers in this cluster mount the same filers with exactly the same
configuration (we kickstart all our servers, so they're identical;
some just do other tasks).
I see these errors in my syslog, but judging by the graphs showing
when I/O wait went up and by when these entries appear in the logs, I
suspect they only show up after the problem has already started
(because the client can't stat the NFS filesystem), so I doubt
they're of any use in solving my problem.
Oct 11 12:33:34 toaster01-mail kernel: [863316.270699] nfs_statfs: statfs error = 512
Oct 11 12:33:37 toaster01-mail kernel: [863319.113889] nfs_statfs: statfs error = 512
Oct 11 12:34:32 toaster01-mail kernel: [863374.713564] nfs_statfs: statfs error = 512
Oct 11 12:34:35 toaster01-mail kernel: [863377.134229] nfs_statfs: statfs error = 512
Oct 11 12:35:36 toaster01-mail kernel: [863438.108434] nfs_statfs: statfs error = 512
Oct 11 12:35:45 toaster01-mail kernel: [863447.372073] nfs_statfs: statfs error = 512
Oct 11 12:46:44 toaster01-mail kernel: [864105.539485] nfs_statfs: statfs error = 512
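To line these entries up against the I/O wait graphs, the timestamps can be pulled out of syslog with a short script. The following is only a sketch: the function name `parse_statfs_errors` and the `year` default are illustrative assumptions (syslog lines carry no year), and the pattern assumes each entry sits on a single line, as it would in the syslog file itself. For what it's worth, 512 matches the kernel-internal ERESTARTSYS value, which would fit a call interrupted while waiting and not a root cause, though that is an interpretation the logs do not confirm.

```python
import re
from datetime import datetime

# Pattern for the nfs_statfs kernel messages quoted in this post.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+nfs_statfs:\s+statfs error = (?P<code>\d+)"
)

def parse_statfs_errors(lines, year=2007):
    """Return (datetime, uptime_seconds, error_code) for each matching line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip unrelated syslog lines
        when = datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
        events.append((when, float(m.group("uptime")), int(m.group("code"))))
    return events

# First and last entries from the log excerpt above.
sample = [
    "Oct 11 12:33:34 toaster01-mail kernel: [863316.270699] nfs_statfs: statfs error = 512",
    "Oct 11 12:46:44 toaster01-mail kernel: [864105.539485] nfs_statfs: statfs error = 512",
]
events = parse_statfs_errors(sample)
first, last = events[0][0], events[-1][0]
print("first error:", first)
print("last error: ", last)
print("span:       ", last - first)
```

The first timestamp is the one to compare with the point where I/O wait starts climbing on the graphs.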
I've checked each network interface from eth1 (backend storage
interface) through to the fibre gigabit port that connects to the
NetApp, and there are no errors, collisions, etc. I've also read the
32 MB NetApp filer report and can't spot anything unusual there. I
also tried running strace on a listing of the files in
/mailspool/mail8, but just get ... "lstat("/mailspool/mail8", <unfinished ...>".
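Since the lstat on the mount point appears to block, one way to check a mount without tying up a shell is to run the stat in a child process with a timeout. This is a minimal sketch, assuming Python and the `stat` utility are available on the server; `probe_path` and the 5-second timeout are illustrative choices, not part of the setup described here.

```python
import subprocess

def probe_path(path, timeout=5.0):
    """Run `stat` on path in a child process; return 'ok', 'error', or 'hung'."""
    proc = subprocess.Popen(
        ["stat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait again: a child blocked on a hard
        # NFS mount may not exit promptly, and waiting would hang us too.
        proc.kill()
        return "hung"
    return "ok" if rc == 0 else "error"

# /mailspool/mail8 is the mount point named in this post.
for mount in ["/mailspool/mail8"]:
    print(mount, probe_path(mount))
```

Running the same probe against the other mounted filers would show whether only one filer is affected from this client.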
One server that's experiencing this "condition" is currently out of
service, so I can get any stats/output from it if anyone can think of
something useful to look at.
Here are some details:
Kernel:
2.6.15-27-amd64-k8
linux-image-2.6.15-27-amd64-server 2.6.15-27.50 Linux kernel image for version 2.6.15 on Ser
NFS client:
nfs-common 1.0.7-3ubuntu2 NFS support files common to client and serve
Filer mount options:
10.1.25.212:/vol/vol0/mail8-export /mailspool/mail8 nfs rw,hard,intr,timeo=600,retrans=2,rsize=32768,wsize=32768 0 0
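To make the timing implied by these options explicit, the entry can be split into fields and the timeout converted to seconds. The sketch below assumes the standard six-field fstab layout; `parse_fstab_line` is a hypothetical helper name. Per nfs(5), `timeo` is given in tenths of a second, and a `hard` mount retries indefinitely and never returns an error to the caller, which would be consistent with an lstat that never comes back.

```python
def parse_fstab_line(line):
    """Split an fstab entry into its fields plus a dict of mount options."""
    device, mountpoint, fstype, options, dump, passno = line.split()
    opts = {}
    for item in options.split(","):
        key, _, value = item.partition("=")
        # Numeric values become ints; bare flags such as "hard" become True.
        opts[key] = int(value) if value.isdigit() else (value or True)
    return {"device": device, "mountpoint": mountpoint,
            "fstype": fstype, "options": opts}

entry = parse_fstab_line(
    "10.1.25.212:/vol/vol0/mail8-export /mailspool/mail8 nfs "
    "rw,hard,intr,timeo=600,retrans=2,rsize=32768,wsize=32768 0 0"
)
opts = entry["options"]
timeo_seconds = opts["timeo"] / 10  # timeo is in tenths of a second
print("initial RPC timeout (s):", timeo_seconds)
print("retrans:", opts["retrans"])
print("hard mount:", "hard" in opts)
```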
eth1 interface output:
eth1: negotiated 100baseTx-FD, link ok
eth1 Link encap:Ethernet HWaddr 00:17:08:50:78:41
inet addr:10.1.25.141 Bcast:10.1.25.255 Mask:255.255.255.0
inet6 addr: fe80::217:8ff:fe50:7841/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1468254714 errors:0 dropped:0 overruns:0 frame:0
TX packets:995375684 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1677649220750 (1.5 TiB) TX bytes:155688040628 (144.9 GiB)
Interrupt:193
nfsstat output:
Client rpc stats:
calls retrans authrefrsh
460431933 104 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
Client nfs v3:
null getattr setattr lookup access readlink
0 0% 114645897 24% 2066807 0% 78704007 17% 151704157 32% 0 0%
read write create mkdir symlink mknod
53366969 11% 11421861 2% 6827536 1% 8280 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
10806338 2% 233 0% 2897780 0% 3458142 0% 5434425 1% 19085492 4%
fsstat fsinfo pathconf commit
68 0% 4 0% 0 0% 3933 0%
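As a quick sanity check on these counters, the retransmission rate and the share of the busiest operations can be worked out directly from the figures above. This is only a sketch using the numbers as posted; the counters are cumulative, so on their own they cannot show when the 104 retransmissions happened.

```python
# Figures copied from the nfsstat output above.
calls, retrans = 460431933, 104
top_ops = {"access": 151704157, "getattr": 114645897, "lookup": 78704007}

# Fraction of RPC calls that had to be retransmitted.
rate = retrans / calls
print(f"retransmission rate: {rate:.2e}")

# Share of all RPC calls taken by each of the three busiest operations.
shares = {op: count / calls for op, count in top_ops.items()}
for op, share in shares.items():
    print(f"{op:8s} {share:.1%}")
```

Taking two nfsstat samples a few minutes apart on the affected server and comparing the `retrans` values would show whether retransmissions are still accumulating.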
Any ideas, things to try?
Cheers,
Jaco
--
bje at serendipity.org.za
the faculty of making fortunate discoveries
More information about the ubuntu-server
mailing list