[Bug 561210] Re: Writing big files to NFS target causes system lock up
BlueBuntu
561210 at bugs.launchpad.net
Sun Sep 19 17:42:48 UTC 2010
A week ago I installed a fresh Lucid 10.04 amd64 desktop on a
workstation (Athlon II 240, 4GB ECC RAM, 1TB SATAII disk). Within a day
the machine locked up with no response to keyboard or mouse. I could
ping it but could not ssh to it. Luckily Magic SysRq + REISUB was able
to sync the local disk, but the machine still wouldn't reboot. After
looking at the logs I noticed the nfs and kswap errors that eventually
brought me to this bug report (I have attached a portion of
/var/log/messages showing similar errors).
At first I couldn't reliably reproduce the lockup; it just happened on
its own. However, I was able to reproduce it within a few minutes by
running a simple loop that copied a CD image to and from an NFS mount
and then diffed the contents. I later found that I could trigger the
lockup in under 10 seconds by adding a second instance of the copy loop
while also running memtester on half (2GB) of the RAM (memtester
allocates and mlocks the RAM). If I run this test from a VT, I can watch
the same kernel/NFS dmesg messages shown at the top of this bug report
appear on the VT in real time.
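For anyone who wants to try the same kind of test, here is a minimal
sketch of such a copy loop. It is not the exact script I ran: the
function name copy_loop, the file name test.iso, and the pass count are
placeholders I made up for illustration, and it uses cmp rather than
diff for the comparison.

```shell
#!/bin/sh
# Hypothetical reproduction loop (names and defaults are assumptions):
# copy an image to an NFS-mounted directory, copy it back to local
# scratch space, and compare the round-tripped file with the original.
# Usage: copy_loop IMAGE NFS_DIR [ITERATIONS]
copy_loop() {
    image=$1
    nfs_dir=$2
    iterations=${3:-10}
    scratch=$(mktemp -d) || return 1
    i=1
    while [ "$i" -le "$iterations" ]; do
        # Write to the NFS mount, then read the same file back.
        if ! cp "$image" "$nfs_dir/test.iso" ||
           ! cp "$nfs_dir/test.iso" "$scratch/test.iso"; then
            rm -rf "$scratch"
            return 1
        fi
        # Byte-for-byte comparison against the original image.
        if ! cmp -s "$image" "$scratch/test.iso"; then
            echo "mismatch on pass $i" >&2
            rm -rf "$scratch"
            return 1
        fi
        echo "pass $i ok"
        i=$((i + 1))
    done
    rm -rf "$scratch"
}
```

With the function loaded, something like
`copy_loop /path/to/cd.iso /path/to/nfs/mount 100` runs one instance
(both paths are placeholders). Running a second instance from another
shell, plus something like `sudo memtester 2G` in a third, would
approximate the faster variant described above.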
I am using autofs to mount NFS with the following options:
server: rw,sync,no_root_squash,no_subtree_check
client: rw,hard,intr,tcp,fg,nfsvers=3,rsize=32768,wsize=32768
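To show how the client options fit together outside of autofs, here is
a small sketch that only prints the equivalent manual mount command. The
helper name nfs_mount_cmd, the export fileserver:/export/data, and the
mount point /mnt/nfs are made-up placeholders, not my actual setup.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a manual NFS mount command
# that uses the same client options as the autofs entry above.
# Usage: nfs_mount_cmd SERVER:/EXPORT MOUNTPOINT
nfs_mount_cmd() {
    opts="rw,hard,intr,tcp,fg,nfsvers=3,rsize=32768,wsize=32768"
    echo "mount -t nfs -o $opts $1 $2"
}
```

For example, `nfs_mount_cmd fileserver:/export/data /mnt/nfs` prints a
command line that could then be run as root on the client.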
After reading this bug report and the related kernel development
reports, I got the impression that the problem was fixed in more recent
kernels. Luckily the kernel-ppa team has ported the maverick 2.6.35
kernel for use in lucid, so I tried the 2.6.35-21 maverick kernel on the
lucid workstation. Unfortunately the lockup happened even with the
maverick kernel. I installed it with the following commands:
sudo add-apt-repository ppa:kernel-ppa/ppa
sudo apt-get update
sudo apt-get install linux-headers-2.6.35-21-generic linux-image-generic-lts-backport-maverick
sudo reboot
Apparently this NFS bug is present not only in 2.6.32 but all the way up
to 2.6.35 (four different releases), which ultimately means anyone
expecting to use lucid or maverick with NFS will either have to live
with lockups or hope that it eventually gets fixed.
Is there something unique to all of our systems that keeps this from
being found during normal regression testing? Or perhaps I should ask
whether NFS is even part of the regular testing?
** Attachment added: "/var/log/messages nfs and kswap errors"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/561210/+attachment/1609450/+files/log.txt
--
Writing big files to NFS target causes system lock up
https://bugs.launchpad.net/bugs/561210