[Bug 932687] Re: nfs4_reclaim_locks: unhandled error crashes applications and creates high load
Andreas Heinlein
932687 at bugs.launchpad.net
Mon Nov 5 10:47:43 UTC 2012
Any news on this? We're experiencing exactly the same problems as described by Peter, except that the workaround doesn't work for us.
We have a lot of Ubuntu 10.04 LTS clients running with /home mounted through NFSv4 from a Debian 6.0 server. We have also had a single test machine running 12.04 for several months without problems. Last Friday, I upgraded a second machine and the described problems began.
We also had a server crash on Friday; I'm not sure whether it is related. The server stopped with "Out of memory and no killable processes left." Apparently, it had started killing processes to free up memory. The logs say it was due to imapd claiming more memory, but that could well be wrong. What we also see on the server is that two out of four rpciod kernel threads are stuck in the 'D' state, which apparently also causes a permanent load level of at least 2.0. It doesn't seem to have any real performance impact, though. These stuck threads go away when you reboot the server, but return as soon as you fire up the 12.04 boxes.
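For anyone who wants to check for the same symptom, here is a rough sketch of how tasks stuck in the 'D' (uninterruptible sleep) state could be listed by reading /proc. The function name and the `proc_root` parameter are made up for illustration, and the script assumes the usual Linux layout of /proc/<pid>/status with `Name:` and `State:` lines.

```python
import os


def uninterruptible_tasks(proc_root="/proc"):
    """Return a sorted list of (pid, name) for tasks whose state is 'D'.

    proc_root is a hypothetical parameter so the scan can be pointed at
    a copy of /proc; on a live system the default is what you want.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path, errors="replace") as fh:
                # Each line looks like "Key:\tvalue"
                fields = dict(
                    line.split(":", 1) for line in fh if ":" in line
                )
        except OSError:
            continue  # the process exited while we were scanning
        state = fields.get("State", "").strip()
        if state.startswith("D"):
            found.append((int(entry), fields.get("Name", "").strip()))
    return sorted(found)
```

Calling `uninterruptible_tasks()` on the server and printing the result should show the stuck rpciod threads if they are present. Each thread that stays in 'D' adds 1 to the load average, which would match a permanent load of about 2.0 for two stuck threads.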
We already had network cards configured by /etc/network/interfaces, so Peter's workaround doesn't work for us. I have now removed the /home line from fstab and instead mount /home manually on these two boxes. The clientaddr field is now correct (was 0.0.0.0 before), and everything seems to work now.
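The clientaddr value can be read from the mount options in /proc/mounts. Below is a minimal sketch that extracts it for every nfs4 mount and flags those showing 0.0.0.0. The function names are hypothetical, and treating 0.0.0.0 as the sign of trouble is only based on the observation above, not on a confirmed root cause.

```python
def nfs4_clientaddrs(mounts_text):
    """Map each nfs4 mount point to its clientaddr option (None if absent).

    mounts_text is the content of /proc/mounts, where each line reads
    "device mountpoint fstype options dump pass".
    """
    result = {}
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[2] != "nfs4":
            continue
        # Options are comma-separated; some are key=value, some are bare flags.
        opts = dict(o.partition("=")[::2] for o in parts[3].split(","))
        result[parts[1]] = opts.get("clientaddr")
    return result


def suspicious_mounts(mounts_text):
    """Return mount points whose clientaddr is 0.0.0.0."""
    addrs = nfs4_clientaddrs(mounts_text)
    return [mp for mp, addr in addrs.items() if addr == "0.0.0.0"]
```

On a client, `suspicious_mounts(open("/proc/mounts").read())` should return an empty list once /home is mounted with a proper client address.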
That is still something that needs to be resolved quickly. I suspect there are some protocol incompatibilities here; we already went back on the server from kernel 3.2.0 (from Debian backports) to the official squeeze kernel 2.6.32 because we had problems with ever-increasing load on the server. Maybe going to 3.2.0 on the server again would help now, since both client and server would then be running the same kernel version. But I cannot upgrade all boxes to 12.04 beforehand just to test. I will try to set up a test environment and post the results.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/932687
Title:
nfs4_reclaim_locks: unhandled error crashes applications and creates
high load
Status in “linux” package in Ubuntu:
Triaged
Status in “nfs-utils” package in Ubuntu:
Invalid
Bug description:
We tried to move our Natty clients to Oneiric but have hit a severe
show-stopper bug. Oneiric seems to have a problem with NFS. We use NFS
for our home folders with strict permissions. The NFSv4 server is
running Solaris.
As said, there is no problem on Natty, only on Oneiric. Oneiric runs
fine for some minutes, then dmesg shows output like this:
[ 7778.934514] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7778.934521] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7869.899811] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7869.899818] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7869.938180] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7869.938184] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7869.950989] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7869.950993] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7869.977253] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7869.977258] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7870.364422] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7870.364429] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7870.594833] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7870.594839] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7870.652639] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7870.652644] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7870.678166] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7870.678171] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7880.217148] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7880.217155] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7880.277521] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7880.277527] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7880.374106] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7880.374113] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7880.440398] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
[ 7880.440404] nfs4_reclaim_open_state: Lock reclaim failed!
[ 7880.451121] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
[ 7880.451330] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
[ 7880.451520] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
[ 7880.451738] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
[ 7880.451921] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
[ 7880.452099] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
[ 7880.452279] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
[ 8160.252077] INFO: task claws-mail:23156 blocked for more than 120 seconds.
[ 8160.252085] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8160.252092] claws-mail D 0000000000000000 0 23156 1848 0x00000004
[ 8160.252103] ffff8800363e3a08 0000000000000046 ffff8800363e39a8 ffffffffa03275f0
[ 8160.252109] ffff8800363e3fd8 ffff8800363e3fd8 ffff8800363e3fd8 0000000000012a40
[ 8160.252115] ffff8800191edc80 ffff88003d7cdc80 ffff8800363e39e8 ffff88003fc132c0
[ 8160.252121] Call Trace:
[ 8160.252148] [<ffffffffa03275f0>] ? rpc_put_task+0x10/0x20 [sunrpc]
[ 8160.252158] [<ffffffff8110a180>] ? __lock_page+0x70/0x70
[ 8160.252164] [<ffffffff815eff1f>] schedule+0x3f/0x60
[ 8160.252168] [<ffffffff815effcf>] io_schedule+0x8f/0xd0
[ 8160.252173] [<ffffffff8110a18e>] sleep_on_page+0xe/0x20
[ 8160.252177] [<ffffffff815f07ef>] __wait_on_bit+0x5f/0x90
[ 8160.252182] [<ffffffff8110a378>] wait_on_page_bit+0x78/0x80
[ 8160.252189] [<ffffffff81081c50>] ? autoremove_wake_function+0x40/0x40
[ 8160.252194] [<ffffffff8110a48c>] filemap_fdatawait_range+0x10c/0x1a0
[ 8160.252216] [<ffffffffa03be1d0>] ? nfs_writedata_alloc+0x150/0x150 [nfs]
[ 8160.252233] [<ffffffffa03b89e0>] ? nfs_free_request+0x90/0x90 [nfs]
[ 8160.252243] [<ffffffff81115211>] ? do_writepages+0x21/0x40
[ 8160.252252] [<ffffffff8110bd5b>] ? __filemap_fdatawrite_range+0x5b/0x60
[ 8160.252261] [<ffffffff8110bdc8>] filemap_write_and_wait_range+0x68/0x80
[ 8160.252271] [<ffffffff811940e2>] vfs_fsync_range+0x42/0xa0
[ 8160.252277] [<ffffffff811941ac>] vfs_fsync+0x1c/0x20
[ 8160.252295] [<ffffffffa03ad2e3>] nfs_file_flush+0x53/0x80 [nfs]
[ 8160.252301] [<ffffffff811661ff>] filp_close+0x3f/0x90
[ 8160.252307] [<ffffffff81060f3a>] put_files_struct.part.14+0x7a/0xe0
[ 8160.252312] [<ffffffff81062a08>] put_files_struct+0x18/0x20
[ 8160.252316] [<ffffffff81062ad4>] exit_files+0x54/0x70
[ 8160.252320] [<ffffffff81062fed>] do_exit+0x19d/0x440
[ 8160.252325] [<ffffffff8107186a>] ? __dequeue_signal+0x6a/0xb0
[ 8160.252330] [<ffffffff81063434>] do_group_exit+0x44/0xa0
[ 8160.252334] [<ffffffff8107406d>] get_signal_to_deliver+0x27d/0x3f0
[ 8160.252340] [<ffffffff8100a7e6>] do_signal+0x56/0x180
[ 8160.252348] [<ffffffff8104e94d>] ? set_next_entity+0x9d/0xb0
[ 8160.252352] [<ffffffff8104e5e9>] ? finish_task_switch+0x49/0xf0
[ 8160.252356] [<ffffffff815ef8c4>] ? __schedule+0x3d4/0x700
[ 8160.252361] [<ffffffff8100aad5>] do_notify_resume+0x65/0x80
[ 8160.252368] [<ffffffff815fa490>] int_signal+0x12/0x17
[ 8280.252073] INFO: task firefox:5030 blocked for more than 120 seconds.
[ 8280.252080] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8280.252087] firefox D ffffffff81805120 0 5030 1 0x00000004
[ 8280.252097] ffff88000bfc7a08 0000000000000046 ffff88001156adf0 ffff88000bfc7b98
[ 8280.252105] ffff88000bfc7fd8 ffff88000bfc7fd8 ffff88000bfc7fd8 0000000000012a40
[ 8280.252111] ffffffff81c0b020 ffff88003645c560 ffff88000bfc79e8 ffff88003fc132c0
[ 8280.252117] Call Trace:
[ 8280.252130] [<ffffffff8110a180>] ? __lock_page+0x70/0x70
[ 8280.252137] [<ffffffff815eff1f>] schedule+0x3f/0x60
[ 8280.252141] [<ffffffff815effcf>] io_schedule+0x8f/0xd0
[ 8280.252145] [<ffffffff8110a18e>] sleep_on_page+0xe/0x20
[ 8280.252150] [<ffffffff815f07ef>] __wait_on_bit+0x5f/0x90
[ 8280.252154] [<ffffffff8110a378>] wait_on_page_bit+0x78/0x80
[ 8280.252165] [<ffffffff81081c50>] ? autoremove_wake_function+0x40/0x40
[ 8280.252170] [<ffffffff8110a48c>] filemap_fdatawait_range+0x10c/0x1a0
[ 8280.252177] [<ffffffff81115211>] ? do_writepages+0x21/0x40
[ 8280.252181] [<ffffffff8110bd5b>] ? __filemap_fdatawrite_range+0x5b/0x60
[ 8280.252186] [<ffffffff8110bdc8>] filemap_write_and_wait_range+0x68/0x80
[ 8280.252192] [<ffffffff811940e2>] vfs_fsync_range+0x42/0xa0
[ 8280.252196] [<ffffffff811941ac>] vfs_fsync+0x1c/0x20
[ 8280.252217] [<ffffffffa03ad2e3>] nfs_file_flush+0x53/0x80 [nfs]
[ 8280.252223] [<ffffffff811661ff>] filp_close+0x3f/0x90
[ 8280.252229] [<ffffffff81060f3a>] put_files_struct.part.14+0x7a/0xe0
[ 8280.252233] [<ffffffff81062a08>] put_files_struct+0x18/0x20
[ 8280.252237] [<ffffffff81062ad4>] exit_files+0x54/0x70
[ 8280.252243] [<ffffffff81062fed>] do_exit+0x19d/0x440
[ 8280.252251] [<ffffffff8107186a>] ? __dequeue_signal+0x6a/0xb0
[ 8280.252260] [<ffffffff81063434>] do_group_exit+0x44/0xa0
[ 8280.252268] [<ffffffff8107406d>] get_signal_to_deliver+0x27d/0x3f0
[ 8280.252277] [<ffffffff8100a7e6>] do_signal+0x56/0x180
[ 8280.252285] [<ffffffff811afe27>] ? fcntl_setlk+0x67/0x220
[ 8280.252294] [<ffffffff81178e42>] ? do_fcntl+0x1b2/0x340
[ 8280.252302] [<ffffffff8100aad5>] do_notify_resume+0x65/0x80
[ 8280.252311] [<ffffffff815fa490>] int_signal+0x12/0x17
[ 8280.252322] INFO: task claws-mail:23156 blocked for more than 120 seconds.
[ 8280.252328] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8280.252333] claws-mail D 0000000000000000 0 23156 1848 0x00000004
[ 8280.252342] ffff8800363e3a08 0000000000000046 ffff8800363e39a8 ffffffffa03275f0
[ 8280.252350] ffff8800363e3fd8 ffff8800363e3fd8 ffff8800363e3fd8 0000000000012a40
[ 8280.252360] ffff8800191edc80 ffff88003d7cdc80 ffff8800363e39e8 ffff88003fc132c0
[ 8280.252368] Call Trace:
[ 8280.252395] [<ffffffffa03275f0>] ? rpc_put_task+0x10/0x20 [sunrpc]
[ 8280.252404] [<ffffffff8110a180>] ? __lock_page+0x70/0x70
[ 8280.252412] [<ffffffff815eff1f>] schedule+0x3f/0x60
[ 8280.252419] [<ffffffff815effcf>] io_schedule+0x8f/0xd0
[ 8280.252427] [<ffffffff8110a18e>] sleep_on_page+0xe/0x20
[ 8280.252434] [<ffffffff815f07ef>] __wait_on_bit+0x5f/0x90
[ 8280.252442] [<ffffffff8110a378>] wait_on_page_bit+0x78/0x80
[ 8280.252451] [<ffffffff81081c50>] ? autoremove_wake_function+0x40/0x40
[ 8280.252459] [<ffffffff8110a48c>] filemap_fdatawait_range+0x10c/0x1a0
[ 8280.252476] [<ffffffffa03be1d0>] ? nfs_writedata_alloc+0x150/0x150 [nfs]
[ 8280.252491] [<ffffffffa03b89e0>] ? nfs_free_request+0x90/0x90 [nfs]
[ 8280.252495] [<ffffffff81115211>] ? do_writepages+0x21/0x40
[ 8280.252500] [<ffffffff8110bd5b>] ? __filemap_fdatawrite_range+0x5b/0x60
[ 8280.252505] [<ffffffff8110bdc8>] filemap_write_and_wait_range+0x68/0x80
[ 8280.252509] [<ffffffff811940e2>] vfs_fsync_range+0x42/0xa0
[ 8280.252513] [<ffffffff811941ac>] vfs_fsync+0x1c/0x20
[ 8280.252524] [<ffffffffa03ad2e3>] nfs_file_flush+0x53/0x80 [nfs]
[ 8280.252529] [<ffffffff811661ff>] filp_close+0x3f/0x90
[ 8280.252534] [<ffffffff81060f3a>] put_files_struct.part.14+0x7a/0xe0
[ 8280.252538] [<ffffffff81062a08>] put_files_struct+0x18/0x20
[ 8280.252542] [<ffffffff81062ad4>] exit_files+0x54/0x70
[ 8280.252546] [<ffffffff81062fed>] do_exit+0x19d/0x440
[ 8280.252550] [<ffffffff8107186a>] ? __dequeue_signal+0x6a/0xb0
[ 8280.252555] [<ffffffff81063434>] do_group_exit+0x44/0xa0
[ 8280.252561] [<ffffffff8107406d>] get_signal_to_deliver+0x27d/0x3f0
[ 8280.252570] [<ffffffff8100a7e6>] do_signal+0x56/0x180
[ 8280.252578] [<ffffffff8104e94d>] ? set_next_entity+0x9d/0xb0
[ 8280.252586] [<ffffffff8104e5e9>] ? finish_task_switch+0x49/0xf0
[ 8280.252591] [<ffffffff815ef8c4>] ? __schedule+0x3d4/0x700
[ 8280.252596] [<ffffffff8100aad5>] do_notify_resume+0x65/0x80
[ 8280.252601] [<ffffffff815fa490>] int_signal+0x12/0x17
[ 8280.252608] INFO: task firefox:15992 blocked for more than 120 seconds.
[ 8280.252613] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8280.252618] firefox D ffffffff81805120 0 15992 1728 0x00000004
[ 8280.252627] ffff8800362b9c18 0000000000000086 ffff880011569d70 ffff8800362b9da8
[ 8280.252636] ffff8800362b9fd8 ffff8800362b9fd8 ffff8800362b9fd8 0000000000012a40
[ 8280.252645] ffff88003d698000 ffff88003d662e40 ffff8800362b9bf8 ffff88003fd132c0
[ 8280.252654] Call Trace:
[ 8280.252661] [<ffffffff8110a180>] ? __lock_page+0x70/0x70
[ 8280.252665] [<ffffffff815eff1f>] schedule+0x3f/0x60
[ 8280.252668] [<ffffffff815effcf>] io_schedule+0x8f/0xd0
[ 8280.252673] [<ffffffff8110a18e>] sleep_on_page+0xe/0x20
[ 8280.252676] [<ffffffff815f07ef>] __wait_on_bit+0x5f/0x90
[ 8280.252681] [<ffffffff8110a378>] wait_on_page_bit+0x78/0x80
[ 8280.252685] [<ffffffff81081c50>] ? autoremove_wake_function+0x40/0x40
[ 8280.252690] [<ffffffff8110a48c>] filemap_fdatawait_range+0x10c/0x1a0
[ 8280.252694] [<ffffffff81115211>] ? do_writepages+0x21/0x40
[ 8280.252699] [<ffffffff8110bd5b>] ? __filemap_fdatawrite_range+0x5b/0x60
[ 8280.252703] [<ffffffff8110a54b>] filemap_fdatawait+0x2b/0x30
[ 8280.252707] [<ffffffff8110c7b4>] filemap_write_and_wait+0x44/0x60
[ 8280.252726] [<ffffffffa03b06d5>] nfs_getattr+0x105/0x120 [nfs]
[ 8280.252735] [<ffffffff8116c8fe>] vfs_getattr+0x4e/0x80
[ 8280.252741] [<ffffffff8116c988>] vfs_fstatat+0x58/0x70
[ 8280.252745] [<ffffffff8116c9db>] vfs_stat+0x1b/0x20
[ 8280.252748] [<ffffffff8116cb1a>] sys_newstat+0x1a/0x40
[ 8280.252752] [<ffffffff811695e5>] ? fput+0x25/0x30
[ 8280.252756] [<ffffffff8100b705>] ? math_state_restore+0x45/0x60
[ 8280.252762] [<ffffffff815f312e>] ? do_device_not_available+0xe/0x10
[ 8280.252769] [<ffffffff815fb17b>] ? device_not_available+0x1b/0x20
[ 8280.252778] [<ffffffff815fa1c2>] system_call_fastpath+0x16/0x1b
The affected applications, such as firefox or claws-mail, no longer respond; killing them leaves zombie processes.
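The numeric codes in the "unhandled error" lines above appear to be negated NFSv4 protocol status values. As a rough sketch, the script below counts them per kernel function and attaches the symbolic names from the NFSv4 specification (RFC 3530). The table is deliberately partial, and the names `NFS4_STATUS` and `summarize` are made up for this example.

```python
import re
from collections import Counter

# Partial table of NFSv4 status codes as defined in RFC 3530.
NFS4_STATUS = {
    10022: "NFS4ERR_STALE_CLIENTID",
    10023: "NFS4ERR_STALE_STATEID",
    10024: "NFS4ERR_OLD_STATEID",
    10025: "NFS4ERR_BAD_STATEID",
    10026: "NFS4ERR_BAD_SEQID",
}

# Matches e.g. "nfs4_reclaim_locks: unhandled error -10024."
_PATTERN = re.compile(r"(nfs4_\w+): unhandled error -(\d+)")


def summarize(dmesg_text):
    """Count (function, code, symbolic name) triples found in dmesg output."""
    counts = Counter()
    for func, code in _PATTERN.findall(dmesg_text):
        name = NFS4_STATUS.get(int(code), "unknown")
        counts[(func, int(code), name)] += 1
    return counts
```

Feeding the log above into `summarize()` would group the -10024 lines from nfs4_reclaim_locks under NFS4ERR_OLD_STATEID and the -10026 lines from nfs4_reclaim_open_state under NFS4ERR_BAD_SEQID, which may help when comparing against server-side traces.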
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
AplayDevices:
**** List of PLAYBACK Hardware Devices ****
card 0: Intel [HDA Intel], device 0: ALC260 Analog [ALC260 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
ApportVersion: 1.23-0ubuntu4
Architecture: amd64
ArecordDevices:
**** List of CAPTURE Hardware Devices ****
card 0: Intel [HDA Intel], device 0: ALC260 Analog [ALC260 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: sebastian 1782 F.... xfce4-volumed
sebastian 1800 F.... pulseaudio
sebastian 1811 F.... xfce4-mixer-plu
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
Card hw:0 'Intel'/'HDA Intel at 0xf8000000 irq 41'
Mixer name : 'Realtek ALC260'
Components : 'HDA:10ec0260,17348601,00100400'
Controls : 18
Simple ctrls : 10
CurrentDmesg: Error: command ['sh', '-c', 'dmesg | comm -13 --nocheck-order /var/log/dmesg -'] failed with exit code 1: comm: /var/log/dmesg: Permission denied
DistroRelease: Ubuntu 11.10
IwConfig: Error: [Errno 2] No such file or directory
Lsusb: Error: [Errno 2] No such file or directory
MachineType: FUJITSU SIEMENS D2151-A1
Package: nfs-utils
ProcEnviron:
PATH=(custom, user)
SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.0.0-15-generic root=UUID=e35df589-f3cc-42a9-bff5-0c2cb1a7c0c2 ro quiet
ProcVersionSignature: Ubuntu 3.0.0-15.26-generic 3.0.13
RfKill: Error: [Errno 2] No such file or directory
Tags: oneiric
UdevDb: Error: [Errno 2] No such file or directory
Uname: Linux 3.0.0-15-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: sysadmin www-sg
WifiSyslog:
dmi.bios.date: 11/17/2005
dmi.bios.vendor: FUJITSU SIEMENS // Phoenix Technologies Ltd.
dmi.bios.version: 5.00 R1.07.2151.A1
dmi.board.name: D2151-A1
dmi.board.vendor: FUJITSU SIEMENS
dmi.board.version: S26361-D2151-A1
dmi.chassis.type: 6
dmi.chassis.vendor: FUJITSU SIEMENS
dmi.modalias: dmi:bvnFUJITSUSIEMENS//PhoenixTechnologiesLtd.:bvr5.00R1.07.2151.A1:bd11/17/2005:svnFUJITSUSIEMENS:pnD2151-A1:pvr:rvnFUJITSUSIEMENS:rnD2151-A1:rvrS26361-D2151-A1:cvnFUJITSUSIEMENS:ct6:cvr:
dmi.product.name: D2151-A1
dmi.sys.vendor: FUJITSU SIEMENS
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/932687/+subscriptions