[Bug 932687] Re: nfs4_reclaim_locks: unhandled error crashes applications and creates high load

Andreas Heinlein 932687 at bugs.launchpad.net
Mon Nov 5 10:47:43 UTC 2012


Any news on this? We're experiencing exactly the same problems as described by Peter, except that the workaround doesn't work for us.
We have a lot of Ubuntu 10.04 LTS clients running with /home mounted through NFSv4 from a Debian 6.0 server. We have also had a single test machine running 12.04 for several months without problems. Last Friday I upgraded a second machine, and the described problems began.
We also had a server crash on Friday, though I'm not sure whether it is related. The server stopped with "Out of memory and no killable processes left." Apparently, it had started killing processes to free up memory. The logs say it was due to imapd claiming more memory, but that could well be wrong. What we also see on the server is that two out of four rpciod kernel threads are stuck in the 'D' state, which apparently also causes a permanent load level of at least 2.0. It doesn't seem to have any real performance impact, though. The stuck threads go away when you reboot the server, but return as soon as you fire up the 12.04 boxes.
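For anyone who wants to check for the same symptom: on Linux, tasks in uninterruptible sleep ('D') count towards the load average, which would fit a steady load of about 2.0 with two stuck threads. The following is only a rough sketch of how such tasks could be listed from /proc; the function names are made up for illustration and are not part of any existing tool.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every task currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # task exited while reading, or unexpected format
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Run on the server, this should list the stuck rpciod threads if they are still in the 'D' state; a task that is only briefly waiting on I/O may show up as well, so it is worth running it a few times.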
We already had network cards configured by /etc/network/interfaces, so Peter's workaround doesn't work for us. I have now removed the /home line from fstab and instead mount /home manually on these two boxes. The clientaddr field is now correct (it was 0.0.0.0 before), and everything seems to work now.
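To see whether a client is affected, the clientaddr option can be read from the mount options in /proc/mounts. This is a minimal sketch that assumes the usual /proc/mounts layout (device, mount point, type, options) and the file system type "nfs4"; the helper names are hypothetical.

```python
def nfs4_clientaddrs(mounts_text):
    """Map each NFSv4 mount point to its clientaddr option (or None)."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "nfs4":
            continue
        mount_point, options = fields[1], fields[3].split(",")
        addr = None
        for opt in options:
            if opt.startswith("clientaddr="):
                addr = opt.split("=", 1)[1]
        result[mount_point] = addr
    return result


def suspicious_mounts(mounts_text):
    """Return the NFSv4 mount points whose clientaddr is 0.0.0.0."""
    addrs = nfs4_clientaddrs(mounts_text)
    return sorted(mp for mp, addr in addrs.items() if addr == "0.0.0.0")


if __name__ == "__main__":
    with open("/proc/mounts") as fh:
        for mp in suspicious_mounts(fh.read()):
            print("clientaddr=0.0.0.0 on", mp)
```

On the two boxes described above, /home would presumably have been reported before the manual remount and not afterwards.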
This still needs to be resolved quickly. I suspect there are some protocol incompatibilities here; we already went back on the server from kernel 3.2.0 (from Debian backports) to the official squeeze kernel 2.6.32 because we had problems with ever-increasing load on the server. Maybe going back to 3.2.0 on the server would help now, since both client and server would then be running the same kernel version again. But I cannot upgrade all boxes to 12.04 beforehand just to test. I will try to set up a test environment and post the results.
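As a note on the protocol side: the negative numbers in the dmesg output quoted in the bug description are NFSv4 status codes. In RFC 3530, 10024 is NFS4ERR_OLD_STATEID and 10026 is NFS4ERR_BAD_SEQID. A small sketch for summarizing such lines from a saved dmesg dump follows; the table only covers the two codes seen in this report, and the function name is made up.

```python
import re

# NFSv4 status codes from RFC 3530; only the two that appear in this report.
NFS4_ERRORS = {
    10024: "NFS4ERR_OLD_STATEID",
    10026: "NFS4ERR_BAD_SEQID",
}

# Matches e.g. "nfs4_reclaim_locks: unhandled error -10024. Zeroing state"
PATTERN = re.compile(r"(nfs4_\w+): unhandled error -(\d+)")


def summarize(dmesg_text):
    """Count 'unhandled error' lines per (kernel function, error name)."""
    counts = {}
    for match in PATTERN.finditer(dmesg_text):
        func, code = match.group(1), int(match.group(2))
        name = NFS4_ERRORS.get(code, "unknown (%d)" % code)
        counts[(func, name)] = counts.get((func, name), 0) + 1
    return counts


if __name__ == "__main__":
    import sys
    for (func, name), n in sorted(summarize(sys.stdin.read()).items()):
        print(n, func, name)
```

It reads the log from standard input, so something like `dmesg | python3 script.py` (script name is a placeholder) would give a count per function and error.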

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/932687

Title:
  nfs4_reclaim_locks: unhandled error crashes applications and creates
  high load

Status in “linux” package in Ubuntu:
  Triaged
Status in “nfs-utils” package in Ubuntu:
  Invalid

Bug description:
  We tried to move our Natty clients to Oneiric but have hit a severe
  show-stopper bug. Oneiric seems to have a problem with NFS. We use NFS
  for our home folders with strict permissions. The NFSv4 server is
  running Solaris.

  As said, there is no problem on Natty, only on Oneiric. Oneiric runs
  fine for some minutes, then we get output like this in dmesg:

  
  [ 7778.934514] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7778.934521] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7869.899811] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7869.899818] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7869.938180] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7869.938184] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7869.950989] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7869.950993] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7869.977253] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7869.977258] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7870.364422] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7870.364429] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7870.594833] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7870.594839] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7870.652639] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7870.652644] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7870.678166] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7870.678171] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7880.217148] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7880.217155] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7880.277521] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7880.277527] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7880.374106] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7880.374113] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7880.440398] nfs4_reclaim_locks: unhandled error -10024. Zeroing state
  [ 7880.440404] nfs4_reclaim_open_state: Lock reclaim failed!
  [ 7880.451121] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
  [ 7880.451330] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
  [ 7880.451520] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
  [ 7880.451738] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
  [ 7880.451921] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
  [ 7880.452099] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
  [ 7880.452279] nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
  [ 8160.252077] INFO: task claws-mail:23156 blocked for more than 120 seconds.
  [ 8160.252085] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 8160.252092] claws-mail      D 0000000000000000     0 23156   1848 0x00000004
  [ 8160.252103]  ffff8800363e3a08 0000000000000046 ffff8800363e39a8 ffffffffa03275f0
  [ 8160.252109]  ffff8800363e3fd8 ffff8800363e3fd8 ffff8800363e3fd8 0000000000012a40
  [ 8160.252115]  ffff8800191edc80 ffff88003d7cdc80 ffff8800363e39e8 ffff88003fc132c0
  [ 8160.252121] Call Trace:
  [ 8160.252148]  [<ffffffffa03275f0>] ? rpc_put_task+0x10/0x20 [sunrpc]
  [ 8160.252158]  [<ffffffff8110a180>] ? __lock_page+0x70/0x70
  [ 8160.252164]  [<ffffffff815eff1f>] schedule+0x3f/0x60
  [ 8160.252168]  [<ffffffff815effcf>] io_schedule+0x8f/0xd0
  [ 8160.252173]  [<ffffffff8110a18e>] sleep_on_page+0xe/0x20
  [ 8160.252177]  [<ffffffff815f07ef>] __wait_on_bit+0x5f/0x90
  [ 8160.252182]  [<ffffffff8110a378>] wait_on_page_bit+0x78/0x80
  [ 8160.252189]  [<ffffffff81081c50>] ? autoremove_wake_function+0x40/0x40
  [ 8160.252194]  [<ffffffff8110a48c>] filemap_fdatawait_range+0x10c/0x1a0
  [ 8160.252216]  [<ffffffffa03be1d0>] ? nfs_writedata_alloc+0x150/0x150 [nfs]
  [ 8160.252233]  [<ffffffffa03b89e0>] ? nfs_free_request+0x90/0x90 [nfs]
  [ 8160.252243]  [<ffffffff81115211>] ? do_writepages+0x21/0x40
  [ 8160.252252]  [<ffffffff8110bd5b>] ? __filemap_fdatawrite_range+0x5b/0x60
  [ 8160.252261]  [<ffffffff8110bdc8>] filemap_write_and_wait_range+0x68/0x80
  [ 8160.252271]  [<ffffffff811940e2>] vfs_fsync_range+0x42/0xa0
  [ 8160.252277]  [<ffffffff811941ac>] vfs_fsync+0x1c/0x20
  [ 8160.252295]  [<ffffffffa03ad2e3>] nfs_file_flush+0x53/0x80 [nfs]
  [ 8160.252301]  [<ffffffff811661ff>] filp_close+0x3f/0x90
  [ 8160.252307]  [<ffffffff81060f3a>] put_files_struct.part.14+0x7a/0xe0
  [ 8160.252312]  [<ffffffff81062a08>] put_files_struct+0x18/0x20
  [ 8160.252316]  [<ffffffff81062ad4>] exit_files+0x54/0x70
  [ 8160.252320]  [<ffffffff81062fed>] do_exit+0x19d/0x440
  [ 8160.252325]  [<ffffffff8107186a>] ? __dequeue_signal+0x6a/0xb0
  [ 8160.252330]  [<ffffffff81063434>] do_group_exit+0x44/0xa0
  [ 8160.252334]  [<ffffffff8107406d>] get_signal_to_deliver+0x27d/0x3f0
  [ 8160.252340]  [<ffffffff8100a7e6>] do_signal+0x56/0x180
  [ 8160.252348]  [<ffffffff8104e94d>] ? set_next_entity+0x9d/0xb0
  [ 8160.252352]  [<ffffffff8104e5e9>] ? finish_task_switch+0x49/0xf0
  [ 8160.252356]  [<ffffffff815ef8c4>] ? __schedule+0x3d4/0x700
  [ 8160.252361]  [<ffffffff8100aad5>] do_notify_resume+0x65/0x80
  [ 8160.252368]  [<ffffffff815fa490>] int_signal+0x12/0x17
  [ 8280.252073] INFO: task firefox:5030 blocked for more than 120 seconds.
  [ 8280.252080] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 8280.252087] firefox         D ffffffff81805120     0  5030      1 0x00000004
  [ 8280.252097]  ffff88000bfc7a08 0000000000000046 ffff88001156adf0 ffff88000bfc7b98
  [ 8280.252105]  ffff88000bfc7fd8 ffff88000bfc7fd8 ffff88000bfc7fd8 0000000000012a40
  [ 8280.252111]  ffffffff81c0b020 ffff88003645c560 ffff88000bfc79e8 ffff88003fc132c0
  [ 8280.252117] Call Trace:
  [ 8280.252130]  [<ffffffff8110a180>] ? __lock_page+0x70/0x70
  [ 8280.252137]  [<ffffffff815eff1f>] schedule+0x3f/0x60
  [ 8280.252141]  [<ffffffff815effcf>] io_schedule+0x8f/0xd0
  [ 8280.252145]  [<ffffffff8110a18e>] sleep_on_page+0xe/0x20
  [ 8280.252150]  [<ffffffff815f07ef>] __wait_on_bit+0x5f/0x90
  [ 8280.252154]  [<ffffffff8110a378>] wait_on_page_bit+0x78/0x80
  [ 8280.252165]  [<ffffffff81081c50>] ? autoremove_wake_function+0x40/0x40
  [ 8280.252170]  [<ffffffff8110a48c>] filemap_fdatawait_range+0x10c/0x1a0
  [ 8280.252177]  [<ffffffff81115211>] ? do_writepages+0x21/0x40
  [ 8280.252181]  [<ffffffff8110bd5b>] ? __filemap_fdatawrite_range+0x5b/0x60
  [ 8280.252186]  [<ffffffff8110bdc8>] filemap_write_and_wait_range+0x68/0x80
  [ 8280.252192]  [<ffffffff811940e2>] vfs_fsync_range+0x42/0xa0
  [ 8280.252196]  [<ffffffff811941ac>] vfs_fsync+0x1c/0x20
  [ 8280.252217]  [<ffffffffa03ad2e3>] nfs_file_flush+0x53/0x80 [nfs]
  [ 8280.252223]  [<ffffffff811661ff>] filp_close+0x3f/0x90
  [ 8280.252229]  [<ffffffff81060f3a>] put_files_struct.part.14+0x7a/0xe0
  [ 8280.252233]  [<ffffffff81062a08>] put_files_struct+0x18/0x20
  [ 8280.252237]  [<ffffffff81062ad4>] exit_files+0x54/0x70
  [ 8280.252243]  [<ffffffff81062fed>] do_exit+0x19d/0x440
  [ 8280.252251]  [<ffffffff8107186a>] ? __dequeue_signal+0x6a/0xb0
  [ 8280.252260]  [<ffffffff81063434>] do_group_exit+0x44/0xa0
  [ 8280.252268]  [<ffffffff8107406d>] get_signal_to_deliver+0x27d/0x3f0
  [ 8280.252277]  [<ffffffff8100a7e6>] do_signal+0x56/0x180
  [ 8280.252285]  [<ffffffff811afe27>] ? fcntl_setlk+0x67/0x220
  [ 8280.252294]  [<ffffffff81178e42>] ? do_fcntl+0x1b2/0x340
  [ 8280.252302]  [<ffffffff8100aad5>] do_notify_resume+0x65/0x80
  [ 8280.252311]  [<ffffffff815fa490>] int_signal+0x12/0x17
  [ 8280.252322] INFO: task claws-mail:23156 blocked for more than 120 seconds.
  [ 8280.252328] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 8280.252333] claws-mail      D 0000000000000000     0 23156   1848 0x00000004
  [ 8280.252342]  ffff8800363e3a08 0000000000000046 ffff8800363e39a8 ffffffffa03275f0
  [ 8280.252350]  ffff8800363e3fd8 ffff8800363e3fd8 ffff8800363e3fd8 0000000000012a40
  [ 8280.252360]  ffff8800191edc80 ffff88003d7cdc80 ffff8800363e39e8 ffff88003fc132c0
  [ 8280.252368] Call Trace:
  [ 8280.252395]  [<ffffffffa03275f0>] ? rpc_put_task+0x10/0x20 [sunrpc]
  [ 8280.252404]  [<ffffffff8110a180>] ? __lock_page+0x70/0x70
  [ 8280.252412]  [<ffffffff815eff1f>] schedule+0x3f/0x60
  [ 8280.252419]  [<ffffffff815effcf>] io_schedule+0x8f/0xd0
  [ 8280.252427]  [<ffffffff8110a18e>] sleep_on_page+0xe/0x20
  [ 8280.252434]  [<ffffffff815f07ef>] __wait_on_bit+0x5f/0x90
  [ 8280.252442]  [<ffffffff8110a378>] wait_on_page_bit+0x78/0x80
  [ 8280.252451]  [<ffffffff81081c50>] ? autoremove_wake_function+0x40/0x40
  [ 8280.252459]  [<ffffffff8110a48c>] filemap_fdatawait_range+0x10c/0x1a0
  [ 8280.252476]  [<ffffffffa03be1d0>] ? nfs_writedata_alloc+0x150/0x150 [nfs]
  [ 8280.252491]  [<ffffffffa03b89e0>] ? nfs_free_request+0x90/0x90 [nfs]
  [ 8280.252495]  [<ffffffff81115211>] ? do_writepages+0x21/0x40
  [ 8280.252500]  [<ffffffff8110bd5b>] ? __filemap_fdatawrite_range+0x5b/0x60
  [ 8280.252505]  [<ffffffff8110bdc8>] filemap_write_and_wait_range+0x68/0x80
  [ 8280.252509]  [<ffffffff811940e2>] vfs_fsync_range+0x42/0xa0
  [ 8280.252513]  [<ffffffff811941ac>] vfs_fsync+0x1c/0x20
  [ 8280.252524]  [<ffffffffa03ad2e3>] nfs_file_flush+0x53/0x80 [nfs]
  [ 8280.252529]  [<ffffffff811661ff>] filp_close+0x3f/0x90
  [ 8280.252534]  [<ffffffff81060f3a>] put_files_struct.part.14+0x7a/0xe0
  [ 8280.252538]  [<ffffffff81062a08>] put_files_struct+0x18/0x20
  [ 8280.252542]  [<ffffffff81062ad4>] exit_files+0x54/0x70
  [ 8280.252546]  [<ffffffff81062fed>] do_exit+0x19d/0x440
  [ 8280.252550]  [<ffffffff8107186a>] ? __dequeue_signal+0x6a/0xb0
  [ 8280.252555]  [<ffffffff81063434>] do_group_exit+0x44/0xa0
  [ 8280.252561]  [<ffffffff8107406d>] get_signal_to_deliver+0x27d/0x3f0
  [ 8280.252570]  [<ffffffff8100a7e6>] do_signal+0x56/0x180
  [ 8280.252578]  [<ffffffff8104e94d>] ? set_next_entity+0x9d/0xb0
  [ 8280.252586]  [<ffffffff8104e5e9>] ? finish_task_switch+0x49/0xf0
  [ 8280.252591]  [<ffffffff815ef8c4>] ? __schedule+0x3d4/0x700
  [ 8280.252596]  [<ffffffff8100aad5>] do_notify_resume+0x65/0x80
  [ 8280.252601]  [<ffffffff815fa490>] int_signal+0x12/0x17
  [ 8280.252608] INFO: task firefox:15992 blocked for more than 120 seconds.
  [ 8280.252613] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 8280.252618] firefox         D ffffffff81805120     0 15992   1728 0x00000004
  [ 8280.252627]  ffff8800362b9c18 0000000000000086 ffff880011569d70 ffff8800362b9da8
  [ 8280.252636]  ffff8800362b9fd8 ffff8800362b9fd8 ffff8800362b9fd8 0000000000012a40
  [ 8280.252645]  ffff88003d698000 ffff88003d662e40 ffff8800362b9bf8 ffff88003fd132c0
  [ 8280.252654] Call Trace:
  [ 8280.252661]  [<ffffffff8110a180>] ? __lock_page+0x70/0x70
  [ 8280.252665]  [<ffffffff815eff1f>] schedule+0x3f/0x60
  [ 8280.252668]  [<ffffffff815effcf>] io_schedule+0x8f/0xd0
  [ 8280.252673]  [<ffffffff8110a18e>] sleep_on_page+0xe/0x20
  [ 8280.252676]  [<ffffffff815f07ef>] __wait_on_bit+0x5f/0x90
  [ 8280.252681]  [<ffffffff8110a378>] wait_on_page_bit+0x78/0x80
  [ 8280.252685]  [<ffffffff81081c50>] ? autoremove_wake_function+0x40/0x40
  [ 8280.252690]  [<ffffffff8110a48c>] filemap_fdatawait_range+0x10c/0x1a0
  [ 8280.252694]  [<ffffffff81115211>] ? do_writepages+0x21/0x40
  [ 8280.252699]  [<ffffffff8110bd5b>] ? __filemap_fdatawrite_range+0x5b/0x60
  [ 8280.252703]  [<ffffffff8110a54b>] filemap_fdatawait+0x2b/0x30
  [ 8280.252707]  [<ffffffff8110c7b4>] filemap_write_and_wait+0x44/0x60
  [ 8280.252726]  [<ffffffffa03b06d5>] nfs_getattr+0x105/0x120 [nfs]
  [ 8280.252735]  [<ffffffff8116c8fe>] vfs_getattr+0x4e/0x80
  [ 8280.252741]  [<ffffffff8116c988>] vfs_fstatat+0x58/0x70
  [ 8280.252745]  [<ffffffff8116c9db>] vfs_stat+0x1b/0x20
  [ 8280.252748]  [<ffffffff8116cb1a>] sys_newstat+0x1a/0x40
  [ 8280.252752]  [<ffffffff811695e5>] ? fput+0x25/0x30
  [ 8280.252756]  [<ffffffff8100b705>] ? math_state_restore+0x45/0x60
  [ 8280.252762]  [<ffffffff815f312e>] ? do_device_not_available+0xe/0x10
  [ 8280.252769]  [<ffffffff815fb17b>] ? device_not_available+0x1b/0x20
  [ 8280.252778]  [<ffffffff815fa1c2>] system_call_fastpath+0x16/0x1b


  The affected applications, such as firefox or claws-mail, stop responding, and killing them results in zombie processes.
  --- 
  AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
  AplayDevices:
   **** List of PLAYBACK Hardware Devices ****
   card 0: Intel [HDA Intel], device 0: ALC260 Analog [ALC260 Analog]
     Subdevices: 1/1
     Subdevice #0: subdevice #0
  ApportVersion: 1.23-0ubuntu4
  Architecture: amd64
  ArecordDevices:
   **** List of CAPTURE Hardware Devices ****
   card 0: Intel [HDA Intel], device 0: ALC260 Analog [ALC260 Analog]
     Subdevices: 1/1
     Subdevice #0: subdevice #0
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  sebastian   1782 F.... xfce4-volumed
                        sebastian   1800 F.... pulseaudio
                        sebastian   1811 F.... xfce4-mixer-plu
  CRDA: Error: [Errno 2] No such file or directory
  Card0.Amixer.info:
   Card hw:0 'Intel'/'HDA Intel at 0xf8000000 irq 41'
     Mixer name	: 'Realtek ALC260'
     Components	: 'HDA:10ec0260,17348601,00100400'
     Controls      : 18
     Simple ctrls  : 10
  CurrentDmesg: Error: command ['sh', '-c', 'dmesg | comm -13 --nocheck-order /var/log/dmesg -'] failed with exit code 1: comm: /var/log/dmesg: Permission denied
  DistroRelease: Ubuntu 11.10
  IwConfig: Error: [Errno 2] No such file or directory
  Lsusb: Error: [Errno 2] No such file or directory
  MachineType: FUJITSU SIEMENS D2151-A1
  Package: nfs-utils
  ProcEnviron:
   PATH=(custom, user)
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.0.0-15-generic root=UUID=e35df589-f3cc-42a9-bff5-0c2cb1a7c0c2 ro quiet
  ProcVersionSignature: Ubuntu 3.0.0-15.26-generic 3.0.13
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  oneiric
  UdevDb: Error: [Errno 2] No such file or directory
  Uname: Linux 3.0.0-15-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: sysadmin www-sg
  WifiSyslog:
   
  dmi.bios.date: 11/17/2005
  dmi.bios.vendor: FUJITSU SIEMENS // Phoenix Technologies Ltd.
  dmi.bios.version: 5.00 R1.07.2151.A1
  dmi.board.name: D2151-A1
  dmi.board.vendor: FUJITSU SIEMENS
  dmi.board.version: S26361-D2151-A1
  dmi.chassis.type: 6
  dmi.chassis.vendor: FUJITSU SIEMENS
  dmi.modalias: dmi:bvnFUJITSUSIEMENS//PhoenixTechnologiesLtd.:bvr5.00R1.07.2151.A1:bd11/17/2005:svnFUJITSUSIEMENS:pnD2151-A1:pvr:rvnFUJITSUSIEMENS:rnD2151-A1:rvrS26361-D2151-A1:cvnFUJITSUSIEMENS:ct6:cvr:
  dmi.product.name: D2151-A1
  dmi.sys.vendor: FUJITSU SIEMENS

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/932687/+subscriptions



