Kernel deadlock on LTSP clients

Nikolaus Rath Nikolaus at rath.org
Sun Jul 11 17:53:24 BST 2010


Hi,

I have set up a Lucid diskless fat client using LTSP. The root
filesystem is aufs, layered over a read-write tmpfs and a read-only
squashfs, the latter mounted from NBD.

The problem is that the fat clients work fine for a while, but then
reproducibly freeze completely within a few hours of booting.
The last syslog messages that the server receives are:

Jul 10 14:11:38 beta kernel: [25560.688091] INFO: task cron:2278 blocked for more than 120 seconds.
Jul 10 14:11:38 beta kernel: [25560.688100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 10 14:11:38 beta kernel: [25560.688107] cron D 00006323 0 2278 1 0x00000000
Jul 10 14:11:38 beta kernel: [25560.688118] d549fa0c 00000086 00000080 00006323 00000000 c0847760 d5fb9c2c c0847760
Jul 10 14:11:38 beta kernel: [25560.688135] b0918eaf 0000170c c0847760 c0847760 d5fb9c2c c0847760 c0847760 d6345400
Jul 10 14:11:38 beta kernel: [25560.688151] b08e6975 0000170c d5fb9980 c1d08760 d5fb9980 d549fa58 d549fa1c c058a5ca
Jul 10 14:11:38 beta kernel: [25560.688168] Call Trace:
Jul 10 14:11:38 beta kernel: [25560.688185] [<c058a5ca>] io_schedule+0x3a/0x60
Jul 10 14:11:38 beta kernel: [25560.688194] [<c022d1f8>] sync_buffer+0x38/0x40
Jul 10 14:11:38 beta kernel: [25560.688201] [<c058ad6d>] __wait_on_bit+0x4d/0x70
Jul 10 14:11:38 beta kernel: [25560.688207] [<c022d1c0>] ? sync_buffer+0x0/0x40
Jul 10 14:11:38 beta kernel: [25560.688214] [<c022d1c0>] ? sync_buffer+0x0/0x40
Jul 10 14:11:38 beta kernel: [25560.688220] [<c058ae3b>] out_of_line_wait_on_bit+0xab/0xc0
Jul 10 14:11:38 beta kernel: [25560.688230] [<c0167850>] ? wake_bit_function+0x0/0x50
Jul 10 14:11:38 beta kernel: [25560.688237] [<c022d1be>] __wait_on_buffer+0x2e/0x30
Jul 10 14:11:38 beta kernel: [25560.688266] [<f80ec30b>] squashfs_read_data+0x30b/0x720 [squashfs]
Jul 10 14:11:38 beta kernel: [25560.688277] [<c0144f39>] ? load_balance_newidle+0x99/0x300
Jul 10 14:11:38 beta kernel: [25560.688290] [<f80ecb06>] squashfs_cache_get+0x1c6/0x2f0 [squashfs]
Jul 10 14:11:38 beta kernel: [25560.688304] [<f80ecd18>] squashfs_read_metadata+0x68/0xe0 [squashfs]
Jul 10 14:11:38 beta kernel: [25560.688317] [<f80ee488>] squashfs_read_inode+0x78/0x5b0 [squashfs]
Jul 10 14:11:38 beta kernel: [25560.688330] [<f80ef0e7>] ? squashfs_alloc_inode+0x17/0x30 [squashfs]
Jul 10 14:11:38 beta kernel: [25560.688340] [<c021cf9e>] ? inode_init_always+0xfe/0x190
Jul 10 14:11:38 beta kernel: [25560.688347] [<c021e015>] ? get_new_inode_fast+0xe5/0x110
Jul 10 14:11:38 beta kernel: [25560.688359] [<f80eea11>] squashfs_iget+0x51/0x80 [squashfs]
Jul 10 14:11:38 beta kernel: [25560.688371] [<f80eee73>] squashfs_lookup+0x293/0x320 [squashfs]
Jul 10 14:11:38 beta kernel: [25560.688384] [<c0212cb5>] __lookup_hash+0xc5/0x110
Jul 10 14:11:38 beta kernel: [25560.688390] [<c0212e0c>] lookup_hash+0x2c/0x30
Jul 10 14:11:38 beta kernel: [25560.688411] [<f82038ac>] vfsub_lookup_hash+0x1c/0x40 [aufs]
Jul 10 14:11:38 beta kernel: [25560.688429] [<f8209a1e>] au_lkup_one+0x9e/0xd0 [aufs]
Jul 10 14:11:38 beta kernel: [25560.688437] [<c058b577>] ? do_nanosleep+0x97/0xc0
Jul 10 14:11:38 beta kernel: [25560.688455] [<f8209ce6>] au_do_lookup+0x96/0x1f0 [aufs]
Jul 10 14:11:38 beta kernel: [25560.688476] [<f820a383>] au_lkup_dentry+0x193/0x270 [aufs]
Jul 10 14:11:38 beta kernel: [25560.688495] [<f82093ad>] ? do_ii_read_lock+0x2d/0x30 [aufs]
Jul 10 14:11:38 beta kernel: [25560.688541] [<f82102c5>] aufs_lookup+0xd5/0x1e0 [aufs]
Jul 10 14:11:38 beta kernel: [25560.688550] [<c058c32d>] ? _spin_lock+0xd/0x10
Jul 10 14:11:38 beta kernel: [25560.688563] [<c021b84b>] ? d_alloc+0x13b/0x190
Jul 10 14:11:38 beta kernel: [25560.688578] [<c0211177>] real_lookup+0xb7/0x110
Jul 10 14:11:38 beta kernel: [25560.688590] [<c0212bc5>] do_lookup+0x95/0xc0
Jul 10 14:11:38 beta kernel: [25560.688602] [<c02134b3>] __link_path_walk+0x603/0xca0
Jul 10 14:11:38 beta kernel: [25560.688616] [<c0101c1d>] ? __switch_to+0xcd/0x180
Jul 10 14:11:38 beta kernel: [25560.688628] [<c0213d64>] path_walk+0x54/0xc0
Jul 10 14:11:38 beta kernel: [25560.688640] [<c0213ee9>] do_path_lookup+0x59/0x90
Jul 10 14:11:38 beta kernel: [25560.688652] [<c0214a31>] user_path_at+0x41/0x80
Jul 10 14:11:38 beta kernel: [25560.688666] [<c016bd46>] ? hrtimer_try_to_cancel+0x36/0xb0
Jul 10 14:11:38 beta kernel: [25560.688679] [<c058b577>] ? do_nanosleep+0x97/0xc0
Jul 10 14:11:38 beta kernel: [25560.688692] [<c016be88>] ? hrtimer_nanosleep+0xa8/0x140
Jul 10 14:11:38 beta kernel: [25560.688705] [<c020c89a>] vfs_fstatat+0x3a/0x70
Jul 10 14:11:38 beta kernel: [25560.688717] [<c020c9f0>] vfs_stat+0x20/0x30
Jul 10 14:11:38 beta kernel: [25560.688729] [<c020ca19>] sys_stat64+0x19/0x30
Jul 10 14:11:38 beta kernel: [25560.688743] [<c016ad50>] ? hrtimer_wakeup+0x0/0x30
Jul 10 14:11:38 beta kernel: [25560.688755] [<c016bd06>] ? hrtimer_start_range_ns+0x26/0x30
Jul 10 14:11:38 beta kernel: [25560.688769] [<c015182e>] ? sys_time+0x1e/0x60
Jul 10 14:11:38 beta kernel: [25560.688781] [<c01033ec>] syscall_call+0x7/0xb

These messages appear for different tasks (not just cron), and to me
the call traces look identical (I can attach a full set of log
messages if that helps).
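
One way to check whether the traces really are identical across tasks
is to group the hung-task reports by their call chain. The following is
a minimal Python sketch, not part of the original setup: the function
name group_hung_tasks and the two regular expressions are assumptions
chosen to match the log format quoted above. Frames marked with "?" are
skipped, since the kernel prints that marker for addresses it found on
the stack but could not confirm as part of the call chain.

```python
import re
from collections import defaultdict

# Hung-task header, e.g. "INFO: task cron:2278 blocked for more than 120 seconds."
HEADER = re.compile(r"INFO: task (\S+):(\d+) blocked for more than \d+ seconds")
# Stack frame, e.g. "[<c058a5ca>] io_schedule+0x3a/0x60" (optionally with "? ")
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def group_hung_tasks(lines):
    """Map each distinct call chain (tuple of function names) to the
    list of "name:pid" tasks that reported it."""
    groups = defaultdict(list)
    task, frames = None, []
    for line in lines:
        header = HEADER.search(line)
        if header:
            # A new report starts: store the previous one, if any.
            if task is not None:
                groups[tuple(frames)].append(task)
            task, frames = f"{header.group(1)}:{header.group(2)}", []
            continue
        frame = FRAME.search(line)
        # Keep only frames without the "?" marker.
        if frame and task is not None and not frame.group(1):
            frames.append(frame.group(2))
    if task is not None:
        groups[tuple(frames)].append(task)
    return dict(groups)


# A few lines copied from the trace above, used as sample input.
sample = [
    "Jul 10 14:11:38 beta kernel: [25560.688091] INFO: task cron:2278 blocked for more than 120 seconds.",
    "Jul 10 14:11:38 beta kernel: [25560.688185] [<c058a5ca>] io_schedule+0x3a/0x60",
    "Jul 10 14:11:38 beta kernel: [25560.688194] [<c022d1f8>] sync_buffer+0x38/0x40",
    "Jul 10 14:11:38 beta kernel: [25560.688207] [<c022d1c0>] ? sync_buffer+0x0/0x40",
]

for chain, tasks in group_hung_tasks(sample).items():
    print(len(tasks), "task(s):", ", ".join(tasks))
    print("  " + " <- ".join(chain))
```

Fed the complete syslog instead of the sample, a single group in the
output would confirm that every blocked task is waiting in the same
place; several groups would mean the traces differ after all.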

The fat client image was generated with the Karmic LTSP tools and then
upgraded to Lucid inside the chroot.


Anyone able to help?


Best,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C




More information about the edubuntu-users mailing list