Kernel errors with nfs modules under ubuntu 10.04.3

carlopmart carlopmart at gmail.com
Wed Nov 23 15:56:33 UTC 2011


On 11/23/2011 04:43 PM, Stefan Bader wrote:
> On 23.11.2011 16:28, Serge Hallyn wrote:
>> Quoting carlopmart (carlopmart at gmail.com):
>>> Hi all,
>>>
>>>   This morning, my Ubuntu Server 10.04.3 machine triggered this warning:
>>>
>>> [ 5464.526570] WARNING: at
>>> /build/buildd/linux-lts-backport-natty-2.6.38/kernel/softirq.c:159
>>> local_bh_enable+0x60/0x90()
>>> [ 5464.526574] Hardware name: VMware Virtual Platform
>>> [ 5464.526575] Modules linked in: nfs nfsd lockd fscache exportfs
>>> nfs_acl auth_rpcgss sunrpc ppdev vmw_balloon psmouse vmxnet3
>>> serio_raw i2c_piix4 shpchp parport_pc lp parport floppy vmw_pvscsi
>>> [ 5464.526599] Pid: 735, comm: lockd Not tainted 2.6.38-12-generic
>>> #51~lucid1-Ubuntu
>>> [ 5464.526602] Call Trace:
>>> [ 5464.526607]  [<c104c812>] ? warn_slowpath_common+0x72/0xa0
>>> [ 5464.526610]  [<c1052b40>] ? local_bh_enable+0x60/0x90
>>> [ 5464.526613]  [<c1052b40>] ? local_bh_enable+0x60/0x90
>>> [ 5464.526616]  [<c104c862>] ? warn_slowpath_null+0x22/0x30
>>> [ 5464.526619]  [<c1052b40>] ? local_bh_enable+0x60/0x90
>>> [ 5464.526624]  [<c1423a4f>] ? skb_copy_bits+0x10f/0x210
>>> [ 5464.526627]  [<c1423bc3>] ? __pskb_pull_tail+0x73/0x2a0
>>> [ 5464.526646]  [<f83b3fbf>] ? cache_check+0x4f/0x200 [sunrpc]
>>> [ 5464.526651]  [<f8065bee>] ? vmxnet3_parse_and_copy_hdr+0xce/0x170
>>> [vmxnet3]
>>> [ 5464.526655]  [<f806782b>] ? vmxnet3_tq_xmit+0x17b/0x480 [vmxnet3]
>>> [ 5464.526660]  [<c1506886>] ? _raw_spin_unlock_bh+0x16/0x20
>>> [ 5464.526665]  [<f8067b65>] ? vmxnet3_xmit_frame+0x35/0x40 [vmxnet3]
>>> [ 5464.526668]  [<c142d642>] ? dev_hard_start_xmit+0x202/0x460
>>> [ 5464.526696]  [<c14469f1>] ? sch_direct_xmit+0xb1/0x180
>>> [ 5464.526700]  [<c142eb2a>] ? dev_queue_xmit+0xfa/0x380
>>> [ 5464.526704]  [<c145e53f>] ? ip_finish_output+0x13f/0x300
>>> [ 5464.526707]  [<c145ea1f>] ? ip_output+0xbf/0xd0
>>> [ 5464.526710]  [<c145d8d0>] ? ip_local_out+0x20/0x30
>>> [ 5464.526714]  [<c145db0d>] ? ip_push_pending_frames+0x22d/0x390
>>> [ 5464.526717]  [<c147e03a>] ? udp_push_pending_frames+0x16a/0x3d0
>>> [ 5464.526721]  [<c147f5ce>] ? udp_sendpage+0xde/0x160
>>> [ 5464.526724]  [<c147f4f0>] ? udp_sendpage+0x0/0x160
>>> [ 5464.526731]  [<c1487475>] ? inet_sendpage+0x55/0xe0
>>> [ 5464.526734]  [<c1487420>] ? inet_sendpage+0x0/0xe0
>>> [ 5464.526738]  [<c141a6e3>] ? kernel_sendpage+0x43/0x70
>>> [ 5464.526750]  [<f83ad3b6>] ? svc_send_common+0x56/0x130 [sunrpc]
>>> [ 5464.526761]  [<f83ad500>] ? svc_sendto+0x70/0x1e0 [sunrpc]
>>> [ 5464.526765]  [<c127011d>] ? kref_put+0x2d/0x60
>>> [ 5464.526769]  [<c1421f3d>] ? __kfree_skb+0x3d/0x90
>>> [ 5464.526771]  [<c1421f3d>] ? __kfree_skb+0x3d/0x90
>>> [ 5464.526775]  [<c1425db2>] ? skb_free_datagram_locked+0x82/0xf0
>>> [ 5464.526786]  [<f83ad691>] ? svc_udp_sendto+0x21/0x50 [sunrpc]
>>> [ 5464.526797]  [<f83b7608>] ? svc_send+0x88/0xc0 [sunrpc]
>>> [ 5464.526808]  [<f83ab29f>] ? svc_process+0xff/0x140 [sunrpc]
>>> [ 5464.526814]  [<f80d7b52>] ? lockd+0xc2/0x1e0 [lockd]
>>> [ 5464.526818]  [<c1046c90>] ? default_wake_function+0x10/0x20
>>> [ 5464.526822]  [<c1035d28>] ? __wake_up_common+0x48/0x70
>>> [ 5464.526825]  [<c103b03e>] ? complete+0x4e/0x60
>>> [ 5464.526830]  [<f80d7a90>] ? lockd+0x0/0x1e0 [lockd]
>>> [ 5464.526834]  [<c106a1d4>] ? kthread+0x74/0x80
>>> [ 5464.526837]  [<c106a160>] ? kthread+0x0/0x80
>>> [ 5464.526841]  [<c10036be>] ? kernel_thread_helper+0x6/0x10
>>> [ 5464.526844] ---[ end trace 3ef368e5f078667d ]---
>>>
>>> It looks like a problem with the nfs modules. Is this a bug? Do I need
>>> to change the kernel version?
>>
>> Actually, I think it is a bug in vmxnet3.
>>
>> vmxnet3_parse_and_copy_hdr is called under a spinlock by vmxnet3_tq_xmit.
>> vmxnet3_parse_and_copy_hdr calls pskb_may_pull, which calls __pskb_pull_tail,
>> which calls skb_copy_bits, which calls local_bh_enable, and that is where
>> this check fires:
>>
>> 	WARN_ON_ONCE(in_irq());
>>
>> I'm cc:ing smb in the hopes he can say more :)
>>
>> thanks,
>> -serge
>
> I do not have the vmxnet3 source code at hand, but it also sounds more like
> vmxnet3 to me. There is a spin_unlock_bh within the whole sequence, which
> looks a bit like it got interrupted. Maybe the same lock is used in interrupt
> context and in bottom halves without protecting against the worst case
> (interrupts)...
>


Thanks. I have reconfigured the virtual NICs to use e1000 instead of
vmxnet3, and it seems OK.
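
For anyone reading the trace later: the diagnosis quoted above rests on which module owns the innermost frames, not on which modules merely show up in the list. Below is a minimal sketch of that check, assuming the frames have been copied out of dmesg with the timestamps and quote marks removed. The names TRACE, FRAME, parse_frames and first_module_frame are made up for this illustration. Note that frames marked "?" are ones the kernel found on the stack without confirming them, so the result is a hint rather than proof.

```python
import re

# A few frames copied from the trace quoted above (timestamps and ">>>" removed).
TRACE = """\
[<c1052b40>] ? local_bh_enable+0x60/0x90
[<c1423a4f>] ? skb_copy_bits+0x10f/0x210
[<c1423bc3>] ? __pskb_pull_tail+0x73/0x2a0
[<f8065bee>] ? vmxnet3_parse_and_copy_hdr+0xce/0x170 [vmxnet3]
[<f806782b>] ? vmxnet3_tq_xmit+0x17b/0x480 [vmxnet3]
[<c142d642>] ? dev_hard_start_xmit+0x202/0x460
[<f83ad691>] ? svc_udp_sendto+0x21/0x50 [sunrpc]
[<f80d7b52>] ? lockd+0xc2/0x1e0 [lockd]
"""

FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?"                  # address, optional "?" marker
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"   # symbol+offset/size
    r"(?:\s+\[(?P<module>\w+)\])?"                   # optional [module] suffix
)


def parse_frames(text):
    """Return (symbol, module) pairs, innermost frame first.

    Frames without a [module] suffix belong to the core kernel image.
    """
    return [
        (m.group("symbol"), m.group("module") or "kernel")
        for m in FRAME.finditer(text)
    ]


def first_module_frame(frames):
    """Return the innermost frame that lives in a loadable module, or None."""
    for symbol, module in frames:
        if module != "kernel":
            return symbol, module
    return None


frames = parse_frames(TRACE)
for symbol, module in frames:
    print(f"{module:10s} {symbol}")
print("innermost module frame:", first_module_frame(frames))
```

On the frames above, the innermost module frame is vmxnet3_parse_and_copy_hdr in vmxnet3. The sunrpc and lockd frames sit further down the stack as callers, which fits the suggestion that the NFS code only happened to be the one sending the packet.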



-- 
CL Martinez
carlopmart {at} gmail {d0t} com



