What's soft lockup?

Robert Heller heller at deepsoft.com
Tue Jul 7 12:32:44 UTC 2020


At Tue, 7 Jul 2020 00:34:21 -0300 "Ubuntu user technical support,  not for general discussions" <ubuntu-users at lists.ubuntu.com> wrote:

> 
> > Here is the most related log:
> > ul  3 10:24:35 yx kernel: [ 1240.166063] watchdog: BUG: soft lockup -
> > CPU#0 stuck for 22s! [rt_cansend:1837]
> > ...
> > Jul  3 10:24:35 yx kernel: [ 1240.166257] Call Trace:
> > Jul  3 10:24:35 yx kernel: [ 1240.166266]  _raw_spin_lock+0x20/0x30
> > Jul  3 10:24:35 yx kernel: [ 1240.166275]  can_write+0x6c/0x2c0 [advcan]
> > Jul  3 10:24:35 yx kernel: [ 1240.166281]  ? dequeue_signal+0xae/0x1a0
> > Jul  3 10:24:35 yx kernel: [ 1240.166286]  ? recalc_sigpending+0x1b/0x50
> > Jul  3 10:24:35 yx kernel: [ 1240.166291]  ? __set_task_blocked+0x3c/0xa0
> > Jul  3 10:24:35 yx kernel: [ 1240.166297]  __vfs_write+0x3a/0x190
> > Jul  3 10:24:35 yx kernel: [ 1240.166303]  ? apparmor_file_permission+0x1a/0x20
> > Jul  3 10:24:35 yx kernel: [ 1240.166307]  ? security_file_permission+0x3b/0xc0
> > Jul  3 10:24:35 yx kernel: [ 1240.166312]  vfs_write+0xb8/0x1b0
> > Jul  3 10:24:35 yx kernel: [ 1240.166316]  ksys_write+0x5c/0xe0
> > Jul  3 10:24:35 yx kernel: [ 1240.166321]  __x64_sys_write+0x1a/0x20
> > Jul  3 10:24:35 yx kernel: [ 1240.166326]  do_syscall_64+0x87/0x250
> > Jul  3 10:24:35 yx kernel: [ 1240.166331]
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> This stack trace tells us that kernel has trying to write through VFS
> layer and queued the write waiting for a lock. This lock is likely
> taken by some other kernel code.
> 
> Affected task is: PID 1837 and task is rt_cansend. Now.. it is
> possible that the VFS write (a write to a filesystem) has NOT come
> directly from this PID. Its part of the complexities of kernel. If
> dirty pages is above a certain water mark is *may* use other tasks
> %sys% time to flush dirty pages, for example.
> 
> So it may, or may not, come from this task.
> 
> Note: I don't know much about xenomai and CAN messages.
> 
> > Here is the full log:
> > Jul  3 10:06:16 yx kernel: [  140.313856] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:06:59 yx kernel: [  183.323792] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:07:42 yx kernel: [  226.329465] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:08:24 yx kernel: [  268.362822] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:09:07 yx kernel: [  311.372488] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:09:50 yx kernel: [  354.377996] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:10:32 yx kernel: [  396.411726] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:11:15 yx kernel: [  439.421156] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:11:58 yx kernel: [  482.426522] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:12:40 yx kernel: [  524.460688] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:13:23 yx kernel: [  567.469857] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:14:06 yx kernel: [  610.475021] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:14:48 yx kernel: [  652.509597] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:15:31 yx kernel: [  695.518491] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:16:14 yx kernel: [  738.523551] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:16:55 yx kernel: [  779.558139] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:17:38 yx kernel: [  822.566773] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:18:21 yx kernel: [  865.571697] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:19:03 yx kernel: [  907.607049] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:19:46 yx kernel: [  950.615449] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:20:29 yx kernel: [  993.620196] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:21:11 yx kernel: [ 1035.655974] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:21:54 yx kernel: [ 1078.664116] CAN[0][0] RX: FIFO overrun
> > Jul  3 10:22:36 yx kernel: [ 1120.688506] CAN[0][0] RX: FIFO overrun
> 
> This is likely related to UART (serial controller) driver buffer and
> it is likely that you can ignore these errors.

*I* don't know of any x86-flavored (eg "PC" laptop/desktop) with CAN hardware,
other than USB/PCI addons (and the "virtual" CAN driver). Wondering: is the OP
doing something "interesting" with a BeagleBoard (ARM-based embeded Linux
mini) or some flavor of Raspbery Pi with a SPI connected CAN interface?

> 
> > Jul  3 10:23:31 yx kernel: [ 1176.166058] watchdog: BUG: soft lockup -
> > CPU#0 stuck for 22s! [rt_cansend:1837]
> > Jul  3 10:23:31 yx kernel: [ 1176.166066] Modules linked in: bnep
> > snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
> > nls_iso8859_1 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep
> > snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi intel_rapl
> > intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp coretemp
> > kvm_intel snd_seq punit_atom_debug crct10dif_pclmul crc32_pclmul
> > ghash_clmulni_intel snd_seq_device cryptd intel_cstate snd_timer
> > hci_uart snd lpc_ich advcan(OE) mei_txe btqca soundcore mei btbcm
> > btintel bluetooth ecdh_generic rfkill_gpio pwm_lpss_platform mac_hid
> > pwm_lpss parport_pc ppdev lp parport autofs4 i915 kvmgt vfio_mdev mdev
> > vfio_iommu_type1 vfio kvm irqbypass drm_kms_helper syscopyarea
> > sysfillrect sysimgblt fb_sys_fops igb drm dca ahci i2c_algo_bit
> > libahci video i2c_hid hid
> > Jul  3 10:23:31 yx kernel: [ 1176.166204] CPU: 0 PID: 1837 Comm:
> > rt_cansend Tainted: G           OE     4.19.84-solve-alc-failure #1
> > Jul  3 10:23:31 yx kernel: [ 1176.166209] I-pipe domain: Linux
> > Jul  3 10:23:31 yx kernel: [ 1176.166218] RIP:
> > 0010:queued_spin_lock_slowpath+0xd9/0x1a0
> 
> This is telling us that the instruction pointer (of the CPU) is a
> function called queued_spin_lock_slowpath and we're still waiting to
> acquire the lock (spinlock) that I mentioned before.
> 
> Note: as this is a spin lock, it's likely that the task has been
> consuming the CPU spinning in an infinite loop trying to get the lock,
> and not being scheduled in and out the CPU => this explains why your
> system is is bad state.
> 
> > Jul  3 10:23:31 yx kernel: [ 1176.166223] Code: 48 03 34 c5 00 67 37
> > 91 48 89 16 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
> > 48 85 f6 74 07 0f 0d 0e eb 02 f3 90 <8b> 07 66 85 c0 75 f7 41 89 c0 66
> > 45 31 c0 41 39 c8 0f 84 96 00 00
> > Jul  3 10:23:31 yx kernel: [ 1176.166226] RSP: 0018:ffffbe6f4c17bd08
> > EFLAGS: 00000202 ORIG_RAX: 00000000ffffff13
> > Jul  3 10:23:31 yx kernel: [ 1176.166231] RAX: 0000000000000000 RBX:
> > 0000000000000000 RCX: 0000000000000000
> > Jul  3 10:23:31 yx kernel: [ 1176.166234] RDX: 0000000000000000 RSI:
> > 0000000000000000 RDI: 0000000000000000
> > Jul  3 10:23:31 yx kernel: [ 1176.166236] RBP: ffffbe6f4c17bd08 R08:
> > 0000000000000000 R09: 0000000000000000
> > Jul  3 10:23:31 yx kernel: [ 1176.166239] R10: 0000000000000000 R11:
> > 0000000000000000 R12: 0000000000000000
> > Jul  3 10:23:31 yx kernel: [ 1176.166241] R13: 0000000000000000 R14:
> > 0000000000000000 R15: 0000000000000000
> > Jul  3 10:23:31 yx kernel: [ 1176.166244] FS:  00007ff04533c940(0000)
> > GS:ffff96003ba00000(0000) knlGS:0000000000000000
> > Jul  3 10:23:31 yx kernel: [ 1176.166247] CS:  0010 DS: 0000 ES: 0000
> > CR0: 0000000080050033
> > Jul  3 10:23:31 yx kernel: [ 1176.166250] CR2: 00007f799c003000 CR3:
> > 00000000ab0fe000 CR4: 00000000001006f0
> > Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
> > Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
> > Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
> > Jul  3 10:23:31 yx kernel: [ 1176.166276]  ? dequeue_signal+0xae/0x1a0
> > Jul  3 10:23:31 yx kernel: [ 1176.166281]  ? recalc_sigpending+0x1b/0x50
> > Jul  3 10:23:31 yx kernel: [ 1176.166286]  ? __set_task_blocked+0x3c/0xa0
> > Jul  3 10:23:31 yx kernel: [ 1176.166292]  __vfs_write+0x3a/0x190
> > Jul  3 10:23:31 yx kernel: [ 1176.166298]  ? apparmor_file_permission+0x1a/0x20
> > Jul  3 10:23:31 yx kernel: [ 1176.166302]  ? security_file_permission+0x3b/0xc0
> > Jul  3 10:23:31 yx kernel: [ 1176.166307]  vfs_write+0xb8/0x1b0
> > Jul  3 10:23:31 yx kernel: [ 1176.166312]  ksys_write+0x5c/0xe0
> > Jul  3 10:23:31 yx kernel: [ 1176.166316]  __x64_sys_write+0x1a/0x20
> > Jul  3 10:23:31 yx kernel: [ 1176.166321]  do_syscall_64+0x87/0x250
> > Jul  3 10:23:31 yx kernel: [ 1176.166326]
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > Jul  3 10:23:31 yx kernel: [ 1176.166330] RIP: 0033:0x7ff04885d4bd
> 
> The unwinding of the stack trace here seem not to have worked because
> we're running instructions from _raw_spin_lock() and it might be
> optimized or inlined.
> 
> Note: this is a very "high level" interpretation as you can see.
> 
> > Jul  3 10:23:31 yx kernel: [ 1176.166334] Code: bf 20 00 00 75 10 b8
> > 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae fc ff
> > ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 f7 fc ff
> > ff 48 89 d0 48 83 c4 08 48 3d 01
> > Jul  3 10:23:31 yx kernel: [ 1176.166337] RSP: 002b:00007ff04533ba70
> > EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > Jul  3 10:23:31 yx kernel: [ 1176.166342] RAX: ffffffffffffffda RBX:
> > 000000000153ca00 RCX: 00007ff04885d4bd
> > Jul  3 10:23:31 yx kernel: [ 1176.166344] RDX: 0000000000000001 RSI:
> > 00007ff04533bac0 RDI: 000000000000000e
> > Jul  3 10:23:31 yx kernel: [ 1176.166347] RBP: 00007ff04533bbb0 R08:
> > 0000000000000000 R09: 0000000000000000
> > Jul  3 10:23:31 yx kernel: [ 1176.166349] R10: 0000000000000000 R11:
> > 0000000000000246 R12: 00007ff0491783e0
> > Jul  3 10:23:31 yx kernel: [ 1176.166352] R13: 00007ffc5d6b9b1f R14:
> > 00007ff04533cc00 R15: 00000000004041b0
> > Jul  3 10:23:32 yx kernel: [ 1176.170065] watchdog: BUG: soft lockup -
> > CPU#3 stuck for 22s! [rt_cansend:1835]
> 
> Now we have rt_cansend again but pid 1835 =) and in another cpu, CPU 3.
> 
> So it *could be* that both rt_cansend triggered an issue in the VFS
> layer... but I'm not even sure what filesystem is being used and IF
> rt_cansend is writing to the filesystem OR is just being used to flush
> the dirty page cache for other processes.
> 
> 2 things: Either ANOTHER running process (not showed here because it
> is really executing), OR, worse, an already scheduled out process (BIG
> kernel bug in this case as tasks holding spinlocks can't sleep => thus
> can't schedule), are holding the spinlock (causing multiple calls to
> rt_cansend to spin in CPUs forever).
> 
> Like I said.. lots of assumptions as this is a "log-only" debugging.
> Xenomai is a realtime kernel attempt so.. maybe the realtime part of
> the xenomai enabled kernel triggered a real VFS dead lock (spinlock)
> scenario =).
> 
> > Jul  3 10:23:32 yx kernel: [ 1176.170075] Modules linked in: bnep
> > snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
> > nls_iso8859_1 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep
> > snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi intel_rapl
> > intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp coretemp
> > kvm_intel snd_seq punit_atom_debug crct10dif_pclmul crc32_pclmul
> > ghash_clmulni_intel snd_seq_device cryptd intel_cstate snd_timer
> > hci_uart snd lpc_ich advcan(OE) mei_txe btqca soundcore mei btbcm
> > btintel bluetooth ecdh_generic rfkill_gpio pwm_lpss_platform mac_hid
> > pwm_lpss parport_pc ppdev lp parport autofs4 i915 kvmgt vfio_mdev mdev
> > vfio_iommu_type1 vfio kvm irqbypass drm_kms_helper syscopyarea
> > sysfillrect sysimgblt fb_sys_fops igb drm dca ahci i2c_algo_bit
> > libahci video i2c_hid hid
> > Jul  3 10:23:32 yx kernel: [ 1176.170221] CPU: 3 PID: 1835 Comm:
> > rt_cansend Tainted: G           OEL    4.19.84-bros-slove-alc-failure
> > #1
> > Jul  3 10:23:32 yx kernel: [ 1176.170226] I-pipe domain: Linux
> > Jul  3 10:23:32 yx kernel: [ 1176.170236] RIP:
> > 0010:queued_spin_lock_slowpath+0x13f/0x1a0
> > Jul  3 10:23:32 yx kernel: [ 1176.170242] Code: 6b ff ff ff eb e4 ba
> > 00 01 00 00 8b 07 30 e4 09 d0 a9 00 01 ff ff 0f 85 fe fe ff ff 85 c0
> > 74 0e 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89
> > 07 5d c3 f3 90 48 8b 32 48 85 f6
> > Jul  3 10:23:32 yx kernel: [ 1176.170245] RSP: 0018:ffffbe6f4c16bd08
> > EFLAGS: 00000202 ORIG_RAX: 00000000ffffff13
> > Jul  3 10:23:32 yx kernel: [ 1176.170250] RAX: 0000000000000000 RBX:
> > 0000000000000000 RCX: 0000000000000000
> > Jul  3 10:23:32 yx kernel: [ 1176.170252] RDX: 0000000000000000 RSI:
> > 0000000000000000 RDI: 0000000000000000
> > Jul  3 10:23:32 yx kernel: [ 1176.170255] RBP: ffffbe6f4c16bd08 R08:
> > 0000000000000000 R09: 0000000000000000
> > Jul  3 10:23:32 yx kernel: [ 1176.170257] R10: 0000000000000000 R11:
> > 0000000000000000 R12: 0000000000000000
> > Jul  3 10:23:32 yx kernel: [ 1176.170259] R13: 0000000000000000 R14:
> > 0000000000000000 R15: 0000000000000000
> > Jul  3 10:23:32 yx kernel: [ 1176.170263] FS:  00007ff045526940(0000)
> > GS:ffff96003bb80000(0000) knlGS:0000000000000000
> > Jul  3 10:23:32 yx kernel: [ 1176.170266] CS:  0010 DS: 0000 ES: 0000
> > CR0: 0000000080050033
> > Jul  3 10:23:32 yx kernel: [ 1176.170268] CR2: 000000c8203a5000 CR3:
> > 00000000ab0fe000 CR4: 00000000001006e0
> > Jul  3 10:23:32 yx kernel: [ 1176.170271] Call Trace:
> > Jul  3 10:23:32 yx kernel: [ 1176.170282]  _raw_spin_lock+0x20/0x30
> > Jul  3 10:23:32 yx kernel: [ 1176.170292]  can_write+0x6c/0x2c0 [advcan]
> > Jul  3 10:23:32 yx kernel: [ 1176.170298]  ? dequeue_signal+0xae/0x1a0
> > Jul  3 10:23:32 yx kernel: [ 1176.170303]  ? recalc_sigpending+0x1b/0x50
> > Jul  3 10:23:32 yx kernel: [ 1176.170308]  ? __set_task_blocked+0x3c/0xa0
> > Jul  3 10:23:32 yx kernel: [ 1176.170315]  __vfs_write+0x3a/0x190
> > Jul  3 10:23:32 yx kernel: [ 1176.170321]  ? apparmor_file_permission+0x1a/0x20
> > Jul  3 10:23:32 yx kernel: [ 1176.170325]  ? security_file_permission+0x3b/0xc0
> > Jul  3 10:23:32 yx kernel: [ 1176.170330]  vfs_write+0xb8/0x1b0
> > Jul  3 10:23:32 yx kernel: [ 1176.170335]  ksys_write+0x5c/0xe0
> > Jul  3 10:23:32 yx kernel: [ 1176.170339]  __x64_sys_write+0x1a/0x20
> > Jul  3 10:23:32 yx kernel: [ 1176.170345]  do_syscall_64+0x87/0x250
> > Jul  3 10:23:32 yx kernel: [ 1176.170349]
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > Jul  3 10:23:32 yx kernel: [ 1176.170354] RIP: 0033:0x7ff04885d4bd
> 
> Then we have more of the same AND... RCU code complaining that the CPU
> 0 was stalled. Then you had a NMI (non maskable interrupt) from the
> RCU watchdog so the kernel could print the stack trace:
> 
> > Jul  3 10:24:08 yx kernel: [ 1212.410060] rcu: INFO: rcu_sched
> > self-detected stall on CPU
> > Jul  3 10:24:08 yx kernel: [ 1212.410075] rcu: 0-....: (14999 ticks
> > this GP) idle=8de/1/0x4000000000000002 softirq=15317/15317 fqs=7496
> > Jul  3 10:24:08 yx kernel: [ 1212.410079] rcu: (t=15000 jiffies g=17909 q=591)
> > Jul  3 10:24:08 yx kernel: [ 1212.410087] NMI backtrace for cpu 0
> > Jul  3 10:24:08 yx kernel: [ 1212.410092] CPU: 0 PID: 1837 Comm:
> > rt_cansend Tainted: G           OEL    4.19.84-bros-slove-alc-failure
> > #1
> > Jul  3 10:24:08 yx kernel: [ 1212.410096] I-pipe domain: Linux
> > Jul  3 10:24:08 yx kernel: [ 1212.410099] Call Trace:
> > Jul  3 10:24:08 yx kernel: [ 1212.410103]  <IRQ>
> > Jul  3 10:24:08 yx kernel: [ 1212.410113]  dump_stack+0x9e/0xc8
> > Jul  3 10:24:08 yx kernel: [ 1212.410118]  nmi_cpu_backtrace+0x95/0xa0
> > Jul  3 10:24:08 yx kernel: [ 1212.410123]  ? lapic_can_unplug_cpu+0xb0/0xb0
> > Jul  3 10:24:08 yx kernel: [ 1212.410128]
> > nmi_trigger_cpumask_backtrace+0x90/0xd0
> > Jul  3 10:24:08 yx kernel: [ 1212.410132]
> > arch_trigger_cpumask_backtrace+0x19/0x20
> > Jul  3 10:24:08 yx kernel: [ 1212.410137]  rcu_dump_cpu_stacks+0x94/0xc8
> > Jul  3 10:24:08 yx kernel: [ 1212.410141]  rcu_check_callbacks+0x707/0x870
> > Jul  3 10:24:08 yx kernel: [ 1212.410148]  ? tick_sched_do_timer+0x60/0x60
> > Jul  3 10:24:08 yx kernel: [ 1212.410152]  update_process_times+0x3d/0x70
> > Jul  3 10:24:08 yx kernel: [ 1212.410157]  tick_sched_handle+0x29/0x60
> > Jul  3 10:24:08 yx kernel: [ 1212.410161]  tick_sched_timer+0x3c/0x80
> > Jul  3 10:24:08 yx kernel: [ 1212.410166]  __hrtimer_run_queues+0x106/0x270
> > Jul  3 10:24:08 yx kernel: [ 1212.410171]  hrtimer_interrupt+0x116/0x240
> > Jul  3 10:24:08 yx kernel: [ 1212.410177]  smp_apic_timer_interrupt+0x59/0x130
> > Jul  3 10:24:08 yx kernel: [ 1212.410182]  ?
> > smp_call_function_single_interrupt+0xd0/0xd0
> > Jul  3 10:24:08 yx kernel: [ 1212.410186]  __ipipe_do_IRQ+0x31/0x40
> > Jul  3 10:24:08 yx kernel: [ 1212.410192]  __ipipe_do_sync_stage+0x173/0x180
> > Jul  3 10:24:08 yx kernel: [ 1212.410196]  __ipipe_do_sync_pipeline+0x9b/0xa0
> > Jul  3 10:24:08 yx kernel: [ 1212.410201]  dispatch_irq_head+0xe6/0x110
> > Jul  3 10:24:08 yx kernel: [ 1212.410205]  __ipipe_dispatch_irq+0xd9/0x1c0
> > Jul  3 10:24:08 yx kernel: [ 1212.410210]  __ipipe_handle_irq+0x86/0x1e0
> > Jul  3 10:24:08 yx kernel: [ 1212.410215]  apic_timer_interrupt+0x12/0x40
> > Jul  3 10:24:08 yx kernel: [ 1212.410218]  </IRQ>
> > Jul  3 10:24:08 yx kernel: [ 1212.410223] RIP:
> > 0010:queued_spin_lock_slowpath+0xd9/0x1a0
> > Jul  3 10:24:08 yx kernel: [ 1212.410228] Code: 48 03 34 c5 00 67 37
> > 91 48 89 16 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
> > 48 85 f6 74 07 0f 0d 0e eb 02 f3 90 <8b> 07 66 85 c0 75 f7 41 89 c0 66
> > 45 31 c0 41 39 c8 0f 84 96 00 00
> > Jul  3 10:24:08 yx kernel: [ 1212.410231] RSP: 0018:ffffbe6f4c17bd08
> > EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
> > Jul  3 10:24:08 yx kernel: [ 1212.410236] RAX: 0000000000040101 RBX:
> > 0000000000000001 RCX: 0000000000040000
> > Jul  3 10:24:08 yx kernel: [ 1212.410238] RDX: ffff96003ba350c0 RSI:
> > 0000000000000000 RDI: ffffffffc069cc20
> > Jul  3 10:24:08 yx kernel: [ 1212.410241] RBP: ffffbe6f4c17bd08 R08:
> > 00007ff04533bac1 R09: ffff96001e06ac68
> > Jul  3 10:24:08 yx kernel: [ 1212.410243] R10: ffff95ff34a27f38 R11:
> > 0000000000000000 R12: 0000000000000000
> > Jul  3 10:24:08 yx kernel: [ 1212.410246] R13: 00007ff04533bac0 R14:
> > ffff95ff34a27f00 R15: 00007ff04533bac0
> > Jul  3 10:24:08 yx kernel: [ 1212.410255]  _raw_spin_lock+0x20/0x30
> > Jul  3 10:24:08 yx kernel: [ 1212.410262]  can_write+0x6c/0x2c0 [advcan]
> > Jul  3 10:24:08 yx kernel: [ 1212.410267]  ? dequeue_signal+0xae/0x1a0
> > Jul  3 10:24:08 yx kernel: [ 1212.410273]  ? recalc_sigpending+0x1b/0x50
> > Jul  3 10:24:08 yx kernel: [ 1212.410277]  ? __set_task_blocked+0x3c/0xa0
> > Jul  3 10:24:08 yx kernel: [ 1212.410283]  __vfs_write+0x3a/0x190
> > Jul  3 10:24:08 yx kernel: [ 1212.410289]  ? apparmor_file_permission+0x1a/0x20
> > Jul  3 10:24:08 yx kernel: [ 1212.410293]  ? security_file_permission+0x3b/0xc0
> > Jul  3 10:24:08 yx kernel: [ 1212.410298]  vfs_write+0xb8/0x1b0
> > Jul  3 10:24:08 yx kernel: [ 1212.410302]  ksys_write+0x5c/0xe0
> > Jul  3 10:24:08 yx kernel: [ 1212.410307]  __x64_sys_write+0x1a/0x20
> > Jul  3 10:24:08 yx kernel: [ 1212.410312]  do_syscall_64+0x87/0x250
> > Jul  3 10:24:08 yx kernel: [ 1212.410317]
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > Jul  3 10:24:08 yx kernel: [ 1212.410321] RIP: 0033:0x7ff04885d4bd
> > Jul  3 10:24:08 yx kernel: [ 1212.410325] Code: bf 20 00 00 75 10 b8
> > 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae fc ff
> > ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 f7 fc ff
> > ff 48 89 d0 48 83 c4 08 48 3d 01
> > Jul  3 10:24:08 yx kernel: [ 1212.410327] RSP: 002b:00007ff04533ba70
> > EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > Jul  3 10:24:08 yx kernel: [ 1212.410332] RAX: ffffffffffffffda RBX:
> > 000000000153ca00 RCX: 00007ff04885d4bd
> > Jul  3 10:24:08 yx kernel: [ 1212.410334] RDX: 0000000000000001 RSI:
> > 00007ff04533bac0 RDI: 000000000000000e
> > Jul  3 10:24:08 yx kernel: [ 1212.410337] RBP: 00007ff04533bbb0 R08:
> > 0000000000000000 R09: 0000000000000000
> > Jul  3 10:24:08 yx kernel: [ 1212.410339] R10: 0000000000000000 R11:
> > 0000000000000246 R12: 00007ff0491783e0
> > Jul  3 10:24:08 yx kernel: [ 1212.410342] R13: 00007ffc5d6b9b1f R14:
> > 00007ff04533cc00 R15: 00000000004041b0
> > Jul  3 10:24:08 yx kernel: [ 1212.410351] Sending NMI from CPU 0 to CPUs 3:
> > Jul  3 10:24:08 yx kernel: [ 1212.414061] NMI backtrace for cpu 3
> > Jul  3 10:24:08 yx kernel: [ 1212.414064] CPU: 3 PID: 1835 Comm:
> > rt_cansend Tainted: G           OEL    4.19.84-bros-slove-alc-failure
> > #1
> > Jul  3 10:24:08 yx kernel: [ 1212.414068] I-pipe domain: Linux
> > Jul  3 10:24:08 yx kernel: [ 1212.414070] RIP:
> > 0010:queued_spin_lock_slowpath+0x13f/0x1a0
> > Jul  3 10:24:08 yx kernel: [ 1212.414073] Code: 6b ff ff ff eb e4 ba
> > 00 01 00 00 8b 07 30 e4 09 d0 a9 00 01 ff ff 0f 85 fe fe ff ff 85 c0
> > 74 0e 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89
> > 07 5d c3 f3 90 48 8b 32 48 85 f6
> > Jul  3 10:24:08 yx kernel: [ 1212.414075] RSP: 0018:ffffbe6f4c16bd08
> > EFLAGS: 00000202
> > Jul  3 10:24:08 yx kernel: [ 1212.414078] RAX: 0000000000040101 RBX:
> > 0000000000000001 RCX: ffffbe6f4c16bec8
> > Jul  3 10:24:08 yx kernel: [ 1212.414080] RDX: 0000000000000000 RSI:
> > 0000000000000000 RDI: ffffffffc069cc20
> > Jul  3 10:24:08 yx kernel: [ 1212.414082] RBP: ffffbe6f4c16bd08 R08:
> > 00007ff045525581 R09: ffff96001e06ac68
> > Jul  3 10:24:08 yx kernel: [ 1212.414084] R10: ffff95ff34a27f38 R11:
> > 0000000000000000 R12: 0000000000000000
> > Jul  3 10:24:08 yx kernel: [ 1212.414086] R13: 00007ff045525580 R14:
> > ffff95ff34a27f00 R15: 00007ff045525580
> > Jul  3 10:24:08 yx kernel: [ 1212.414088] FS:  00007ff045526940(0000)
> > GS:ffff96003bb80000(0000) knlGS:0000000000000000
> > Jul  3 10:24:08 yx kernel: [ 1212.414090] CS:  0010 DS: 0000 ES: 0000
> > CR0: 0000000080050033
> > Jul  3 10:24:08 yx kernel: [ 1212.414092] CR2: 000000c8203a5000 CR3:
> > 00000000ab0fe000 CR4: 00000000001006e0
> > Jul  3 10:24:08 yx kernel: [ 1212.414093] Call Trace:
> > Jul  3 10:24:08 yx kernel: [ 1212.414095]  _raw_spin_lock+0x20/0x30
> > Jul  3 10:24:08 yx kernel: [ 1212.414097]  can_write+0x6c/0x2c0 [advcan]
> > Jul  3 10:24:08 yx kernel: [ 1212.414099]  ? dequeue_signal+0xae/0x1a0
> > Jul  3 10:24:08 yx kernel: [ 1212.414100]  ? recalc_sigpending+0x1b/0x50
> > Jul  3 10:24:08 yx kernel: [ 1212.414102]  ? __set_task_blocked+0x3c/0xa0
> > Jul  3 10:24:08 yx kernel: [ 1212.414104]  __vfs_write+0x3a/0x190
> > Jul  3 10:24:08 yx kernel: [ 1212.414106]  ? apparmor_file_permission+0x1a/0x20
> > Jul  3 10:24:08 yx kernel: [ 1212.414107]  ? security_file_permission+0x3b/0xc0
> > Jul  3 10:24:08 yx kernel: [ 1212.414109]  vfs_write+0xb8/0x1b0
> > Jul  3 10:24:08 yx kernel: [ 1212.414111]  ksys_write+0x5c/0xe0
> > Jul  3 10:24:08 yx kernel: [ 1212.414113]  __x64_sys_write+0x1a/0x20
> > Jul  3 10:24:08 yx kernel: [ 1212.414114]  do_syscall_64+0x87/0x250
> > Jul  3 10:24:08 yx kernel: [ 1212.414116]
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > Jul  3 10:24:08 yx kernel: [ 1212.414118] RIP: 0033:0x7ff04885d4bd
> > Jul  3 10:24:08 yx kernel: [ 1212.414120] Code: bf 20 00 00 75 10 b8
> > 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae fc ff
> > ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 f7 fc ff
> > ff 48 89 d0 48 83 c4 08 48 3d 01
> > Jul  3 10:24:08 yx kernel: [ 1212.414122] RSP: 002b:00007ff045525530
> > EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > Jul  3 10:24:08 yx kernel: [ 1212.414126] RAX: ffffffffffffffda RBX:
> > 000000000000ffdc RCX: 00007ff04885d4bd
> > Jul  3 10:24:08 yx kernel: [ 1212.414128] RDX: 0000000000000001 RSI:
> > 00007ff045525580 RDI: 000000000000000e
> > Jul  3 10:24:08 yx kernel: [ 1212.414130] RBP: 00007ff045525670 R08:
> > 0000000000000000 R09: 0000000000000000
> > Jul  3 10:24:08 yx kernel: [ 1212.414132] R10: 00007ff030000078 R11:
> > 0000000000000246 R12: 00007ff045526878
> > Jul  3 10:24:08 yx kernel: [ 1212.414134] R13: 00007ff045525be0 R14:
> > 000000000000001c R15: 0000000000000007
> > Jul  3 10:24:35 yx kernel: [ 1240.166063] watchdog: BUG: soft lockup -
> > CPU#0 stuck for 22s! [rt_cansend:1837]
> 
> and more of the same.
> 
> Your kernel is dead locked in VFS layer trying to write in the
> filesystem by 2 PIDs:
> 
> rt_cansend: 1835 and 1837
> 
> and both PIDs are spinning in CPUs trying to acquire a spin lock that
> has LIKELY been acquired by a task that has slept (scheduled out the
> CPU) and may never come back to the CPUs to unlock the lock because of
> other tasks are spinning (and the condition to release the lock won't
> happen).
> 
> NOTE: This was a very fast and very inaccurate reading of your logs
> (for other debuggers around, so be nice in critics).
> 
> I hope I helped you somehow... talk to your kernel supporter, open a
> support case or try other kernels to check if the issue happens. You
> can always bisect kernel and find the offending commit IF you have a
> 100% accurate reproducer.
> 
> Best
> 
> -rafaeldtinoco
> 

-- 
Robert Heller             -- 978-544-6933 Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software        -- Custom Software Services
http://www.deepsoft.com/  -- Linux Administration Services
heller at deepsoft.com       -- Webhosting Services
                                                                                                   




More information about the ubuntu-users mailing list