What's soft lockup?

Tue Jul 7 14:55:33 UTC 2020

On Tue, Jul 7, 2020 at 3:12 AM 孙世龙 sunshilong <sunshilong369 at gmail.com> wrote:
>
> Thank you for your detailed explanation.
>
> >> Here is the most related log:
> >> ul  3 10:24:35 yx kernel: [ 1240.166063] watchdog: BUG: soft lockup -
> >> CPU#0 stuck for 22s! [rt_cansend:1837]
> >> ...
> >> Jul  3 10:24:35 yx kernel: [ 1240.166257] Call Trace:
> >> Jul  3 10:24:35 yx kernel: [ 1240.166266]  _raw_spin_lock+0x20/0x30
> >> Jul  3 10:24:35 yx kernel: [ 1240.166275]  can_write+0x6c/0x2c0 [advcan]
> >> Jul  3 10:24:35 yx kernel: [ 1240.166281]  ? dequeue_signal+0xae/0x1a0
> >> Jul  3 10:24:35 yx kernel: [ 1240.166286]  ? recalc_sigpending+0x1b/0x50
> >> Jul  3 10:24:35 yx kernel: [ 1240.166291]  ? __set_task_blocked+0x3c/0xa0
> >> Jul  3 10:24:35 yx kernel: [ 1240.166297]  __vfs_write+0x3a/0x190
> >> Jul  3 10:24:35 yx kernel: [ 1240.166303]  ? apparmor_file_permission+0x1a/0x20
> >> Jul  3 10:24:35 yx kernel: [ 1240.166307]  ? security_file_permission+0x3b/0xc0
> >> Jul  3 10:24:35 yx kernel: [ 1240.166312]  vfs_write+0xb8/0x1b0
> >> Jul  3 10:24:35 yx kernel: [ 1240.166316]  ksys_write+0x5c/0xe0
> >> Jul  3 10:24:35 yx kernel: [ 1240.166321]  __x64_sys_write+0x1a/0x20
> >> Jul  3 10:24:35 yx kernel: [ 1240.166326]  do_syscall_64+0x87/0x250
> >> Jul  3 10:24:35 yx kernel: [ 1240.166331]
> >> entry_SYSCALL_64_after_hwframe+0x44/0xa9

> >This stack trace tells us that kernel has trying to write through VFS
> >layer and queued the write waiting for a lock. This lock is likely
> >taken by some other kernel code.

> I have two questions.
> First one:
> What're the relations beweeen "RIP: 0010:queued_spin_lock_slowpath+
> 0xd9/0x1a0" and " Call Trace: _raw_spin_lock+0x20/0x30"?
> Does __raw_spin_lock() invoke queued_spin_lock_slowpath()?
> I carefully read the related code snippet, but I don't think I can
> draw such a conclusion.
> What do you think about it?

1. ksys_write -> vfs_write(f.file)

2. f is a (struct fd) and f.file is a (struct file).

3. vfs_write -> __vfs_write(file,...) and it calls either:

file->f_op->write() = coming from the filesystem file operations func
implementation

or

file->f_op->write_iter() = coming also form the fs file operations
func implementation

from the "struct file_operations" we have:

..
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
..

You have to find which filesystem is in charge for the file being
written to here.

> Second one:
> Why does the kernel queue the write waiting for a lock? When such a
> queue operation should be taken?

A spinlock would wait forever. Their design is that they are supposed
to spin for a very tiny moment and the task holding one cannot sleep
(be scheduled). This would make other spinlock readers to wait up to
the max of the scheduler timequantum only (not much). The fact that
you're waiting for long time suggest that there is a bug in the
kernel.

> >Affected task is: PID 1837 and task is rt_cansend. Now.. it is
> >possible that the VFS write (a write to a filesystem) has NOT come
> >directly from this PID. Its part of the complexities of kernel. If
> >dirty pages is above a certain water mark is *may* use other tasks
> >%sys% time to flush dirty pages, for example.
> >So it may, or may not, come from this task.

> I almost understand what you mean. As I carefully read the related code snippet,
> I think the said "VFS write" is invoked(maybe, "triggered" is more
> suitable here,
> sorry for my poor English) while user application calls write function
> to send out
> some data through the can bus.

No..

xenomai-2.6.4+dfsg/src$ grep -ri "open(" * | wc -l
100

This seems like a *real* filesystem write and not a character device
handling writes to it, like you said (a dump analysis would have to
take place, but.. from the stack trace, this seems to be a real VFS
thing).

Specially if you consider all drivers from drivers/net/can or net/can
don't seem to implement "file_operations" to a char device (/dev/can
seems to be a network device dealing with socket operations instead).

> The stack is printed at an "almost" random time, but the context of
> the stacks is
> the same. So I think it must have some relation with some spinlock.
> As the stack indicates, I think it's the spinlock called by can_write function.
> What do you think about it?

Stack frame is telling you that.. the spinlock await comes from the
can_write for sure, which comes from __vfs_write() for sure, which
comes from the filesystem operations struct pointer for the filesystem
of the file being opened, for sure =).

> It only opens the can device node(i.e. something like /dev/can) as the source
> code of rt_cansend.c.

Nope it does not.

> >Your kernel is dead locked in VFS layer trying to write in the
> >filesystem by 2 PIDs:
> >and both PIDs are spinning in CPUs trying to acquire a spin lock that
> >has LIKELY been acquired by a task that has slept (scheduled out the
> >CPU) and may never come back to the CPUs to unlock the lock because of
> >other tasks are spinning (and the condition to release the lock won't
> >happen).
> Rt_cansend process consists of several threads(include 1835 and 1837).

For kernel, a task, a thread, a process.. all generic concepts. I use
"task"all the time because that is what kernel sees.

> >I hope I helped you somehow... talk to your kernel supporter, open a
> >support case or try other kernels to check if the issue happens. You
> >can always bisect kernel and find the offending commit IF you have a
> >100% accurate reproducer.

> I have the all source codes to build the kernel(the kernel is built myself).
> I can easily reproduce this error.
> But there is a huge problem, you see there are many commits for the
> Linux kernel,
> I don't know how to bisect kernel. Could you give me some suggestions?

Yes, that is the idea behind bisection...

You have the git tree of the kernel you're using.. find a commit that
builds and does not have the issue.. then find a commit that builds
and has the issue. Now you have the "good" commit and the "bad" commit
and you will compile your kernel on each step and run your tests.

If the kernel works through your test case.. you have a good step, if
your kernel does not work.. you have the bad step... you continue this
until you find the commit that introduced the issue (or solved it).

You will find good tutorials online teaching how to do this.. only
thing I have enough time to provide you right now is:

https://github.com/rafaeldtinoco/work/tree/master/sources/kernel

and

https://github.com/rafaeldtinoco/work/blob/master/sources/kernel/.gitbuild.sh

it may help you building the kernel in an ubuntu way (faster than
usual) considering you have a configuration file in hand.. just read
the scripts.

Good luck!