ACK/cmnt: [SRU Xenial][PATCH 0/2] Improve our SAUCE for virtio-scsi reqs counter (fix CPU soft lockup)

Mauricio Faria de Oliveira mfo at canonical.com
Tue Oct 16 18:08:48 UTC 2018


On Tue, Oct 16, 2018 at 1:42 PM Khaled Elmously
<khalid.elmously at canonical.com> wrote:
>
> On 2018-10-16 12:38:18 , Mauricio Faria de Oliveira wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1798110
> >
> > [Impact]
> >
> >  * Detaching virtio-scsi disk in Xenial guest can cause
> >    CPU soft lockup in guest (and take 100% CPU in host).
> >
> >  * It may prevent further progress on other tasks that
> >    depend on resources locked earlier in the SCSI target
> >    removal stack, and/or impact other SCSI functionality.
> >
> >  * The fix resolves a corner case in the requests counter
> >    in the virtio SCSI target, which impacts a downstream
> >    (SAUCE) patch in the virtio-scsi target removal handler
> >    that depends on the requests counter value to be zero.
> >
> > [Test Case]
> >
> >  * See LP #1798110 (this bug)'s comment #3 (too long for
> >    this section -- synthetic case with GDB+QEMU) and
> >    comment #4 (organic test case in cloud instance).
> >
> > [Regression Potential]
> >
> >  * It seem low -- this only affects the SCSI command requeue
> >    path with regards to the reference counter, which is only
> >    used with real chance of problems in our downstream patch
> >    (which is now passing this testcase).
> >
> >  * The other less serious issue would be decrementing it to
> >    a negative / < 0 value, which is not possible with this
> >    driver logic (see commit message), because the reqs counter
> >    is always incremented before calling virtscsi_queuecommand(),
> >    where this decrement operation is inserted.
> >
> >
> > Mauricio Faria de Oliveira (2):
> >   UBUNTU: SAUCE: (no-up) virtio-scsi: Decrement reqs counter before SCSI
> >     command requeue
> >   UBUNTU: SAUCE: (no-up) virtio-scsi: (cosmetic) reestablish
> >     virtscsi_pick_vq_mq() signature
> >
> >  drivers/scsi/virtio_scsi.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
>

Hi Khalid,

Thanks for reviewing.

>  - Maybe I'm misunderstanding what you mean by 'no-up'. I assumed it means "no operation", as in, no change in the resulting binary (e.g., whitespace changes, comment changes, casting etc.). Both patches seem to be "op" changes

Hm, I meant 'no up-stream' (e.g., downstream/SAUCE patch that is not
expected to go upstream, as described in the StablePatchFormat wiki
page [3].

>  - The second patch doesn't actually appear to be related to the 'reqs' counter or any related soft-lockups

It's related to the reqs counter "area", and you're absolutely right
about no relation to the soft lockup.

I should have mentioned the 2nd patch was more of a 'while still here'
patch, sorry.
It is related to the last applied virtio-scsi no-up patch, that
touches the reqs counter functionality as well.

>  - It seems strange to me that you'd have a loop that repeatedly decrements a counter because you know it will be incremented again when about to retry. I would think it makes more sense to increment the counter only once when queuing the command and decrement only once on success or "giving up" - though I assume you did it this way for a reason?

I didn' t follow you on the loop scenario (I don't see a loop unless a
SCSI commands repeatedly fails to be kicked and gets requeued/resent
by the SCSI layer - is that it?)

I agree, and indeed, there's a reason -- upstream only used the reqs
counter in the .queuecommand path for queue steering
in two cases, 1) single-virtqueue,  and 2) multi-virtqueue without
blk-mq -- so not all cases had it.
(and the first one makes it hard to move the increment call around, see below..)

Then we introduced the first reqs-related SAUCE patch [1] (spin-loop
in virtscsi_target_destroy() until reqs is zero),
which fixed a race, but introduced a regression for multi-virtqueue
with blk-mq (which didn't increment the counter).

The second reqs-related SAUCE patch [2] fixed that regression for
multi-virtqueue blk-mq,
but for that it ended up adding the increment in the blk-mq related
function as well.
(even there, when cases would do atomic_inc(), it would be hard to
move, as one case has atomic_inc_return()
 which checks for the counter value right there.. so this complicates
moving that around.)

The patch [1] introduced another regression (this CPU softlockup),
which this patch now addresses.

I'd like to investigate later whether there's a way we can accomplish
the sync in commit [1]
without resorting to the reqs counter, as it fixed one issued and
introduced two for now.
This would allow to drop these SAUCE patches.

I'll likely have a chat w/ Jay about it this week, but since the fix
was obvious and conservative,
I imagined it would be better to have it fixed on top for now, and
then refactored/re-fixed later).

[1] f1f609d8015e UBUNTU: SAUCE: (no-up) virtio-scsi: Fix race in target free
[2] a6c038a533f3 UBUNTU: SAUCE: (no-up) virtio-scsi: Increment reqs counter.


> None of the above points are major enough for me to NAK this patchset, but I wanted to bring them to your attention (in case you missed something).

Sure, thanks for the careful review.
Hope this helps clarify things!

> Acked-by: Khalid Elmously <khalid.elmously at canonical.com>
>

[3] https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat

-- 
Mauricio Faria de Oliveira




More information about the kernel-team mailing list