ACK/cmnt: [SRU Xenial][PATCH 0/2] Improve our SAUCE for virtio-scsi reqs counter (fix CPU soft lockup)

Khaled Elmously khalid.elmously at canonical.com
Wed Oct 17 14:59:24 UTC 2018


On 2018-10-16 15:08:48 , Mauricio Faria de Oliveira wrote:
> On Tue, Oct 16, 2018 at 1:42 PM Khaled Elmously
> <khalid.elmously at canonical.com> wrote:
> >
> > On 2018-10-16 12:38:18 , Mauricio Faria de Oliveira wrote:
> > > BugLink: https://bugs.launchpad.net/bugs/1798110
> > >
> > > [Impact]
> > >
> > >  * Detaching virtio-scsi disk in Xenial guest can cause
> > >    CPU soft lockup in guest (and take 100% CPU in host).
> > >
> > >  * It may prevent further progress on other tasks that
> > >    depend on resources locked earlier in the SCSI target
> > >    removal stack, and/or impact other SCSI functionality.
> > >
> > >  * The fix resolves a corner case in the requests counter
> > >    in the virtio SCSI target, which impacts a downstream
> > >    (SAUCE) patch in the virtio-scsi target removal handler
> > >    that depends on the requests counter value to be zero.
> > >
> > > [Test Case]
> > >
> > >  * See LP #1798110 (this bug)'s comment #3 (too long for
> > >    this section -- synthetic case with GDB+QEMU) and
> > >    comment #4 (organic test case in cloud instance).
> > >
> > > [Regression Potential]
> > >
> > >  * It seem low -- this only affects the SCSI command requeue
> > >    path with regards to the reference counter, which is only
> > >    used with real chance of problems in our downstream patch
> > >    (which is now passing this testcase).
> > >
> > >  * The other less serious issue would be decrementing it to
> > >    a negative / < 0 value, which is not possible with this
> > >    driver logic (see commit message), because the reqs counter
> > >    is always incremented before calling virtscsi_queuecommand(),
> > >    where this decrement operation is inserted.
> > >
> > >
> > > Mauricio Faria de Oliveira (2):
> > >   UBUNTU: SAUCE: (no-up) virtio-scsi: Decrement reqs counter before SCSI
> > >     command requeue
> > >   UBUNTU: SAUCE: (no-up) virtio-scsi: (cosmetic) reestablish
> > >     virtscsi_pick_vq_mq() signature
> > >
> > >  drivers/scsi/virtio_scsi.c | 10 ++++++++--
> > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> >
> 
> Hi Khalid,
> 
> Thanks for reviewing.
> 
> >  - Maybe I'm misunderstanding what you mean by 'no-up'. I assumed it means "no operation", as in, no change in the resulting binary (e.g., whitespace changes, comment changes, casting etc.). Both patches seem to be "op" changes
> 
> Hm, I meant 'no up-stream' (e.g., downstream/SAUCE patch that is not
> expected to go upstream, as described in the StablePatchFormat wiki
> page [3].
> 

I had no idea that that's what "no-up" means :) Thanks


> >  - The second patch doesn't actually appear to be related to the 'reqs' counter or any related soft-lockups
> 
> It's related to the reqs counter "area", and you're absolutely right
> about no relation to the soft lockup.
> 
> I should have mentioned the 2nd patch was more of a 'while still here'
> patch, sorry.
> It is related to the last applied virtio-scsi no-up patch, that
> touches the reqs counter functionality as well.
> 
> >  - It seems strange to me that you'd have a loop that repeatedly decrements a counter because you know it will be incremented again when about to retry. I would think it makes more sense to increment the counter only once when queuing the command and decrement only once on success or "giving up" - though I assume you did it this way for a reason?
> 
> I didn' t follow you on the loop scenario (I don't see a loop unless a
> SCSI commands repeatedly fails to be kicked and gets requeued/resent
> by the SCSI layer - is that it?)

Yep, that's what I meant.


> 
> I agree, and indeed, there's a reason -- upstream only used the reqs
> counter in the .queuecommand path for queue steering
> in two cases, 1) single-virtqueue,  and 2) multi-virtqueue without
> blk-mq -- so not all cases had it.
> (and the first one makes it hard to move the increment call around, see below..)
> 
> Then we introduced the first reqs-related SAUCE patch [1] (spin-loop
> in virtscsi_target_destroy() until reqs is zero),
> which fixed a race, but introduced a regression for multi-virtqueue
> with blk-mq (which didn't increment the counter).
> 
> The second reqs-related SAUCE patch [2] fixed that regression for
> multi-virtqueue blk-mq,
> but for that it ended up adding the increment in the blk-mq related
> function as well.
> (even there, when cases would do atomic_inc(), it would be hard to
> move, as one case has atomic_inc_return()
>  which checks for the counter value right there.. so this complicates
> moving that around.)
> 
> The patch [1] introduced another regression (this CPU softlockup),
> which this patch now addresses.
> 
> I'd like to investigate later whether there's a way we can accomplish
> the sync in commit [1]
> without resorting to the reqs counter, as it fixed one issued and
> introduced two for now.
> This would allow to drop these SAUCE patches.
> 
> I'll likely have a chat w/ Jay about it this week, but since the fix
> was obvious and conservative,
> I imagined it would be better to have it fixed on top for now, and
> then refactored/re-fixed later).

Fair enough - and thanks for clarifying in detail


> 
> [1] f1f609d8015e UBUNTU: SAUCE: (no-up) virtio-scsi: Fix race in target free
> [2] a6c038a533f3 UBUNTU: SAUCE: (no-up) virtio-scsi: Increment reqs counter.
> 
> 
> > None of the above points are major enough for me to NAK this patchset, but I wanted to bring them to your attention (in case you missed something).
> 
> Sure, thanks for the careful review.
> Hope this helps clarify things!
> 
> > Acked-by: Khalid Elmously <khalid.elmously at canonical.com>
> >
> 
> [3] https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat
> 
> -- 
> Mauricio Faria de Oliveira




More information about the kernel-team mailing list