ACK/cmnt: [SRU Xenial][PATCH 0/2] Improve our SAUCE for virtio-scsi reqs counter (fix CPU soft lockup)
Khaled Elmously
khalid.elmously at canonical.com
Wed Oct 17 14:59:24 UTC 2018
On 2018-10-16 15:08:48, Mauricio Faria de Oliveira wrote:
> On Tue, Oct 16, 2018 at 1:42 PM Khaled Elmously
> <khalid.elmously at canonical.com> wrote:
> >
> > On 2018-10-16 12:38:18, Mauricio Faria de Oliveira wrote:
> > > BugLink: https://bugs.launchpad.net/bugs/1798110
> > >
> > > [Impact]
> > >
> > > * Detaching virtio-scsi disk in Xenial guest can cause
> > > CPU soft lockup in guest (and take 100% CPU in host).
> > >
> > > * It may prevent further progress on other tasks that
> > > depend on resources locked earlier in the SCSI target
> > > removal stack, and/or impact other SCSI functionality.
> > >
> > > * The fix resolves a corner case in the requests counter
> > > in the virtio SCSI target, which impacts a downstream
> > > (SAUCE) patch in the virtio-scsi target removal handler
> > > that depends on the requests counter reaching zero.
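> > >
> > > * For illustration, the removal handler's dependency looks
> > > roughly like this (a minimal sketch of the existing SAUCE
> > > code, not the exact hunk):
> > >
> > >     static void virtscsi_target_destroy(struct scsi_target *starget)
> > >     {
> > >             struct virtio_scsi_target_state *tgt = starget->hostdata;
> > >
> > >             /* Wait for in-flight requests to complete; if one of
> > >              * them never decrements tgt->reqs (the corner case
> > >              * fixed here), this spins forever -> CPU soft lockup. */
> > >             while (atomic_read(&tgt->reqs))
> > >                     cpu_relax();
> > >
> > >             kfree(tgt);
> > >     }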
> > >
> > > [Test Case]
> > >
> > > * See comments #3 and #4 in LP #1798110 (this bug):
> > > comment #3 has a synthetic test case with GDB+QEMU (too
> > > long for this section) and comment #4 an organic test
> > > case in a cloud instance.
> > >
> > > [Regression Potential]
> > >
> > > * It seems low -- this only affects the reference counter
> > > in the SCSI command requeue path, and the counter only
> > > has a real chance of causing problems via our downstream
> > > patch (which now passes this test case).
> > >
> > > * The other, less serious, issue would be decrementing the
> > > counter to a negative (< 0) value, which is not possible
> > > with this driver's logic (see the commit message), because
> > > the reqs counter is always incremented before
> > > virtscsi_queuecommand() is called, which is where this
> > > decrement operation is inserted.
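> > >
> > > * For reference, the first patch's change is essentially the
> > > following (a sketch of the requeue path, not the exact hunk;
> > > tgt here is scsi_target(sc->device)->hostdata):
> > >
> > >     if (virtscsi_kick_cmd(req_vq, cmd, req_size,
> > >                           sizeof(cmd->resp.cmd)) != 0) {
> > >             /* The SCSI layer will requeue the command, and the
> > >              * callers of virtscsi_queuecommand() increment
> > >              * tgt->reqs again on the retry -- so drop the
> > >              * reference taken for this attempt, or the counter
> > >              * never reaches zero on target removal. */
> > >             atomic_dec(&tgt->reqs);
> > >             return SCSI_MLQUEUE_HOST_BUSY;
> > >     }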
> > >
> > >
> > > Mauricio Faria de Oliveira (2):
> > > UBUNTU: SAUCE: (no-up) virtio-scsi: Decrement reqs counter before SCSI
> > > command requeue
> > > UBUNTU: SAUCE: (no-up) virtio-scsi: (cosmetic) reestablish
> > > virtscsi_pick_vq_mq() signature
> > >
> > > drivers/scsi/virtio_scsi.c | 10 ++++++++--
> > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> >
>
> Hi Khalid,
>
> Thanks for reviewing.
>
> > - Maybe I'm misunderstanding what you mean by 'no-up'. I assumed it means "no operation", as in, no change in the resulting binary (e.g., whitespace changes, comment changes, casting etc.). Both patches seem to be "op" changes
>
> Hm, I meant 'no up-stream' (e.g., a downstream/SAUCE patch that is
> not expected to go upstream, as described in the StablePatchFormat
> wiki page [3]).
>
I had no idea that that's what "no-up" means :) Thanks
> > - The second patch doesn't actually appear to be related to the 'reqs' counter or any related soft-lockups
>
> It's related to the reqs counter "area", and you're absolutely right
> that it has no relation to the soft lockup.
>
> I should have mentioned that the 2nd patch was more of a 'while still
> here' patch, sorry.
> It is related to the last applied virtio-scsi no-up patch, which
> touches the reqs counter functionality as well.
>
> > - It seems strange to me that you'd have a loop that repeatedly decrements a counter because you know it will be incremented again when about to retry. I would think it makes more sense to increment the counter only once when queuing the command and decrement only once on success or "giving up" - though I assume you did it this way for a reason?
>
> I didn't follow you on the loop scenario (I don't see a loop unless a
> SCSI command repeatedly fails to be kicked and gets requeued/resent
> by the SCSI layer -- is that it?)
Yep, that's what I meant.
>
> I agree, and indeed there's a reason -- upstream only used the reqs
> counter in the .queuecommand path for queue steering, in two cases:
> 1) single-virtqueue, and 2) multi-virtqueue without blk-mq -- so not
> all cases had it.
> (And the first one makes it hard to move the increment call around;
> see the sketch below.)
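>
> To illustrate the pattern (simplified from the Xenial driver, error
> handling elided), the single-virtqueue case takes the reference just
> before calling into the common path:
>
>     static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
>                                             struct scsi_cmnd *sc)
>     {
>             struct virtio_scsi *vscsi = shost_priv(sh);
>             struct virtio_scsi_target_state *tgt =
>                             scsi_target(sc->device)->hostdata;
>
>             /* The increment lives in the caller, outside the common
>              * path, so virtscsi_queuecommand() cannot easily own it. */
>             atomic_inc(&tgt->reqs);
>             return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
>     }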
>
> Then we introduced the first reqs-related SAUCE patch [1] (spin-loop
> in virtscsi_target_destroy() until reqs is zero),
> which fixed a race, but introduced a regression for multi-virtqueue
> with blk-mq (which didn't increment the counter).
>
> The second reqs-related SAUCE patch [2] fixed that regression for
> multi-virtqueue blk-mq, but to do so it ended up adding the increment
> in the blk-mq related function as well.
> (Even there it would be hard to move the atomic_inc() calls around,
> as one case uses atomic_inc_return() and checks the counter value
> right there, which complicates moving it -- see the sketch below.)
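>
> E.g., the steering function is roughly this (a simplified sketch --
> the real code has locking/barriers around req_vq that I'm eliding);
> the increment and the value check are fused in atomic_inc_return(),
> so they cannot easily be split apart:
>
>     static struct virtio_scsi_vq *virtscsi_pick_vq(
>                     struct virtio_scsi *vscsi,
>                     struct virtio_scsi_target_state *tgt)
>     {
>             struct virtio_scsi_vq *vq;
>             unsigned long flags;
>
>             local_irq_save(flags);
>             if (atomic_inc_return(&tgt->reqs) > 1) {
>                     /* Requests already in flight: reuse their
>                      * virtqueue, so the target stays on one queue. */
>                     vq = tgt->req_vq;
>             } else {
>                     /* First request: steer the target to the current
>                      * CPU's virtqueue. */
>                     vq = &vscsi->req_vqs[smp_processor_id() %
>                                          vscsi->num_queues];
>                     tgt->req_vq = vq;
>             }
>             local_irq_restore(flags);
>             return vq;
>     }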
>
> The patch [1] introduced another regression (this CPU soft lockup),
> which this patch now addresses.
>
> I'd like to investigate later whether there's a way we can accomplish
> the synchronization in commit [1] without resorting to the reqs
> counter, as it fixed one issue but introduced two so far.
> That would allow us to drop these SAUCE patches.
>
> I'll likely have a chat w/ Jay about it this week, but since the fix
> was obvious and conservative, I imagined it would be better to have
> it fixed on top for now, and then refactored/re-fixed later.
Fair enough - and thanks for clarifying in detail
>
> [1] f1f609d8015e UBUNTU: SAUCE: (no-up) virtio-scsi: Fix race in target free
> [2] a6c038a533f3 UBUNTU: SAUCE: (no-up) virtio-scsi: Increment reqs counter.
>
>
> > None of the above points are major enough for me to NAK this patchset, but I wanted to bring them to your attention (in case you missed something).
>
> Sure, thanks for the careful review.
> Hope this helps clarify things!
>
> > Acked-by: Khalid Elmously <khalid.elmously at canonical.com>
> >
>
> [3] https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat
>
> --
> Mauricio Faria de Oliveira