ACK/cmnt: [SRU Xenial][PATCH 0/2] Improve our SAUCE for virtio-scsi reqs counter (fix CPU soft lockup)
Mauricio Faria de Oliveira
mfo at canonical.com
Tue Oct 16 18:08:48 UTC 2018
On Tue, Oct 16, 2018 at 1:42 PM Khaled Elmously
<khalid.elmously at canonical.com> wrote:
>
> On 2018-10-16 12:38:18 , Mauricio Faria de Oliveira wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1798110
> >
> > [Impact]
> >
> > * Detaching virtio-scsi disk in Xenial guest can cause
> > CPU soft lockup in guest (and take 100% CPU in host).
> >
> > * It may prevent further progress on other tasks that
> > depend on resources locked earlier in the SCSI target
> > removal stack, and/or impact other SCSI functionality.
> >
> > * The fix resolves a corner case in the requests counter
> > in the virtio SCSI target, which impacts a downstream
> > (SAUCE) patch in the virtio-scsi target removal handler
> > that depends on the requests counter value to be zero.
> >
> > [Test Case]
> >
> > * See LP #1798110 (this bug)'s comment #3 (too long for
> > this section -- synthetic case with GDB+QEMU) and
> > comment #4 (organic test case in cloud instance).
> >
> > [Regression Potential]
> >
> > * It seem low -- this only affects the SCSI command requeue
> > path with regards to the reference counter, which is only
> > used with real chance of problems in our downstream patch
> > (which is now passing this testcase).
> >
> > * The other less serious issue would be decrementing it to
> > a negative / < 0 value, which is not possible with this
> > driver logic (see commit message), because the reqs counter
> > is always incremented before calling virtscsi_queuecommand(),
> > where this decrement operation is inserted.
> >
> >
> > Mauricio Faria de Oliveira (2):
> > UBUNTU: SAUCE: (no-up) virtio-scsi: Decrement reqs counter before SCSI
> > command requeue
> > UBUNTU: SAUCE: (no-up) virtio-scsi: (cosmetic) reestablish
> > virtscsi_pick_vq_mq() signature
> >
> > drivers/scsi/virtio_scsi.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
>
Hi Khalid,
Thanks for reviewing.
> - Maybe I'm misunderstanding what you mean by 'no-up'. I assumed it means "no operation", as in, no change in the resulting binary (e.g., whitespace changes, comment changes, casting etc.). Both patches seem to be "op" changes
Hm, I meant 'no up-stream' (e.g., downstream/SAUCE patch that is not
expected to go upstream, as described in the StablePatchFormat wiki
page [3].
> - The second patch doesn't actually appear to be related to the 'reqs' counter or any related soft-lockups
It's related to the reqs counter "area", and you're absolutely right
about no relation to the soft lockup.
I should have mentioned the 2nd patch was more of a 'while still here'
patch, sorry.
It is related to the last applied virtio-scsi no-up patch, that
touches the reqs counter functionality as well.
> - It seems strange to me that you'd have a loop that repeatedly decrements a counter because you know it will be incremented again when about to retry. I would think it makes more sense to increment the counter only once when queuing the command and decrement only once on success or "giving up" - though I assume you did it this way for a reason?
I didn' t follow you on the loop scenario (I don't see a loop unless a
SCSI commands repeatedly fails to be kicked and gets requeued/resent
by the SCSI layer - is that it?)
I agree, and indeed, there's a reason -- upstream only used the reqs
counter in the .queuecommand path for queue steering
in two cases, 1) single-virtqueue, and 2) multi-virtqueue without
blk-mq -- so not all cases had it.
(and the first one makes it hard to move the increment call around, see below..)
Then we introduced the first reqs-related SAUCE patch [1] (spin-loop
in virtscsi_target_destroy() until reqs is zero),
which fixed a race, but introduced a regression for multi-virtqueue
with blk-mq (which didn't increment the counter).
The second reqs-related SAUCE patch [2] fixed that regression for
multi-virtqueue blk-mq,
but for that it ended up adding the increment in the blk-mq related
function as well.
(even there, when cases would do atomic_inc(), it would be hard to
move, as one case has atomic_inc_return()
which checks for the counter value right there.. so this complicates
moving that around.)
The patch [1] introduced another regression (this CPU softlockup),
which this patch now addresses.
I'd like to investigate later whether there's a way we can accomplish
the sync in commit [1]
without resorting to the reqs counter, as it fixed one issued and
introduced two for now.
This would allow to drop these SAUCE patches.
I'll likely have a chat w/ Jay about it this week, but since the fix
was obvious and conservative,
I imagined it would be better to have it fixed on top for now, and
then refactored/re-fixed later).
[1] f1f609d8015e UBUNTU: SAUCE: (no-up) virtio-scsi: Fix race in target free
[2] a6c038a533f3 UBUNTU: SAUCE: (no-up) virtio-scsi: Increment reqs counter.
> None of the above points are major enough for me to NAK this patchset, but I wanted to bring them to your attention (in case you missed something).
Sure, thanks for the careful review.
Hope this helps clarify things!
> Acked-by: Khalid Elmously <khalid.elmously at canonical.com>
>
[3] https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat
--
Mauricio Faria de Oliveira
More information about the kernel-team
mailing list