ACK: [SRU Xenial][PATCH 1/2] UBUNTU: SAUCE: (no-up) virtio-scsi: Decrement reqs counter before SCSI command requeue
Stefan Bader
stefan.bader at canonical.com
Wed Oct 17 08:14:35 UTC 2018
On 16.10.18 17:38, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1798110
>
> The 'reqs' counter is incremented in the SCSI .queuecommand() path,
> right before virtscsi_queuecommand() is called, in either
> - virtscsi_queuecommand_single(), or
> - virtscsi_queuecommand_multi(), via virtscsi_pick_vq{_mq}().
>
> And it's decremented only in the SCSI command completion callback
> (after the command is successfully queued and completed by adapter):
> - virtscsi_complete_cmd().
>
> This allows for the counter to be incremented but _not_ decremented
> if virtscsi_queuecommand() gets an error to add/kick the command to
> the virtio ring (i.e., virtscsi_kick_cmd() fails with not 0 nor EIO).
>
> static virtscsi_queuecommand(...)
> {
> ...
> ret = virtscsi_kick_cmd(...)
> if (ret == -EIO) {
> ...
> virtscsi_complete_cmd(vscsi, cmd);
> ...
> } else if (ret != 0) {
> return SCSI_MLQUEUE_HOST_BUSY;
> }
> return 0;
> }
>
> In that case, the return code SCSI_MLQUEUE_HOST_BUSY causes the SCSI
> command to be requeued by the SCSI layer, which sends it again later
> in the .queuecommand() path -- incrementing the reqs counter _again_.
>
> This may happen many times for the same SCSI command, depending on
> the virtio ring condition/implementation, so the reqs counter will
> never return to zero (it's decremented only once in the completion
> callback). And it may happen for (m)any SCSI commands in this path.
>
> Unfortunately.. that causes a problem with a downstream/SAUCE patch
> for Xenial, which uses the reqs counter to sync with the completion
> callback: commit f1f609d8015e ("UBUNTU: SAUCE: (no-up) virtio-scsi:
> Fix race in target free"), and waits for the value to become zero.
>
> This problem plus that patch prevent the SCSI target removal from
> finishing, eventually causing a CPU soft lockup on another CPU that
> is waiting for some kernel resource that is/remains locked in the
> stack chain of this CPU.
>
> This has been verified 1) with a synthetic test case with QEMU+GDB
> that fakes the number of available elements in virtio ring for one
> time (harmless), so to force the SCSI command to be requeued, then
> uses QEMU monitor to remove the virtio-scsi target.
>
> _AND_ 2) with the test-case reported by the customer (a for-loop on
> a cloud instance that repeatedly mounts the virtio-scsi drive, copy
> data out of it, unmount it, then detach the virtio-scsi drive).
> (Here, the problem usually happens in the 1st or 2nd iteration, but
> with the patch it has run for 35 iterations without any problems).
>
> Upstream has done away with the reqs counter (originally used only
> to check if any requests were still active, for steering; not for
> our sync purposes). Instead of trying to find an alternative sync
> way for now let's just fix the behavior which we know is incorrect.
>
> Signed-off-by: Mauricio Faria de Oliveira <mfo at canonical.com>
> Tested-by: Mauricio Faria de Oliveira <mfo at canonical.com>
> Tested-by: David Coronel <david.coronel at canonical.com>
Acked-by: Stefan Bader <stefan.bader at canonical.com>
> ---
> drivers/scsi/virtio_scsi.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index b4a41d581021..572722e86bef 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -571,6 +571,10 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
> virtscsi_complete_cmd(vscsi, cmd);
> spin_unlock_irqrestore(&req_vq->vq_lock, flags);
> } else if (ret != 0) {
> + /* The SCSI command requeue will increment 'tgt->reqs' again. */
> + struct virtio_scsi_target_state *tgt =
> + scsi_target(sc->device)->hostdata;
> + atomic_dec(&tgt->reqs);
> return SCSI_MLQUEUE_HOST_BUSY;
> }
> return 0;
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20181017/a9ba902d/attachment.sig>
More information about the kernel-team
mailing list