[PATCH][SRU][Bionic] scsi: libiscsi: Allow sd_shutdown on bad transport
Kleber Souza
kleber.souza at canonical.com
Tue Jan 23 15:27:29 UTC 2018
On 01/19/18 21:35, Rafael David Tinoco wrote:
> BugLink: https://bugs.launchpad.net/bugs/1569925
>
> If, for any reason, userland shuts down iscsi transport interfaces
> before proper logouts - like when logging in to LUNs manually, without
> logging out on server shutdown, or when automated scripts can't
> umount/logout from logged LUNs - kernel will hang forever on its
> sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
> still existent paths.
>
> PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
> #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
> #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
> #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
> #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
> #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
> #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
> #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
> #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
> #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
> #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
>
> This happens because iscsi_eh_cmd_timed_out(), the transport layer
> timeout helper, would tell the queue timeout function (scsi_times_out)
> to reset the request timer over and over, until the session state is
> back to logged in state. Unfortunately, during server shutdown, this
> might never happen again.
>
> Other option would be "not to handle" the issue in the transport
> layer. That would trigger the error handler logic, which would also need
> the session state to be logged in again.
>
> Best option, for such case, is to tell upper layers that the command was
> handled during the transport layer error handler helper, marking it as
> DID_NO_CONNECT, which will allow completion and inform about the
> problem.
>
> After the session was marked as ISCSI_STATE_FAILED, due to the first
> timeout during the server shutdown phase, all subsequent cmds will fail
> to be queued, allowing upper logic to fail faster.
>
> Signed-off-by: Rafael David Tinoco <rafael.tinoco at canonical.com>
> (cherry-picked from commit d754941225a7dbc61f6dd2173fa9498049f9a7ee next-20180117)
Hi Rafael,
If the same patch can be applied cleanly for more than one series
there's no need to send a patch for each one of them. You can send a
single email just adding to the subject the prefixes for the series that
it applies to.
In this case it could be:
[PATCH][SRU][Bionic][Artful][Xenial][Trusty] scsi: libiscsi: ...
You can also use only the first letter of the series to have an
abbreviated form, e.g.:
[PATCH][SRU][B/A/X/T] scsi: libiscsi: ...
Thanks,
Kleber
> Reviewed-by: Lee Duncan <lduncan at suse.com>
> Signed-off-by: Martin K. Petersen <martin.petersen at oracle.com>
> ---
> drivers/scsi/libiscsi.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 42381adf0769..f11c16500ff1 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1696,6 +1696,15 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
> */
> switch (session->state) {
> case ISCSI_STATE_FAILED:
> + /*
> + * cmds should fail during shutdown, if the session
> + * state is bad, allowing completion to happen
> + */
> + if (unlikely(system_state != SYSTEM_RUNNING)) {
> + reason = FAILURE_SESSION_FAILED;
> + sc->result = DID_NO_CONNECT << 16;
> + break;
> + }
> case ISCSI_STATE_IN_RECOVERY:
> reason = FAILURE_SESSION_IN_RECOVERY;
> sc->result = DID_IMM_RETRY << 16;
> @@ -1980,6 +1989,19 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
> }
>
> if (session->state != ISCSI_STATE_LOGGED_IN) {
> + /*
> + * During shutdown, if session is prematurely disconnected,
> + * recovery won't happen and there will be hung cmds. Not
> + * handling cmds would trigger EH, also bad in this case.
> + * Instead, handle cmd, allow completion to happen and let
> + * upper layer to deal with the result.
> + */
> + if (unlikely(system_state != SYSTEM_RUNNING)) {
> + sc->result = DID_NO_CONNECT << 16;
> + ISCSI_DBG_EH(session, "sc on shutdown, handled\n");
> + rc = BLK_EH_HANDLED;
> + goto done;
> + }
> /*
> * We are probably in the middle of iscsi recovery so let
> * that complete and handle the error.
> @@ -2084,7 +2106,7 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
> task->last_timeout = jiffies;
> spin_unlock(&session->frwd_lock);
> ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ?
> - "timer reset" : "nh");
> + "timer reset" : "shutdown or nh");
> return rc;
> }
> EXPORT_SYMBOL_GPL(iscsi_eh_cmd_timed_out);
>
More information about the kernel-team
mailing list