APPLIED: [PATCH][SRU][Bionic] scsi: libiscsi: Allow sd_shutdown on bad transport

Tue Jan 23 16:43:37 UTC 2018

On Fri, Jan 19, 2018 at 08:35:39PM +0000, Rafael David Tinoco wrote:
> BugLink: https://bugs.launchpad.net/bugs/1569925
> 
> If, for any reason, userland shuts down iSCSI transport interfaces
> before proper logouts (for example, when LUNs were logged in to manually
> and never logged out on server shutdown, or when automated scripts can't
> umount/logout from logged-in LUNs), the kernel will hang forever in its
> sd_sync_cache() logic after issuing the SYNCHRONIZE_CACHE cmd to all
> still-existing paths.
> 
> PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
>  #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
>  #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
>  #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
>  #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
>  #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
>  #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
>  #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
>  #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
>  #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
>  #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
> 
> This happens because iscsi_eh_cmd_timed_out(), the transport layer
> timeout helper, tells the queue timeout function (scsi_times_out)
> to reset the request timer over and over, until the session is back
> in the logged-in state. Unfortunately, during server shutdown, this
> might never happen.
> 
> Another option would be to not handle the issue in the transport
> layer. That would trigger the error handler logic, which would also need
> the session to be logged in again.
> 
> The best option in this case is to tell the upper layers that the
> command was handled by the transport layer error handler helper,
> marking it as DID_NO_CONNECT, which allows completion and informs them
> about the problem.
> 
> Once the session is marked as ISCSI_STATE_FAILED, due to the first
> timeout during the server shutdown phase, all subsequent cmds will fail
> to be queued, allowing the upper layers to fail faster.
> 
> Signed-off-by: Rafael David Tinoco <rafael.tinoco at canonical.com>
> (cherry-picked from commit d754941225a7dbc61f6dd2173fa9498049f9a7ee next-20180117)
> Reviewed-by: Lee Duncan <lduncan at suse.com>
> Signed-off-by: Martin K. Petersen <martin.petersen at oracle.com>

Applied to bionic/master-next and unstable/master, thanks!
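
For readers following the reasoning in the commit message, here is a
minimal user-space sketch of the timeout decision it describes. This is
not the kernel code from the patch: every type, constant and function
name below is a hypothetical stand-in, the shutting_down flag is an
assumption about how the shutdown case is told apart, and the logged-in
path is simplified away.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the timeout verdicts the message discusses. */
enum eh_verdict {
    EH_NOT_HANDLED,  /* hand the command to the error handler */
    EH_RESET_TIMER,  /* re-arm the request timer and wait again */
    EH_HANDLED       /* command is done; let it complete upward */
};

enum session_state { SESSION_LOGGED_IN, SESSION_FAILED };

/* Assumed host-byte code, modeled on DID_NO_CONNECT. */
#define HOST_NO_CONNECT 0x01

struct cmd { int result; };

/*
 * Model of the transport timeout helper. shutting_down is an assumed
 * input; the commit message does not say how shutdown is detected.
 */
enum eh_verdict cmd_timed_out(enum session_state state, bool shutting_down,
                              struct cmd *sc)
{
    if (state != SESSION_LOGGED_IN) {
        if (shutting_down) {
            /* No recovery will come: finish the command with an error. */
            sc->result = HOST_NO_CONNECT << 16;
            return EH_HANDLED;
        }
        /* Recovery may still bring the session back: keep waiting. */
        return EH_RESET_TIMER;
    }
    /* Simplification: the logged-in path is out of scope for this sketch. */
    return EH_NOT_HANDLED;
}

/*
 * Count timeouts until a command on a failed session completes.
 * Returns -1 if it is still pending after max_rounds. With
 * patched == false the shutdown case is ignored, which models the
 * endless timer reset described above.
 */
int rounds_until_done(bool patched, bool shutting_down, int max_rounds,
                      struct cmd *sc)
{
    for (int round = 1; round <= max_rounds; round++) {
        enum eh_verdict v = cmd_timed_out(SESSION_FAILED,
                                          patched && shutting_down, sc);
        if (v == EH_HANDLED)
            return round;
    }
    return -1;
}
```

In this model, rounds_until_done(false, true, n, &sc) never completes
for any n, while rounds_until_done(true, true, n, &sc) completes on the
first timeout with the no-connect code in sc.result, which is what lets
sd_shutdown proceed.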