ACK: [SRU Z/X][PATCH] scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION

Stefan Bader stefan.bader at canonical.com
Mon Aug 21 10:11:36 UTC 2017


On 18.08.2017 18:18, Mauricio Faria de Oliveira wrote:
> BugLink: http://bugs.launchpad.net/bugs/1682644
> 
> On a dual controller setup with multipath enabled, some MEDIUM ERRORs
> caused both paths to be failed, thus I/O got queued/blocked since the
> 'queue_if_no_path' feature is enabled by default on IPR controllers.
> 
> This example disabled 'queue_if_no_path' so the I/O failure is seen at
> the sg_dd program.  Notice that after the sg_dd test-case, both paths
> are in 'failed' state, and both path/priority groups are in 'enabled'
> state (not 'active') -- which would block I/O with 'queue_if_no_path'.
> 
>     # sg_dd if=/dev/dm-2 bs=4096 count=1 dio=1 verbose=4 blk_sgio=0
>     <...>
>     read(unix): count=4096, res=-1
>     sg_dd: reading, skip=0 : Input/output error
>     <...>
> 
>     # dmesg
>     [...] sd 2:2:16:0: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>     [...] sd 2:2:16:0: [sds] Sense Key : Medium Error [current]
>     [...] sd 2:2:16:0: [sds] Add. Sense: Unrecovered read error - recommend rewrite the data
>     [...] sd 2:2:16:0: [sds] CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
>     [...] blk_update_request: I/O error, dev sds, sector 0
>     [...] device-mapper: multipath: Failing path 65:32.
>     <...>
>     [...] device-mapper: multipath: Failing path 65:224.
> 
>     # multipath -l
>     1IBM_IPR-0_59C2AE0000001F80 dm-2 IBM     ,IPR-0   59C2AE00
>     size=5.2T features='0' hwhandler='1 alua' wp=rw
>     |-+- policy='service-time 0' prio=0 status=enabled
>     | `- 2:2:16:0 sds  65:32  failed undef running
>     `-+- policy='service-time 0' prio=0 status=enabled
>       `- 1:2:7:0  sdae 65:224 failed undef running
> 
> This is not the desired behavior. The dm-multipath explicitly checks
> for the MEDIUM ERROR case (and a few others) so not to fail the path
> (e.g., I/O to other sectors could potentially happen without problems).
> See dm-mpath.c :: do_end_io_bio() -> noretry_error() !->! fail_path().
> 
> The problem trace is:
> 
> 1) ipr_scsi_done()  // SENSE KEY/CHECK CONDITION detected, go to..
> 2) ipr_erp_start()  // ipr_is_gscsi() and masked_ioasc OK, go to..
> 3) ipr_gen_sense()  // masked_ioasc is IPR_IOASC_MED_DO_NOT_REALLOC,
>                     // so set DID_PASSTHROUGH.
> 
> 4) scsi_decide_disposition()  // check for DID_PASSTHROUGH and return
>                               // early on, faking a DID_OK.. *instead*
>                               // of reaching scsi_check_sense().
> 
>                               // Had it reached the latter, that would
>                               // set host_byte to DID_MEDIUM_ERROR.
> 
> 5) scsi_finish_command()
> 6) scsi_io_completion()
> 7) __scsi_error_from_host_byte()  // That would be converted to -ENODATA
> <...>
> 8) dm_softirq_done()
> 9) multipath_end_io()
> 10) do_end_io()
> 11) noretry_error()  // And that is checked in dm-mpath :: noretry_error()
>                      // which would cause fail_path() not to be called.
> 
> With this patch applied, the I/O is failed but the paths are not.  This
> multipath device continues accepting more I/O requests without blocking.
> (and notice the different host byte/driver byte handling per SCSI layer).
> 
>     # dmesg
>     [...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
>     [...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 40 00
>     [...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
>     [...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
>     [...] blk_update_request: critical medium error, dev sdaf, sector 0
>     [...] blk_update_request: critical medium error, dev dm-6, sector 0
>     [...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
>     [...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 10 00
>     [...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
>     [...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
>     [...] blk_update_request: critical medium error, dev sdaf, sector 0
>     [...] blk_update_request: critical medium error, dev dm-6, sector 0
>     [...] Buffer I/O error on dev dm-6, logical block 0, async page read
> 
>     # multipath -l 1IBM_IPR-0_59C2AE0000001F80
>     1IBM_IPR-0_59C2AE0000001F80 dm-6 IBM     ,IPR-0   59C2AE00
>     size=5.2T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
>     |-+- policy='service-time 0' prio=0 status=active
>     | `- 2:2:7:0  sdaf 65:240 active undef running
>     `-+- policy='service-time 0' prio=0 status=enabled
>       `- 1:2:7:0  sdh  8:112  active undef running
> 
> Signed-off-by: Mauricio Faria de Oliveira <mauricfo at linux.vnet.ibm.com>
> Acked-by: Brian King <brking at linux.vnet.ibm.com>
> Signed-off-by: Martin K. Petersen <martin.petersen at oracle.com>
> (cherry picked from commit 785a470496d8e0a32e3d39f376984eb2c98ca5b3)
> Signed-off-by: Mauricio Faria de Oliveira <mauricfo at linux.vnet.ibm.com>
Acked-by: Stefan Bader <stefan.bader at canonical.com>

> ---
>  drivers/scsi/ipr.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
> index 7a58128a0000..0d598ad24d8d 100644
> --- a/drivers/scsi/ipr.c
> +++ b/drivers/scsi/ipr.c
> @@ -6210,7 +6210,12 @@ static void ipr_erp_start(struct ipr_ioa_cfg *ioa_cfg,
>  		break;
>  	case IPR_IOASC_MED_DO_NOT_REALLOC: /* prevent retries */
>  	case IPR_IOASA_IR_DUAL_IOA_DISABLED:
> -		scsi_cmd->result |= (DID_PASSTHROUGH << 16);
> +		/*
> +		 * exception: do not set DID_PASSTHROUGH on CHECK CONDITION
> +		 * so SCSI mid-layer and upper layers handle it accordingly.
> +		 */
> +		if (scsi_cmd->result != SAM_STAT_CHECK_CONDITION)
> +			scsi_cmd->result |= (DID_PASSTHROUGH << 16);
>  		break;
>  	case IPR_IOASC_BUS_WAS_RESET:
>  	case IPR_IOASC_BUS_WAS_RESET_BY_OTHER:
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20170821/c58a19d0/attachment.sig>


More information about the kernel-team mailing list