[Bug 1815599] Re: multipath shows '#:#:#:#' for iscsi device after error injection
Christian Ehrhardt
1815599 at bugs.launchpad.net
Fri Mar 22 06:58:41 UTC 2019
@JFH - As just discussed, you mentioned that you have gone over the
existing tunables with Heinz Werner. Thanks for linking them here again
to make sure that Shixm knows about them.
This somewhat reminds me of bug 1540407 - but those changes have been in Ubuntu since 16.04.
The same goes for the even older bug 1374999.
Your kernel and open-iscsi versions indicate that you are on Bionic - is that correct?
Multipath-tools should be on
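To confirm which multipath-tools version you actually have, the same kind of dpkg query you used for open-iscsi works, for example:
dpkg -l | grep multipath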
Your repro of:
> Run storage side error inject 'node reset' for SVC
isn't clear to me; I have neither a storage server on which I'm allowed to do error injection nor the tools/UI to control one.
Instead I have tried the repro steps that are available to me, as in:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1540407/comments/7
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1540407/comments/8
But all of them worked, see below.
Note that I never reached the loss of the path info to '#:#:#:#' - even in the faulty state it kept the path info.
@Shixm - since you have a setup that can reproduce this, can you check
whether any of the later releases (Cosmic/Disco) already resolves the issue that
you are seeing? Then we could try to hunt down which changes might have
resolved it for you instead of assuming this would need a totally new
change.
@Shixm - Is there any way to reproduce this without the 'node reset' for SVC?
Finally, this might well need subject matter expertise - can we make sure that IBM's zfcp experts (the devs, and maybe Thorsten who drove the old bugs) are subscribed to the mirrored bug 175431?
@JFH - do you think you can check that with the IBM team?
------------
Test results when retrying to trigger the issue:
Approach #1 gives me this (which isn't exactly the same state):
36005076306ffd6b60000000000002403 dm-1 IBM,2107900
size=10G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=50 status=active
|- 1:0:0:1073954852 sdg 8:96 active ready running
|- 1:0:1:1073954852 sdn 8:208 active ready running
|- 0:0:0:1073954852 sdb 8:16 active faulty offline
`- 0:0:1:1073954852 sdj 8:144 active ready running
If I add it back after this it works just fine again:
36005076306ffd6b60000000000002403 dm-1 IBM,2107900
size=10G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=50 status=enabled
|- 1:0:0:1073954852 sdg 8:96 active ready running
|- 1:0:1:1073954852 sdn 8:208 active ready running
|- 0:0:1:1073954852 sdj 8:144 active ready running
`- 0:0:0:1073954852 sdb 8:16 active ready running
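For reference, approach #1 essentially takes one SCSI path offline via sysfs and later brings it back - roughly like this (just a sketch; sdb happens to be the path I used in this run):
echo offline > /sys/block/sdb/device/state   # path drops to 'faulty offline' in multipath -ll
echo running > /sys/block/sdb/device/state   # path comes back and recovers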
For the second approach (disable, sleep, enable of the adapter), first check the zfcp config:
lszdev -t zfcp-host
DEVICE TYPE zfcp
Description : SCSI-over-Fibre Channel (FCP) devices and SCSI devices
Modules : zfcp
Active : yes
Persistent : yes
ATTRIBUTE ACTIVE PERSISTENT
allow_lun_scan "1" "1"
datarouter "1" -
dbflevel "3" -
dbfsize "4" -
dif "0" -
no_auto_port_rescan "0" -
port_scan_backoff "500" -
port_scan_ratelimit "60000" -
queue_depth "32" -
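The disable/sleep/enable cycle itself is roughly the following (a sketch only; 0.0.e000 stands in for the actual FCP device bus-ID of the adapter):
chccwdev -d 0.0.e000   # set the FCP device offline
sleep 60
chccwdev -e 0.0.e000   # set it online again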
Initially I get this (as expected):
36005076306ffd6b60000000000002403 dm-1 IBM,2107900
size=10G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=50 status=enabled
|- 1:0:0:1073954852 sdg 8:96 active ready running
|- 1:0:1:1073954852 sdn 8:208 active ready running
|- 0:0:1:1073954852 sdj 8:144 active i/o pending running
`- 0:0:0:1073954852 sdb 8:16 active i/o pending running
Then after a while it reaches the final fault state:
36005076306ffd6b60000000000002403 dm-1 IBM,2107900
size=10G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=50 status=enabled
|- 1:0:0:1073954852 sdg 8:96 active ready running
|- 1:0:1:1073954852 sdn 8:208 active ready running
|- 0:0:1:1073954852 sdj 8:144 failed faulty running
`- 0:0:0:1073954852 sdb 8:16 failed faulty running
After getting the paths back it immediately switches to:
36005076306ffd6b60000000000002403 dm-1 IBM,2107900
size=10G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=37 status=enabled
|- 1:0:0:1073954852 sdg 8:96 active ready running
|- 1:0:1:1073954852 sdn 8:208 active ready running
|- 0:0:1:1073954852 sdj 8:144 failed ready running
`- 0:0:0:1073954852 sdb 8:16 failed ready running
And after less than 20 seconds fully recovers to:
36005076306ffd6b60000000000002403 dm-1 IBM,2107900
size=10G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=50 status=enabled
|- 1:0:0:1073954852 sdg 8:96 active ready running
|- 1:0:1:1073954852 sdn 8:208 active ready running
|- 0:0:1:1073954852 sdj 8:144 active ready running
`- 0:0:0:1073954852 sdb 8:16 active ready running
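For completeness, the transitions above can be watched with nothing more exotic than, for example:
watch -n 5 'multipath -ll'
# or ask the daemon directly
multipathd show paths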
** Changed in: multipath-tools (Ubuntu)
Status: New => Incomplete
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1815599
Title:
multipath shows '#:#:#:#' for iscsi device after error injection
Status in Ubuntu on IBM z Systems:
New
Status in multipath-tools package in Ubuntu:
Incomplete
Bug description:
Problem Description:
After error injection (resetting one node of the storage), 1 of 4 LUNs shows '#:#:#:#' for half of its paths
---uname output---
root@ilzlnx4:~# uname -a
Linux ilzlnx4 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:43:05 UTC 2018 s390x s390x s390x GNU/Linux
Machine Type = s390x
--iscsi initiator
root@ilzlnx4:~# dpkg -l | grep iscsi
ii open-iscsi 2.0.874-5ubuntu2.6 s390x iSCSI initiator tools
---Debugger---
A debugger is not configured
---Steps to Reproduce---
1 Map 4 LUNs via open-iscsi from the SVC
2 Run IO on these LUNs
3 Run storage side error injection 'node reset' on the SVC (started at about 2019/02/11 05:14)
4 Half of one LUN's paths show '#:#:#:#' and never recover without manual intervention
[2019/02/11 05:53:13] INFO send: multipath -ll | cat
[2019/02/11 05:53:29] INFO
3600507638085814a980000000000000a dm-3 IBM,2145
size=10G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 4:0:0:3 sdr 65:16 active ready running
| `- 6:0:0:3 sdu 65:64 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:3 sdh 8:112 active ready running
`- 2:0:0:3 sdl 8:176 active ready running
3600507638085814a9800000000000009 dm-4 IBM,2145
size=10G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:0:1 sdf 8:80 active ready running
| `- 2:0:0:1 sdj 8:144 active ready running
`-+- policy='service-time 0' prio=0 status=enabled
|- #:#:#:# sdo 8:224 active faulty running
`- #:#:#:# sds 65:32 active faulty running
3600507638085814a9800000000000008 dm-2 IBM,2145
size=10G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 4:0:0:2 sdq 65:0 active ready running
| `- 6:0:0:2 sdt 65:48 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 2:0:0:2 sdk 8:160 active ready running
`- 1:0:0:2 sdg 8:96 active ready running
3600507638085814a9800000000000006 dm-5 IBM,2145
size=10G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 4:0:0:0 sdm 8:192 active ready running
| `- 6:0:0:0 sdp 8:240 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
|- 1:0:0:0 sde 8:64 active ready running
`- 2:0:0:0 sdi 8:128 active ready running
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1815599/+subscriptions