[Bug 1435706] Re: DevLossTO, FastIoFailTO settings do not match multipath.conf expected values
Tore Anderson
tore at fud.no
Sat Aug 29 08:46:48 UTC 2015
OK, so I did some more testing. It appears that the problem isn't
specific to the dev_loss_tmo and fast_io_fail_tmo settings, as the
terminal log below shows. In multipath.conf (which we know for certain
is being read, as the created multipath map gets the correct alias),
I instruct it to use the ALUA hardware handler for all devices.
However, for some reason, this is ignored, and the EMC hardware handler
is used instead:
=====
root@ucstest-osl2:~# cat /etc/multipath.conf
devices {
    device {
        vendor ".*"
        product ".*"
        hardware_handler "1 alua"
    }
}
multipaths {
    multipath {
        wwid 3600601603a71320022967e0a1f38e411
        alias bootvolume
    }
}
root@ucstest-osl2:~# multipath -v 2
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 emc' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0 undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready running
`-+- policy='round-robin 0' prio=0 status=undef
  |- 0:0:1:0 sdb 8:16 undef ready running
  `- 1:0:0:0 sdc 8:32 undef ready running
=====
This does *NOT* happen on RHEL-based distros - on those, changing the
hardware_handler in multipath.conf in this way works as expected.
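To compare distros quickly, the handler a map actually received can be read back from the tool's own output. A minimal sketch, assuming the `multipath -v2` / `multipath -ll` line format shown in this report; `hwhandler_of` and `hwhandler_is` are hypothetical helper names, not part of multipath-tools:

```shell
#!/bin/sh
# Hypothetical helpers: read `multipath -v2` or `multipath -ll` output on
# stdin and pull out the hwhandler='...' value of the first map listed.
hwhandler_of() {
    sed -n "s/.*hwhandler='\([^']*\)'.*/\1/p" | head -n 1
}

# Succeed only if the map's handler equals the expected string ($1),
# e.g. "1 alua" when that is what multipath.conf asks for.
hwhandler_is() {
    [ "$(hwhandler_of)" = "$1" ]
}

# Usage (on the affected host):
#   multipath -ll bootvolume | hwhandler_is "1 alua" || echo "handler overridden"
```

On the Ubuntu box above this would report "1 emc" even though multipath.conf requests "1 alua".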
So why does it use the EMC hardware_handler? Well, there's a built-in
default device section that matches the array in question, and it
appears to override my user-specified config from multipath.conf:
=====
root@ucstest-osl2:~# multipathd -k'show config' | grep -B10 -A4 '1 emc'
    device {
        vendor "DGC"
        product ".*"
        product_blacklist "LUNZ"
        path_grouping_policy group_by_prio
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        path_selector round-robin 0
        path_checker emc_clariion
        checker emc_clariion
        features "1 queue_if_no_path"
        hardware_handler "1 emc"
        prio emc
        failback immediate
        no_path_retry 60
    }
=====
If I copy the entire default device config into /etc/multipath.conf and
only change the hardware_handler setting, then it starts working:
=====
root@ucstest-osl2:~# cat /etc/multipath.conf
devices {
    device {
        vendor "DGC"
        product ".*"
        product_blacklist "LUNZ"
        path_grouping_policy group_by_prio
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        path_selector "round-robin 0"
        path_checker emc_clariion
        checker emc_clariion
        features "1 queue_if_no_path"
        hardware_handler "1 alua"
        prio emc
        failback immediate
        no_path_retry 60
    }
}
multipaths {
    multipath {
        wwid 3600601603a71320022967e0a1f38e411
        alias bootvolume
    }
}
root@ucstest-osl2:~# multipath -v 2
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0 undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready running
`-+- policy='round-robin 0' prio=0 status=undef
  |- 0:0:1:0 sdb 8:16 undef ready running
  `- 1:0:0:0 sdc 8:32 undef ready running
=====
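The copy-and-edit workaround above can be scripted rather than done by hand. A minimal sketch, assuming the `show config` layout seen in this report (one attribute per line, each section opening with `device {` and closing with a lone `}`); `override_hwhandler` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper for the workaround above: print the built-in device
# section for vendor $1 from `multipathd -k'show config'` output (stdin),
# with only its hardware_handler replaced by $2.
# Assumptions: one attribute per line, each section opens with a line
# `device {` and closes with a line holding only `}`.
override_hwhandler() {
    awk -v want="vendor \"$1\"" -v hh="$2" '
        # Trim indentation so tabs vs. spaces do not matter.
        { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }
        line == "device {" { inblk = 1; n = 0; hit = 0 }
        inblk {
            if (line ~ /^hardware_handler /)
                line = "hardware_handler \"" hh "\""
            buf[++n] = line
            if (line == want) hit = 1
            if (line == "}") {
                # Emit the buffered section only if the vendor matched.
                if (hit) for (i = 1; i <= n; i++) print buf[i]
                inblk = 0
            }
        }
    '
}

# Usage (review the result before pasting it into /etc/multipath.conf):
#   multipathd -k'show config' | override_hwhandler DGC "1 alua"
```

Other lines are copied verbatim, so `path_selector round-robin 0` comes out unquoted; the hand-made copy above quotes it as "round-robin 0", which is worth doing before pasting.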
It would appear that, for some reason, overriding default device
settings in Ubuntu requires an *exact* string match between the
user-supplied «vendor» and «product» settings and those of the built-in
entry. If I change e.g. «product» in multipath.conf to ".*.*", it
starts using the built-in defaults again, ignoring multipath.conf. I
consider this behaviour very dangerous. Consider an admin who has a
working config (thanks to exactly matching vendor/product settings),
and then the package gets updated and extends the built-in defaults to
incorporate some new model matching the same profile/settings, so that
the built-in vendor/product strings no longer match the admin's
exactly. At that point the admin's working config will stop being used,
possibly causing disruptive problems. I therefore strongly suggest you
figure out why Ubuntu and RHEL behave differently, and adopt the RHEL
behaviour, which really is the only sensible one.
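The suspected rule can be stated precisely as a toy model. This is a hypothetical sketch built only from the observations in this report, not from the multipath-tools source: device matching treats vendor/product as regular expressions (here `grep -E` stands in for the real regex engine), while replacing a built-in entry needs identical strings:

```shell
#!/bin/sh
# Hypothetical model of the matching behaviour described above, written
# from the observations in this report rather than from multipath-tools
# source. Device selection treats vendor/product as regular expressions,
# but (per the observation) a user device section replaces a built-in
# one only when both strings are identical.

# Regex search of pattern $1 in string $2 (grep -E stands in for the
# regex engine here; this is an assumption of the sketch).
re_matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Does a user entry (vendor $3, product $4) replace a built-in entry
# (vendor $1, product $2)? Observed on Ubuntu: only on exact equality.
user_overrides_builtin() {
    [ "$1" = "$3" ] && [ "$2" = "$4" ]
}

# Examples from the report, against the built-in DGC / ".*" entry:
#   user ".*"  / ".*"    -> matches DGC,VRAID but does not override
#   user "DGC" / ".*"    -> overrides (exact copy)
#   user "DGC" / ".*.*"  -> does not override, built-in wins again
```

Under this model an exact copy keeps working only while the built-in entry's strings stay unchanged, which is the upgrade hazard described above.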
In any case, now that I know how to ensure my multipath.conf settings
are being used, I re-tried adding dev_loss_tmo and fast_io_fail_tmo, but
it still doesn't work:
=====
root@ucstest-osl2:~# cat /etc/multipath.conf
devices {
    device {
        vendor "DGC"
        product ".*"
        product_blacklist "LUNZ"
        path_grouping_policy group_by_prio
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        path_selector "round-robin 0"
        path_checker emc_clariion
        checker emc_clariion
        features "1 queue_if_no_path"
        hardware_handler "1 alua"
        prio emc
        failback immediate
        no_path_retry 60
        fast_io_fail_tmo 3
        dev_loss_tmo 2147483647
    }
}
multipaths {
    multipath {
        wwid 3600601603a71320022967e0a1f38e411
        alias bootvolume
    }
}
root@ucstest-osl2:~# multipath -v 2
Aug 29 10:39:57 | bootvolume failed to set /class/fc_remote_ports/rport-0:0-1/dev_loss_tmo
create: bootvolume (3600601603a71320022967e0a1f38e411) undef DGC,VRAID
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| |- 0:0:0:0 sda 8:0 undef ready running
| `- 1:0:1:0 sdd 8:48 undef ready running
`-+- policy='round-robin 0' prio=0 status=undef
  |- 0:0:1:0 sdb 8:16 undef ready running
  `- 1:0:0:0 sdc 8:32 undef ready running
root@ucstest-osl2:~# grep . /sys/class/fc_remote_ports/rport-*/*tmo
/sys/class/fc_remote_ports/rport-0:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-0:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-0:0-2/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-0:0-2/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-0/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-1/fast_io_fail_tmo:off
/sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo:30
/sys/class/fc_remote_ports/rport-1:0-2/fast_io_fail_tmo:off
=====
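The `grep` above can be turned into a pass/fail check against the values requested in multipath.conf. A minimal sketch; `check_rport_tmo` and the `SYSFS_ROOT` override (which lets the function run against a fake tree) are hypothetical, only the sysfs attribute names come from the log above:

```shell
#!/bin/sh
# Sketch: compare every FC remote port's timeouts in sysfs against the
# values requested in multipath.conf (fast_io_fail_tmo 3 and
# dev_loss_tmo 2147483647 in the config above). SYSFS_ROOT is a
# hypothetical override so the function can be exercised on a fake tree.
# Prints each mismatching rport and returns 1 if any were found.
check_rport_tmo() {
    want_fast=$1 want_loss=$2
    root=${SYSFS_ROOT:-/sys}
    bad=0
    for port in "$root"/class/fc_remote_ports/rport-*; do
        [ -d "$port" ] || continue   # glob matched nothing
        fast=$(cat "$port/fast_io_fail_tmo")
        loss=$(cat "$port/dev_loss_tmo")
        if [ "$fast" != "$want_fast" ] || [ "$loss" != "$want_loss" ]; then
            echo "${port##*/}: fast_io_fail_tmo=$fast dev_loss_tmo=$loss"
            bad=1
        fi
    done
    return "$bad"
}

# Usage on the test machine:
#   check_rport_tmo 3 2147483647 || echo "timeouts not applied"
```

With the sysfs contents shown above, every rport would be reported with fast_io_fail_tmo=off and dev_loss_tmo=30.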
The *_tmo settings were read and understood by the config file parser,
since they show up in the output from «multipathd -k'show config'».
They are also clearly recognised as supported options: if I add
another «foo» option with the value «bar» right below them, that one
does *not* show up in «multipathd -k'show config'». So the config
parser doesn't just blindly read in any setting it encounters.
So the timeouts are clearly still not being applied. In any case, if
you need it, I'd be happy to give you access to this test machine so
you can see for yourself, Mathieu. Find me on the NetworkManager IRC
channel if you're interested.
Tore
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1435706
Title:
DevLossTO, FastIoFailTO settings do not match multipath.conf expected
values
Status in multipath-tools package in Ubuntu:
Fix Released
Status in multipath-tools source package in Trusty:
Triaged
Status in multipath-tools source package in Vivid:
Triaged
Bug description:
[Impact]
This bug impacts multipath users who need to tweak timeout values for DevLoss and FastIoFail for performance reasons.
[Test Case]
On a multipath system, attempt to modify DevLossTO or FastIoFailTO, then verify that the values got applied with 'multipath -l'. See below.
[Regression Potential]
Users who have already modified these values but have not noticed they did not properly apply may notice a change in behavior on device failure.
---
Problem Description
=========================================
DevLossTO, FastIoFailTO settings do not match multipath.conf expected values
---uname output---
Linux ilp1fc85apA4.tuc.stglabs.ibm.com 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:09:21 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
uname -m
Machine Type = p7 8247
Steps to Reproduce
===================================
Verify that the DevLossTO and FastIoFailTO settings match the values expected from multipath.conf
== Comment: #31 - Thadeu Lima De Souza Cascardo <thadeul at br.ibm.com> - 2015-03-20 10:57:20 ==
OK.
Judging by the logs, everything seems correct from the point of view
of multipathd.
I even parsed syslog and the output of getHBAInfo looking for
inconsistencies; the one I found is between what multipathd logged as
configured for a given target and what that target's rport reports in
getHBAInfo.
So, either multipathd is not configuring the timeouts even though it
has the right configuration, or something else is changing those
timeouts.
The other problem is that multipathd does not include the dev_loss_tmo
configuration for 2145, as can be seen from 'list config'. So either it
is not parsing the configuration correctly, or there is a problem with
the configuration itself.
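Whether the 2145 entry carries dev_loss_tmo in `list config` / `show config` output can be checked mechanically. A minimal sketch, assuming the layout seen earlier in this report (one attribute per line, `device {` ... `}` sections); `device_has_key` is a hypothetical name:

```shell
#!/bin/sh
# Sketch: does the device section whose product line is exactly
# `product "<$1>"` in `multipathd -k'show config'` output (stdin)
# contain attribute $2? E.g. product 2145 and dev_loss_tmo.
# Assumptions: one attribute per line; sections open with `device {`
# and close with a line containing only `}`.
device_has_key() {
    awk -v want="product \"$1\"" -v key="$2" '
        { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }
        line == "device {" { inblk = 1; hit = 0; has = 0; next }
        inblk && line == want { hit = 1 }
        inblk && index(line, key " ") == 1 { has = 1 }
        inblk && line == "}" { if (hit && has) found = 1; inblk = 0 }
        # Exit status 0 if some matching section had the attribute.
        END { exit !found }
    '
}

# Usage:
#   multipathd -k'show config' | device_has_key 2145 dev_loss_tmo \
#       || echo "dev_loss_tmo missing for 2145"
```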
At this point, to move forward, I would like to take a look at your
system, try reconfiguring multipathd, and look at its strace output to
check for writes into sysfs.
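The strace check could look roughly like this. A minimal sketch: the strace and `multipathd -k'reconfigure'` invocations use standard options, but the log path and the `tmo_writes` filter are hypothetical, and the filter assumes strace's usual `openat(..., "path", FLAGS) = fd` line format:

```shell
#!/bin/sh
# Sketch for the strace check proposed above. One way to capture a trace
# (standard strace options; the log path is arbitrary):
#   strace -f -o /tmp/multipathd.strace -e trace=open,openat,write \
#          -p "$(pidof multipathd)" &
#   multipathd -k'reconfigure'
# tmo_writes reads such a log on stdin and lists the dev_loss_tmo and
# fast_io_fail_tmo sysfs files that were opened for writing. Only the
# open calls carry the path; the write calls that follow use the fd.
tmo_writes() {
    grep -E 'open(at)?\(' |
        grep -E 'O_WRONLY|O_RDWR' |
        sed -n -E 's/.*"([^"]*(dev_loss|fast_io_fail)_tmo)".*/\1/p' |
        sort -u
}

# Usage:
#   tmo_writes < /tmp/multipathd.strace
```

An empty result would suggest multipathd never tried to write the timeouts; a listed path whose value still differs points at the write being rejected or later overwritten.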
== Comment: #34 - Thadeu Lima De Souza Cascardo <thadeul at br.ibm.com> - 2015-03-20 15:56:46 ==
OK, so I investigated the system, read some of the code, and checked the changelog.
It looks like Ubuntu is shipping a fairly old version of
multipath-tools, which is understandable: multipath-tools is not very
good at doing frequent releases, so one needs to either ship a version
closer to upstream git or carry one's own large set of patches.
One of the missing patches is the one attached next. Without it, any
device included in the built-in hardware table will have some of its
attributes from the config file ignored. That is the case with 2145,
so we lose the dev_loss_tmo setting for that device.
Cascardo.
== Comment: #38 - Thadeu Lima De Souza Cascardo <thadeul at br.ibm.com> - 2015-03-20 16:25:39 ==
The bug this patch fixes would explain why fast_io_fail_tmo is not set correctly in some cases, but it does not explain the dev_loss_tmo problem. So there is probably another missing patch here. I would still like to experiment with the two patches I mentioned, however. Let's try to do this on Monday?
Cascardo.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1435706/+subscriptions