[Bug 1569925] Re: Shutdown hang on 16.04 with iscsi targets
Rafael David Tinoco
rafael.tinoco at canonical.com
Sun Nov 5 12:38:35 UTC 2017
Hello Matthijs
Unfortunately, the best way to prevent this is to fix the kernel hang
itself, which occurs when the kernel calls sd_sync_cache() on every
configured device before shutdown. There is a single I/O command hanging
on all scsi paths and the I/O error is never propagated to the block
layer (despite iscsi having proper I/O error settings). I'm finishing
the analysis of some kernel dumps so I can finally understand what is
happening in the transport layer (this also happens with more recent
kernels).
The workaround was to create a script that restores the iscsi
connection, waits for the login to complete and the paths to come back
online, and then logs out cleanly, allowing the sd_sync_cache()
operation to finish.
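The sequence described above can be sketched roughly as follows. This
is not the script referenced later in this message, only a minimal
illustration under some assumptions: relogin_then_logout and WAIT_SECS
are hypothetical names, "iscsiadm -m session" is assumed to exit
non-zero while no session exists, and an established session is used
as a stand-in for "the paths are back online" (a real script would
probably check the multipath state as well).

```shell
#!/bin/sh
# Sketch only: log back in, wait for a session, then log out cleanly.
# WAIT_SECS is a hypothetical tunable, not taken from the real script.
WAIT_SECS="${WAIT_SECS:-30}"

relogin_then_logout() {
    # Restore the nodes that are marked for automatic startup.
    iscsiadm -m node --loginall=automatic || true

    # Poll until iscsiadm reports a session, or give up after WAIT_SECS.
    # Assumption: "iscsiadm -m session" fails while no session exists.
    waited=0
    until iscsiadm -m session >/dev/null 2>&1; do
        if [ "$waited" -ge "$WAIT_SECS" ]; then
            echo "no iscsi session after ${WAIT_SECS}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Log out of every node so that no session outlives the network.
    iscsiadm -m node -U all
}
```

The function would have to be called while the network is still up,
for example from a unit ordered to stop before the network does.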
If you are facing this problem, I know for sure that your iscsi
connections are not being closed before the network goes down. This
means that you have to pay attention to how you configured your iscsi
disks:
- guarantee that iscsiadm was configured with "interfaces" so it works
on startup:
sudo iscsiadm -m iface -I ens4 --op=new -n iface.hwaddress -v 52:54:00:b4:21:bb
sudo iscsiadm -m iface -I ens7 --op=new -n iface.hwaddress -v 52:54:00:c2:34:1b
- the discovery/login has to be done AFTER the interfaces have been
added to iscsiadm:
sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER1
sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER2
# iscsiadm -m node --loginall=automatic HAS TO WORK or else init
scripts will fail
http://pastebin.ubuntu.com/25894472/
- configure the volumes in /etc/fstab with the "_netdev" option for
systemd unit ordering:
LABEL=BLUE /blue ext4 defaults,_netdev 0 1
LABEL=GREEN /green ext4 defaults,_netdev 0 1
LABEL=PURPLE /purple ext4 defaults,_netdev 0 1
LABEL=RED /red ext4 defaults,_netdev 0 1
LABEL=YELLOW /yellow ext4 defaults,_netdev 0 1
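A quick way to double-check the fstab entries is a small helper like
the one below. It is only a sketch: missing_netdev is a hypothetical
name, and the list of mount points to check has to be supplied by
hand, since fstab alone does not say which volumes live on iscsi.

```shell
# Print every requested mount point whose fstab entry lacks _netdev.
# Usage: missing_netdev FSTAB_FILE MOUNTPOINT...
missing_netdev() {
    fstab="$1"
    shift
    for mp in "$@"; do
        awk -v mp="$mp" '
            # Skip comment lines; match on the mount point (2nd field).
            $1 !~ /^#/ && $2 == mp {
                # The 4th field holds the comma-separated mount options.
                n = split($4, opts, ",")
                found = 0
                for (i = 1; i <= n; i++)
                    if (opts[i] == "_netdev") found = 1
                if (!found) print mp
            }
        ' "$fstab"
    done
}
```

For the entries above, "missing_netdev /etc/fstab /blue /green /purple
/red /yellow" prints nothing when every listed entry carries _netdev.
A mount point that is absent from fstab altogether also prints nothing,
so that case needs a separate check.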
You have to make sure the open-iscsi and iscsid systemd units are
started after the network is available and are stopped before it
disappears. That might be your problem, if the configuration above is
correct.
inaddy@iscsihang:~$ systemctl edit --full iscsid.service
inaddy@iscsihang:~$ systemctl edit --full open-iscsi.service
The defaults are:
[Unit]
Description=iSCSI initiator daemon (iscsid)
Documentation=man:iscsid(8)
Wants=network-online.target remote-fs-pre.target
Before=remote-fs-pre.target
After=network.target network-online.target
and
[Unit]
Description=Login to default iSCSI targets
Documentation=man:iscsiadm(8) man:iscsid(8)
Wants=network-online.target remote-fs-pre.target iscsid.service
After=network-online.target iscsid.service
Before=remote-fs-pre.target
So you can see that iscsid.service runs BEFORE open-iscsi.service. In
my case, I'm configuring the network using rc-local.service (since this
is my lab) and I had to guarantee the ordering there as well.
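To confirm what a given unit is ordered after, something like the
helper below can be used. It is a sketch, not part of the original
setup: ordered_after is a hypothetical name, and it only reads the
After= lines of a unit file saved to disk (for example the output of
"systemctl cat open-iscsi.service"), so it does not see implicit
dependencies that systemd adds on its own.

```shell
# Succeed if UNIT_FILE names DEP on one of its After= lines.
# Usage: ordered_after UNIT_FILE DEP
ordered_after() {
    awk -v dep="$2" '
        /^After=/ {
            # Drop the key and split the space-separated unit names.
            sub(/^After=/, "")
            n = split($0, deps, /[ \t]+/)
            for (i = 1; i <= n; i++)
                if (deps[i] == dep) ok = 1
        }
        END { exit (ok ? 0 : 1) }
    ' "$1"
}
```

For example, after "systemctl cat open-iscsi.service > /tmp/unit",
"ordered_after /tmp/unit iscsid.service" should succeed with the
defaults shown above. Since systemd stops units in the reverse of
their start order, an After= on whatever brings the network up is also
what keeps the iscsi units from being stopped after it.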
If, after configuring your system like this, you still face problems,
you can use this script:
http://pastebin.ubuntu.com/25894592/
Then attach the DEBUG=/.shutdown.log file, created after its execution,
to this launchpad case. It's likely that you have hung iscsi
connections for some reason (service ordering, volumes missing from
fstab so umounts are not done, etc.).
Hope it helps for now.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to open-iscsi in Ubuntu.
https://bugs.launchpad.net/bugs/1569925
Title:
Shutdown hang on 16.04 with iscsi targets
Status in linux package in Ubuntu:
In Progress
Status in open-iscsi package in Ubuntu:
In Progress
Status in linux source package in Xenial:
In Progress
Status in open-iscsi source package in Xenial:
In Progress
Status in linux source package in Zesty:
In Progress
Status in open-iscsi source package in Zesty:
In Progress
Status in linux source package in Artful:
In Progress
Status in open-iscsi source package in Artful:
In Progress
Bug description:
I have 4 servers running the latest 16.04 updates from the development
branch (as of right now).
Each server is connected to NetApp storage using iscsi software
initiator. There are a total of 56 volumes spread across two NetApp
arrays. Each volume has 4 paths available to it which are being
managed by device mapper.
While logged into the iscsi sessions all I have to do is reboot the
server and I get a hang.
I see a message that says:
"Reached target Shutdown"
followed by
"systemd-shutdown[1]: Failed to finalize DM devices, ignoring"
and then I see 8 lines that say:
"connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
NOTE: the actual values of the *'s differ for each line above.
This seems like a bug somewhere but I am unaware of any additional
logging that I could turn on to pinpoint the problem.
Note I also have similar setups that are not doing iscsi and they
don't have this problem.
Here is a screenshot of what I see on the shell when I try to reboot:
(https://launchpadlibrarian.net/291303059/Screenshot.jpg)
This is being tracked in NetApp bug tracker CQ number 860251.
If I log out of all iscsi sessions before rebooting then I do not
experience the hang:
iscsiadm -m node -U all
We are wondering if this could be some kind of shutdown ordering
problem. Like the network devices have already disappeared and then
iscsi tries to perform some operation (hence the ping timeouts).
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1569925/+subscriptions