[Bug 1569925] Re: Shutdown hang on 16.04 with iscsi targets

Sun Nov 5 12:38:35 UTC 2017

Hello Matthijs

Unfortunately, the best way to prevent this is to fix the kernel hang
itself, which occurs when the kernel calls sd_sync_cache() on every
configured device before shutdown. There is a single I/O cmd hanging
in all scsi paths and the I/O error is never propagated to the block
layer (despite iscsi having proper I/O error settings). I'm finishing
the analysis of some kernel dumps so I can finally understand what is
happening in the transport layer (this also happens with more recent
kernels).

The workaround was to create a script that restores the iscsi
connection, waits for the login to complete and the paths to come back
online, and then logs out cleanly, allowing the sd_sync_cache()
operation to finish.
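
As a rough illustration of that sequence (not the actual script), a
minimal sketch might look like the following. The ISCSIADM and WAIT_SECS
variables and both function names are assumptions made for this example,
and counting sessions is only a stand-in for checking that the paths are
back online.

```shell
#!/bin/sh
# Sketch only: log in again, wait for sessions, then log out cleanly.
# ISCSIADM and WAIT_SECS are illustrative knobs, not open-iscsi settings.
ISCSIADM="${ISCSIADM:-iscsiadm}"
WAIT_SECS="${WAIT_SECS:-30}"

# Count active sessions: "iscsiadm -m session" prints one line per
# session, starting with the transport name (for example "tcp: [1] ...").
count_sessions() {
    $ISCSIADM -m session 2>/dev/null | grep -c '^[a-z]*: \['
}

relogin_and_logout() {
    # Restore sessions for every node marked for automatic startup.
    $ISCSIADM -m node --loginall=automatic || return 1

    # Poll once per second until at least one session shows up.
    waited=0
    while [ "$(count_sessions)" -eq 0 ]; do
        [ "$waited" -ge "$WAIT_SECS" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done

    # Log out of all sessions so the pending cache sync can complete.
    $ISCSIADM -m node -U all
}
```

The function would be run as root, as relogin_and_logout, while the
network is still up.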

If you are facing this problem, I know for sure that your iscsi
connections are not being finalized before the network goes down. This
means that you have to pay attention to how you configured your iscsi
disks:

- guarantee that iscsiadm was configured with "interfaces" so it works
on startup:

  sudo iscsiadm -m iface -I ens4 --op=new -n iface.hwaddress -v 52:54:00:b4:21:bb
  sudo iscsiadm -m iface -I ens7 --op=new -n iface.hwaddress -v 52:54:00:c2:34:1b

- the discovery/login has to be done AFTER the interfaces have been
added to iscsiadm

  sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER1
  sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER2

  # iscsiadm -m node --loginall=automatic HAS TO WORK or else init scripts will fail

  http://pastebin.ubuntu.com/25894472/

- configure the volumes in /etc/fstab with "_netdev" parameter for
systemd unit ordering

  LABEL=BLUE /blue ext4 defaults,_netdev 0 1
  LABEL=GREEN /green ext4 defaults,_netdev 0 1
  LABEL=PURPLE /purple ext4 defaults,_netdev 0 1
  LABEL=RED /red ext4 defaults,_netdev 0 1
  LABEL=YELLOW /yellow ext4 defaults,_netdev 0 1
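
The fstab step can be checked mechanically. The sketch below, using a
hypothetical helper named missing_netdev, prints every listed mount
point whose fstab entry lacks the _netdev option or that has no entry at
all. It only reads the file it is given and knows nothing about which
volumes are actually iscsi-backed, so the mount points must be supplied.

```shell
#!/bin/sh
# Print the mount points from the argument list that are missing from
# the given fstab file or whose options do not include _netdev.
# Usage: missing_netdev FSTAB MOUNTPOINT...
missing_netdev() {
    fstab=$1
    shift
    for mp in "$@"; do
        awk -v mp="$mp" '
            # Skip comment lines; match on the mount point (field 2).
            $1 !~ /^#/ && $2 == mp {
                found = 1
                n = split($4, opts, ",")
                for (i = 1; i <= n; i++)
                    if (opts[i] == "_netdev") ok = 1
            }
            END { if (!found || !ok) print mp }
        ' "$fstab"
    done
}
```

With the entries above in place, running
missing_netdev /etc/fstab /blue /green /purple /red /yellow
prints nothing.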

You have to make sure the open-iscsi and iscsid systemd units are
started after the network is available and stopped before it goes away.
That might be your problem if the configuration above is correct.

inaddy@iscsihang:~$ systemctl edit --full iscsid.service
inaddy@iscsihang:~$ systemctl edit --full open-iscsi.service

The defaults are:

[Unit]
Description=iSCSI initiator daemon (iscsid)
Documentation=man:iscsid(8)
Wants=network-online.target remote-fs-pre.target
Before=remote-fs-pre.target
After=network.target network-online.target

and

[Unit]
Description=Login to default iSCSI targets
Documentation=man:iscsiadm(8) man:iscsid(8)
Wants=network-online.target remote-fs-pre.target iscsid.service
After=network-online.target iscsid.service
Before=remote-fs-pre.target
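
As a quick sanity check on ordering lines like the ones above, a small
helper can confirm that a unit file names a given unit in After=. This
is only a sketch: the name ordered_after is hypothetical, and it reads a
single file, so drop-ins and implicit dependencies are not considered.

```shell
#!/bin/sh
# Succeed if the unit file at $1 lists unit $2 on an After= line.
# Usage: ordered_after UNITFILE UNIT
ordered_after() {
    awk -F= -v want="$2" '
        $1 == "After" {
            # The value is a space-separated list of unit names.
            n = split($2, units, " ")
            for (i = 1; i <= n; i++)
                if (units[i] == want) hit = 1
        }
        END { exit !hit }
    ' "$1"
}
```

For the merged view that systemd itself uses, systemctl show -p After
open-iscsi.service prints the effective After= list.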

So you can see that iscsid.service runs BEFORE open-iscsi.service.  In
my case, I'm configuring network using rc-local.service (since this is
my lab) and I had to guarantee the ordering also:

If, after configuring your system like this, you still face problems,
you can use this script:

http://pastebin.ubuntu.com/25894592/

And provide me the DEBUG=/.shutdown.log file, created after its
execution, attached to this launchpad case. It's likely that you will
have hung iscsi connections for some reason (service ordering, volumes
missing from fstab so umounts are not done, etc).

Hope it helps for now.

-- 
https://bugs.launchpad.net/bugs/1569925

Title:
  Shutdown hang on 16.04 with iscsi targets

Status in linux package in Ubuntu:
  In Progress
Status in open-iscsi package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress
Status in open-iscsi source package in Xenial:
  In Progress
Status in linux source package in Zesty:
  In Progress
Status in open-iscsi source package in Zesty:
  In Progress
Status in linux source package in Artful:
  In Progress
Status in open-iscsi source package in Artful:
  In Progress

Bug description:
  I have 4 servers running the latest 16.04 updates from the development
  branch (as of right now).

  Each server is connected to NetApp storage using iscsi software
  initiator.  There are a total of 56 volumes spread across two NetApp
  arrays.  Each volume has 4 paths available to it which are being
  managed by device mapper.

  While logged into the iscsi sessions all I have to do is reboot the
  server and I get a hang.

  I see a message that says:

    "Reached target Shutdown"

  followed by

    "systemd-shutdown[1]: Failed to finalize DM devices, ignoring"

  and then I see 8 lines that say:

    "connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    NOTE: the actual values of the *'s differ for each line above.

  This seems like a bug somewhere but I am unaware of any additional
  logging that I could turn on to pinpoint the problem.

  Note I also have similar setups that are not doing iscsi and they
  don't have this problem.

  Here is a screenshot of what I see on the shell when I try to reboot:

  (https://launchpadlibrarian.net/291303059/Screenshot.jpg)

  This is being tracked in NetApp bug tracker CQ number 860251.

  If I log out of all iscsi sessions before rebooting then I do not
  experience the hang:

  iscsiadm -m node -U all

  We are wondering if this could be some kind of shutdown ordering
  problem: for example, the network devices may have already disappeared
  when iscsi tries to perform some operation (hence the ping timeouts).
