[Bug 1569925] Re: Shutdown hang on 16.04 with iscsi targets

Sat Aug 26 06:04:59 UTC 2017

Okay, I removed almost everything out of equation:

- removed networking interfaces from systemd
- removed open-iscsi login/logout logic from systemd
- used a single network interface for the logins, on a single iscsi portal

was still able to reproduce the issue by:

- doing a simple login after configuring the network device: 
  ./net-start.sh ; sleep 1 ; sudo iscsiadm -m node --login

- shutting down the network device before any logout:
  ./net-stop.sh ; sudo shutdown -h now

There was no systemd service order in play (in between open-iscsi,
iscsid and networking / ifupdown scripts) and I was able to cause the
same issue 100%, same messages, same symptoms. This tells us that,
definitely turning down the network while still logged iscsi sessions is
causing the hangs (check #3 for why).

Summary of Causes

1)

network shuts down -> iscsi luns are logged out (attempt) -> iscsid daemon is shutdown
or
network shuts down -> iscsi daemon is shutdown -> iscsi luns are logged out

- logout would fail due to lack of network
- iscsi daemon would die and/or logout wouldn't work
- iscsi sessions would still be there
- system would hang (transport layer can't logout)

2)

iscsi daemon is shutdown -> iscsi luns are logged out -> network shuts
down

- this would cause the bug i'm mentioning: daemon dies, logout doesn't work.
- some iscsi sessions would still be there
- system would hang (transport layer can't logout)
- (to check: open-iscsi.service ExecStop= running in parallel ?!?)

3)

I used KVM watchdog virtual device + NMI from host to crash the guest
during the hang

http://pastebin.ubuntu.com/25394744/

And, finally, the hang is because kernel is hanged during its shutdown
logic.

>     0      0   0  ffffffff81e11500  RU   0.0       0      0  [swapper/0]
>     0      0   1  ffff8801a6a20e00  RU   0.0       0      0  [swapper/1]
>     0      0   2  ffff8801a6a21c00  RU   0.0       0      0  [swapper/2]
>     0      0   3  ffff8801a6a22a00  RU   0.0       0      0  [swapper/3]

ALL CPUs were idling during the hang.

crash> runq
CPU 0 RUNQUEUE: ffff8801ae016e00
  CURRENT: PID: 0      TASK: ffffffff81e11500  COMMAND: "swapper/0"
  RT PRIO_ARRAY: ffff8801ae016fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae016e98
     [no tasks queued]

CPU 1 RUNQUEUE: ffff8801ae096e00
  CURRENT: PID: 0      TASK: ffff8801a6a20e00  COMMAND: "swapper/1"
  RT PRIO_ARRAY: ffff8801ae096fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae096e98
     [no tasks queued]

CPU 2 RUNQUEUE: ffff8801ae116e00
  CURRENT: PID: 0      TASK: ffff8801a6a21c00  COMMAND: "swapper/2"
  RT PRIO_ARRAY: ffff8801ae116fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae116e98
     [no tasks queued]

CPU 3 RUNQUEUE: ffff8801ae196e00
  CURRENT: PID: 0      TASK: ffff8801a6a22a00  COMMAND: "swapper/3"
  RT PRIO_ARRAY: ffff8801ae196fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae196e98
     [no tasks queued]

NO task was scheduled to run.

crash> ps -u
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      1      0   1  ffff8801a69b8000  UN   0.0   15484   3204  systemd-shutdow

There was just ONE SINGLE user task running (systemd-shutdown)

crash> set ffff8801a69b8000
    PID: 1
COMMAND: "systemd-shutdow"
   TASK: ffff8801a69b8000  [THREAD_INFO: ffff8801a69c0000]
    CPU: 1
  STATE: TASK_UNINTERRUPTIBLE 
crash> bt
PID: 1      TASK: ffff8801a69b8000  CPU: 1   COMMAND: "systemd-shutdow"
 #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
 #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
 #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
 #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
#10 [ffff8801a69c3dc0] device_shutdown at ffffffff8156112c
#11 [ffff8801a69c3df0] kernel_power_off at ffffffff810a32f5
#12 [ffff8801a69c3e00] SYSC_reboot at ffffffff810a34d3
#13 [ffff8801a69c3f40] sys_reboot at ffffffff810a359e
#14 [ffff8801a69c3f50] entry_SYSCALL_64_fastpath at ffffffff818431f2
    RIP: 00007face7d8f856  RSP: 00007ffd4c271d08  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 00007ffd4c271240  RCX: 00007face7d8f856
    RDX: 000000004321fedc  RSI: 0000000028121969  RDI: fffffffffee1dead
    RBP: 00007ffd4c271230   R8: 0000000000001400   R9: 0000000000000005
    R10: 00007face88d36c0  R11: 0000000000000202  R12: 00007ffd4c2713d0
    R13: 000000b7e6795c83  R14: 00007ffd4c271c30  R15: 0000000000000001
    ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

And it called the "kernel_power_off" logic from a system call.

This logic iterates over all devices calling:

if (dev->bus && dev->bus->shutdown) {
    dev->bus->shutdown(dev);
} else if (dev->driver && dev->driver->shutdown) {
    dev->driver->shutdown(dev);
}

This called sd_shutdown, which called sd_sync_cache, which sent a
SYNCHRONIZE_CACHE scsi op code to the device in question (one of the
iscsi paths). scsi_execute called blk_execute_rq to put the request on
device's queue, for execution.

/* Prevent hang_check timer from firing at us during very long I/O */
hang_check = sysctl_hung_task_timeout_secs;
if (hang_check)
    while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2)));

The default timeout for that operation is hang_check 120 seconds * 250/2
AND, "wait_for_completion_io_timeout" returns 0, !0 == 1 so.. this will
run forever.

So basically kernel is trying o complete SYNCHRONIZE_CACHE operation,
the SCSI cmd is already in device's queue and there is no return (for
obvious reasons).

Obs 1)
I have tested this with kernel 4.11 with the same symptom. I have to check now: is there any error propagation logic to cancel this so system can shutdown ? We are not getting a SCSI sense message because the "bus" (iscsi/tcp) is gone. Should we hard flush the device's queue ? 

Obs 2)
This isn't necessarily related to systemd. IMO this should also be addressed in kernel, allowing a broken transport layer - specially tcp/ip - to allow the shutdown to happen.

Obs 3)
The cause of this can be related to systemd (if there are leftovers) or iscsid daemon (if it died and/or got restarted before logging out. is there a way of keeping history of sessions so it knows what to logout after daemon starts again ?).

I'll get back to this next week, after reading more this dump.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1569925

Title:
  Shutdown hang on 16.04 with iscsi targets

Status in systemd package in Ubuntu:
  Confirmed
Status in systemd source package in Xenial:
  In Progress

Bug description:
  I have 4 servers running the latest 16.04 updates from the development
  branch (as of right now).

  Each server is connected to NetApp storage using iscsi software
  initiator.  There are a total of 56 volumes spread across two NetApp
  arrays.  Each volume has 4 paths available to it which are being
  managed by device mapper.

  While logged into the iscsi sessions all I have to do is reboot the
  server and I get a hang.

  I see a message that says:

    "Reached target Shutdown"

  followed by

    "systemd-shutdown[1]: Failed to finalize DM devices, ignoring"

  and then I see 8 lines that say:

    "connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    NOTE: the actual values of the *'s differ for each line above.

  This seems like a bug somewhere but I am unaware of any additional
  logging that I could turn on to pinpoint the problem.

  Note I also have similar setups that are not doing iscsi and they
  don't have this problem.

  Here is a screenshot of what I see on the shell when I try to reboot:

  (https://launchpadlibrarian.net/291303059/Screenshot.jpg)

  This is being tracked in NetApp bug tracker CQ number 860251.

  If I log out of all iscsi sessions before rebooting then I do not
  experience the hang:

  iscsiadm -m node -U all

  We are wondering if this could be some kind of shutdown ordering
  problem.  Like the network devices have already disappeared and then
  iscsi tries to perform some operation (hence the ping timeouts).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1569925/+subscriptions