[Bug 1569925] Re: Shutdown hang on 16.04 with iscsi targets
Rafael David Tinoco
rafael.tinoco at canonical.com
Sat Aug 26 06:04:59 UTC 2017
Okay, I removed almost everything out of equation:
- removed networking interfaces from systemd
- removed open-iscsi login/logout logic from systemd
- used a single network interface for the logins, on a single iscsi portal
was still able to reproduce the issue by:
- doing a simple login after configuring the network device:
./net-start.sh ; sleep 1 ; sudo iscsiadm -m node --login
- shutting down the network device before any logout:
./net-stop.sh ; sudo shutdown -h now
There was no systemd service order in play (in between open-iscsi,
iscsid and networking / ifupdown scripts) and I was able to cause the
same issue 100%, same messages, same symptoms. This tells us that,
definitely turning down the network while still logged iscsi sessions is
causing the hangs (check #3 for why).
Summary of Causes
1)
network shuts down -> iscsi luns are logged out (attempt) -> iscsid daemon is shutdown
or
network shuts down -> iscsi daemon is shutdown -> iscsi luns are logged out
- logout would fail due to lack of network
- iscsi daemon would die and/or logout wouldn't work
- iscsi sessions would still be there
- system would hang (transport layer can't logout)
2)
iscsi daemon is shutdown -> iscsi luns are logged out -> network shuts
down
- this would cause the bug i'm mentioning: daemon dies, logout doesn't work.
- some iscsi sessions would still be there
- system would hang (transport layer can't logout)
- (to check: open-iscsi.service ExecStop= running in parallel ?!?)
3)
I used KVM watchdog virtual device + NMI from host to crash the guest
during the hang
http://pastebin.ubuntu.com/25394744/
And, finally, the hang is because kernel is hanged during its shutdown
logic.
> 0 0 0 ffffffff81e11500 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffff8801a6a20e00 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffff8801a6a21c00 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffff8801a6a22a00 RU 0.0 0 0 [swapper/3]
ALL CPUs were idling during the hang.
crash> runq
CPU 0 RUNQUEUE: ffff8801ae016e00
CURRENT: PID: 0 TASK: ffffffff81e11500 COMMAND: "swapper/0"
RT PRIO_ARRAY: ffff8801ae016fb0
[no tasks queued]
CFS RB_ROOT: ffff8801ae016e98
[no tasks queued]
CPU 1 RUNQUEUE: ffff8801ae096e00
CURRENT: PID: 0 TASK: ffff8801a6a20e00 COMMAND: "swapper/1"
RT PRIO_ARRAY: ffff8801ae096fb0
[no tasks queued]
CFS RB_ROOT: ffff8801ae096e98
[no tasks queued]
CPU 2 RUNQUEUE: ffff8801ae116e00
CURRENT: PID: 0 TASK: ffff8801a6a21c00 COMMAND: "swapper/2"
RT PRIO_ARRAY: ffff8801ae116fb0
[no tasks queued]
CFS RB_ROOT: ffff8801ae116e98
[no tasks queued]
CPU 3 RUNQUEUE: ffff8801ae196e00
CURRENT: PID: 0 TASK: ffff8801a6a22a00 COMMAND: "swapper/3"
RT PRIO_ARRAY: ffff8801ae196fb0
[no tasks queued]
CFS RB_ROOT: ffff8801ae196e98
[no tasks queued]
NO task was scheduled to run.
crash> ps -u
PID PPID CPU TASK ST %MEM VSZ RSS COMM
1 0 1 ffff8801a69b8000 UN 0.0 15484 3204 systemd-shutdow
There was just ONE SINGLE user task running (systemd-shutdown)
crash> set ffff8801a69b8000
PID: 1
COMMAND: "systemd-shutdow"
TASK: ffff8801a69b8000 [THREAD_INFO: ffff8801a69c0000]
CPU: 1
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
#0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
#1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
#2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
#3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
#4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
#5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
#6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
#7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
#8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
#9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
#10 [ffff8801a69c3dc0] device_shutdown at ffffffff8156112c
#11 [ffff8801a69c3df0] kernel_power_off at ffffffff810a32f5
#12 [ffff8801a69c3e00] SYSC_reboot at ffffffff810a34d3
#13 [ffff8801a69c3f40] sys_reboot at ffffffff810a359e
#14 [ffff8801a69c3f50] entry_SYSCALL_64_fastpath at ffffffff818431f2
RIP: 00007face7d8f856 RSP: 00007ffd4c271d08 RFLAGS: 00000202
RAX: ffffffffffffffda RBX: 00007ffd4c271240 RCX: 00007face7d8f856
RDX: 000000004321fedc RSI: 0000000028121969 RDI: fffffffffee1dead
RBP: 00007ffd4c271230 R8: 0000000000001400 R9: 0000000000000005
R10: 00007face88d36c0 R11: 0000000000000202 R12: 00007ffd4c2713d0
R13: 000000b7e6795c83 R14: 00007ffd4c271c30 R15: 0000000000000001
ORIG_RAX: 00000000000000a9 CS: 0033 SS: 002b
And it called the "kernel_power_off" logic from a system call.
This logic iterates over all devices calling:
if (dev->bus && dev->bus->shutdown) {
dev->bus->shutdown(dev);
} else if (dev->driver && dev->driver->shutdown) {
dev->driver->shutdown(dev);
}
This called sd_shutdown, which called sd_sync_cache, which sent a
SYNCHRONIZE_CACHE scsi op code to the device in question (one of the
iscsi paths). scsi_execute called blk_execute_rq to put the request on
device's queue, for execution.
/* Prevent hang_check timer from firing at us during very long I/O */
hang_check = sysctl_hung_task_timeout_secs;
if (hang_check)
while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2)));
The default timeout for that operation is hang_check 120 seconds * 250/2
AND, "wait_for_completion_io_timeout" returns 0, !0 == 1 so.. this will
run forever.
So basically kernel is trying o complete SYNCHRONIZE_CACHE operation,
the SCSI cmd is already in device's queue and there is no return (for
obvious reasons).
Obs 1)
I have tested this with kernel 4.11 with the same symptom. I have to check now: is there any error propagation logic to cancel this so system can shutdown ? We are not getting a SCSI sense message because the "bus" (iscsi/tcp) is gone. Should we hard flush the device's queue ?
Obs 2)
This isn't necessarily related to systemd. IMO this should also be addressed in kernel, allowing a broken transport layer - specially tcp/ip - to allow the shutdown to happen.
Obs 3)
The cause of this can be related to systemd (if there are leftovers) or iscsid daemon (if it died and/or got restarted before logging out. is there a way of keeping history of sessions so it knows what to logout after daemon starts again ?).
I'll get back to this next week, after reading more this dump.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1569925
Title:
Shutdown hang on 16.04 with iscsi targets
Status in systemd package in Ubuntu:
Confirmed
Status in systemd source package in Xenial:
In Progress
Bug description:
I have 4 servers running the latest 16.04 updates from the development
branch (as of right now).
Each server is connected to NetApp storage using iscsi software
initiator. There are a total of 56 volumes spread across two NetApp
arrays. Each volume has 4 paths available to it which are being
managed by device mapper.
While logged into the iscsi sessions all I have to do is reboot the
server and I get a hang.
I see a message that says:
"Reached target Shutdown"
followed by
"systemd-shutdown[1]: Failed to finalize DM devices, ignoring"
and then I see 8 lines that say:
"connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
NOTE: the actual values of the *'s differ for each line above.
This seems like a bug somewhere but I am unaware of any additional
logging that I could turn on to pinpoint the problem.
Note I also have similar setups that are not doing iscsi and they
don't have this problem.
Here is a screenshot of what I see on the shell when I try to reboot:
(https://launchpadlibrarian.net/291303059/Screenshot.jpg)
This is being tracked in NetApp bug tracker CQ number 860251.
If I log out of all iscsi sessions before rebooting then I do not
experience the hang:
iscsiadm -m node -U all
We are wondering if this could be some kind of shutdown ordering
problem. Like the network devices have already disappeared and then
iscsi tries to perform some operation (hence the ping timeouts).
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1569925/+subscriptions
More information about the foundations-bugs
mailing list