[SRU][B][F][PATCH 0/3] Fix LIO from hanging in iscsit_free_session and iscsit_stop_session

Kelsey Skunberg kelsey.skunberg at canonical.com
Mon Apr 20 18:39:33 UTC 2020


BugLink: https://bugs.launchpad.net/bugs/1871688

SRU Justification

[Impact]

(The following details are from the bug report.)

The target subsystem (LIO) can hang if multiple threads try to destroy iSCSI 
sessions simultaneously. This is reproducible on systems that have multiple
targets with initiators regularly connecting/disconnecting.

This may happen when a "targetcli iscsi/iqn.../tpg1 disable" command is
executed while a logout operation is underway.

The iSCSI target does not handle such events correctly: two or more
threads may end up sleeping while waiting for the driver to close the remaining
connections on the session. When the connections are closed, the driver wakes
up only the first thread, which then proceeds to destroy the session
structure. The remaining threads stay blocked forever, waiting on a
completion that no longer exists in memory because it was freed by the first
thread.

Note that if the blocked threads are somehow forced to wake up, they will try
to free the same iSCSI session structure destroyed by the first thread, causing
double frees, memory corruption, etc.

The driver has been reorganized so the concurrent threads will set a flag in 
the session structure to notify the driver that the session should be
destroyed; then, they wait for the driver to close the remaining connections.
When the connections are all closed, the driver will wake up all the threads
and will wait for the refcount of the iSCSI session structure to reach zero.
When the last thread wakes up, the refcount is decreased to zero and the driver
can proceed to destroy the session structure because no one is referencing it
anymore.
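
The scheme described above can be sketched in userspace C with pthreads. This
is only an illustration under stated assumptions, not the kernel code from the
three commits: struct demo_session, request_teardown(), driver_teardown() and
run_teardown_demo() are hypothetical names, a mutex and condition variable
stand in for the kernel's synchronization primitives, and the driver waits for
every requester to register only so that the demo is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the iSCSI session; not the kernel struct. */
struct demo_session {
    pthread_mutex_t lock;
    pthread_cond_t  cv;            /* broadcast on every state change */
    int  open_conns;               /* connections the "driver" still has to close */
    int  refcount;                 /* references held by threads awaiting teardown */
    int  woken;                    /* threads released after the connections closed */
    bool teardown_requested;       /* set by any thread that wants the session gone */
    int  destroy_count;            /* how many times the session was destroyed */
};

/* A thread asking for teardown: set the flag, take a reference, wait, drop it. */
static void *request_teardown(void *arg)
{
    struct demo_session *s = arg;

    pthread_mutex_lock(&s->lock);
    s->teardown_requested = true;
    s->refcount++;
    pthread_cond_broadcast(&s->cv);
    while (s->open_conns > 0)
        pthread_cond_wait(&s->cv, &s->lock);
    s->woken++;
    s->refcount--;
    pthread_cond_broadcast(&s->cv);    /* let the driver re-check the refcount */
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* The "driver": close connections, wake all waiters, destroy at refcount 0. */
static void driver_teardown(struct demo_session *s, int expected_refs)
{
    pthread_mutex_lock(&s->lock);
    /* Demo-only simplification: wait until every requester holds a reference. */
    while (!s->teardown_requested || s->refcount < expected_refs)
        pthread_cond_wait(&s->cv, &s->lock);
    s->open_conns = 0;                 /* all connections are now closed */
    pthread_cond_broadcast(&s->cv);    /* wake every waiter, not only the first */
    while (s->refcount > 0)
        pthread_cond_wait(&s->cv, &s->lock);
    s->destroy_count++;                /* only the driver "frees" the session */
    pthread_mutex_unlock(&s->lock);
}

/* Returns 1 if every thread woke up and the session was destroyed exactly once. */
int run_teardown_demo(int nthreads)
{
    struct demo_session s = { .open_conns = 2 };
    pthread_t *tids;
    int ok;

    if (nthreads < 1)
        return 0;
    tids = malloc(sizeof(*tids) * (size_t)nthreads);
    if (!tids)
        return 0;
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cv, NULL);
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, request_teardown, &s) != 0)
            abort();
    driver_teardown(&s, nthreads);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    ok = (s.destroy_count == 1 && s.woken == nthreads);
    pthread_cond_destroy(&s.cv);
    pthread_mutex_destroy(&s.lock);
    free(tids);
    return ok;
}
```

Calling run_teardown_demo(4) starts four concurrent requesters. The point of
the sketch is that the requesters never free anything themselves: they only
drop their reference, and the single teardown path destroys the session after
the count reaches zero, so no thread can sleep on freed memory or free the
session twice.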

The bug reporter witnessed this on hundreds of Ubuntu 16.04.5 systems and
states that it is a regression, because it did not occur several years ago.
There are no detailed records from that far back to determine exactly which
unaffected kernel the reporter was running (the reporter believes it was
either 4.8.x or 4.10.x).

Attached to the bug report are the requested uname, version_signature, dmesg,
and lspci output from the reporter's system. However, the reporter has seen
this happen on a wide array of hardware: 2 to 24 cores, 8GB to 256GB RAM, both
AMD and Intel CPUs, onboard storage and PCIe SAS cards, etc.

This has been fixed in the upstream master branch, but it hasn't yet been 
backported to "-stable".

[Fixes]

These three commits should be backported:
* https://github.com/torvalds/linux/commit/e49a7d994379278d3353d7ffc7994672752fb0ad
* https://github.com/torvalds/linux/commit/57c46e9f33da530a2485fa01aa27b6d18c28c796
* https://github.com/torvalds/linux/commit/626bac73371eed79e2afa2966de393da96cf925e

[Test]

This is reproducible on systems that have multiple targets with initiators
regularly connecting/disconnecting by having multiple threads try to destroy
iSCSI sessions simultaneously.

This may happen when a "targetcli iscsi/iqn.../tpg1 disable" command is executed
while a logout operation is underway.


[Regression Risk]

Low. The patches are cherry-picked from upstream with no changes.

Verified that they apply cleanly to Bionic/master-next and Focal/master-next.
Build tests pass.

Maurizio Lombardi (3):
  scsi: target: remove boilerplate code
  scsi: target: fix hang when multiple threads try to destroy the same
    iscsi session
  scsi: target: iscsi: calling iscsit_stop_session() inside
    iscsit_close_session() has no effect

 drivers/target/iscsi/iscsi_target.c          | 82 ++++++--------------
 drivers/target/iscsi/iscsi_target.h          |  1 -
 drivers/target/iscsi/iscsi_target_configfs.c |  5 +-
 drivers/target/iscsi/iscsi_target_login.c    |  5 +-
 include/target/iscsi/iscsi_target_core.h     |  2 +-
 5 files changed, 32 insertions(+), 63 deletions(-)

-- 
2.20.1


