[Bug 1877617] Re: Automatic scans cause instability for cloud use cases

Ben Swartzlander ben at swartzlander.org
Fri May 8 20:32:38 UTC 2020


For debdiff files that add the necessary patches, please take a look here:
https://launchpad.net/~bswartz/+archive/ubuntu/open-iscsi/+packages


** Description changed:

  [Impact]
  
  When iSCSI storage is used underneath cloud applications such as
  OpenStack or Kubernetes, the automatic bus scan on login causes
  problems. The scan registers SCSI disks in the kernel that never get
  cleaned up. When those disks are eventually deleted on the server, I/O
  errors begin to accumulate. These errors slow down the whole SCSI
  subsystem, spam the kernel log, and cause timeouts at higher levels, to
  the point that users must reboot the node to get back to a usable
  state.
  
  [Test Case]
  
- TBD...
+ ################
+ 
+ # To demonstrate this problem, I create a VM running Ubuntu 20.04.0
+ 
+ # Install both iSCSI initiator and target on this host
+ sudo apt-get -y install open-iscsi targetcli-fb
+ 
+ # Start the services
+ sudo systemctl start iscsid.service targetclid.service
+ 
+ # Create a randomly generated target IQN
+ TARGET_IQN=$(iscsi-iname)
+ 
+ # Get the initiator IQN
+ INITIATOR_IQN=$(sudo awk -F = '/InitiatorName=/ {print $2}' /etc/iscsi/initiatorname.iscsi)
+ 
+ # Set up an iSCSI target and target portal, and grant access to ourselves
+ sudo targetcli /iscsi create $TARGET_IQN
+ sudo targetcli /iscsi/$TARGET_IQN/tpg1/acls create $INITIATOR_IQN
+ 
+ # Create two 1GiB LUNs backed by files, and expose them through the target portal
+ sudo targetcli /backstores/fileio create lun1 /lun1 1G
+ sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns create /backstores/fileio/lun1 1
+ sudo targetcli /backstores/fileio create lun2 /lun2 1G
+ sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns create /backstores/fileio/lun2 2
+ 
+ # Truncate the kernel log so we can see messages after this point only
+ sudo dmesg -C
+ 
+ # Register the local iSCSI target with our initiator, and log in
+ sudo iscsiadm -m node -p 127.0.0.1 -T $TARGET_IQN -o new
+ sudo iscsiadm -m node -p 127.0.0.1 -T $TARGET_IQN --login
+ 
+ # Get the list of disks from the iSCSI session, and stash it in a variable
+ DISKS=$(sudo iscsiadm -m session -P3 | awk '/Attached scsi disk/ {print $4}')
+ 
+ # Print the list
+ echo $DISKS
+ 
+ # Note that there are two disks found already (the two LUNs we created
+ # above) despite the fact that we only just logged in.
+ 
+ # Now delete a LUN from the target
+ sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns delete lun2
+ sudo targetcli /backstores/fileio delete lun2
+ 
+ # Attempt to read each of the disks
+ for DISK in $DISKS ; do sudo blkid /dev/$DISK || true ; done
+ 
+ # Look at the kernel log
+ dmesg
+ 
+ # Notice I/O errors related to the disk that the kernel remembers
+ 
+ ################
+ 
+ # Now to demonstrate how this problem is fixed, I create a new Ubuntu
+ # 20.04.0 VM
+ 
+ 
+ # Add PPA with modified version of open-iscsi
+ sudo add-apt-repository -y ppa:bswartz/open-iscsi
+ sudo apt-get update
+ 
+ # Install both iSCSI initiator and target on this host
+ sudo apt-get -y install open-iscsi targetcli-fb
+ 
+ # Start the services
+ sudo systemctl start iscsid.service targetclid.service
+ 
+ # Set the scan option to "manual"
+ sudo sed -i 's/^\(node.session.scan\).*/\1 = manual/' /etc/iscsi/iscsid.conf
+ sudo systemctl restart iscsid.service
+ 
+ # Create a randomly generated target IQN
+ TARGET_IQN=$(iscsi-iname)
+ 
+ # Get the initiator IQN
+ INITIATOR_IQN=$(sudo awk -F = '/InitiatorName=/ {print $2}' /etc/iscsi/initiatorname.iscsi)
+ 
+ # Set up an iSCSI target and target portal, and grant access to ourselves
+ sudo targetcli /iscsi create $TARGET_IQN
+ sudo targetcli /iscsi/$TARGET_IQN/tpg1/acls create $INITIATOR_IQN
+ 
+ # Create two 1GiB LUNs backed by files, and expose them through the target portal
+ sudo targetcli /backstores/fileio create lun1 /lun1 1G
+ sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns create /backstores/fileio/lun1 1
+ sudo targetcli /backstores/fileio create lun2 /lun2 1G
+ sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns create /backstores/fileio/lun2 2
+ 
+ # Truncate the kernel log so we can see messages after this point only
+ sudo dmesg -C
+ 
+ # Register the local iSCSI target with our initiator, and log in
+ sudo iscsiadm -m node -p 127.0.0.1 -T $TARGET_IQN -o new
+ sudo iscsiadm -m node -p 127.0.0.1 -T $TARGET_IQN --login
+ 
+ # Get the list of disks from the iSCSI session, and stash it in a variable
+ DISKS=$(sudo iscsiadm -m session -P3 | awk '/Attached scsi disk/ {print $4}')
+ 
+ # Print the list
+ echo $DISKS
+ 
+ # Note that the list is empty!
+ 
+ # Get the iSCSI host
+ SCSI_HOST=$(ls /sys/class/iscsi_host)
+ 
+ # Specifically scan the one disk we want
+ sudo sh -c "echo '0 0 1' > /sys/class/scsi_host/$SCSI_HOST/scan"
+ 
+ # Get the list of disks from the iSCSI session, and stash it in a variable
+ DISKS=$(sudo iscsiadm -m session -P3 | awk '/Attached scsi disk/ {print $4}')
+ 
+ # Print the list
+ echo $DISKS
+ 
+ # This time notice there's exactly one disk
+ 
+ # Now delete the other LUN from the target
+ sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns delete lun2
+ sudo targetcli /backstores/fileio delete lun2
+ 
+ # Attempt to read each of the disks
+ for DISK in $DISKS ; do sudo blkid /dev/$DISK || true ; done
+ 
+ # Look at the kernel log
+ dmesg
+ 
+ # No errors in the log
+ 
+ ################
  
  [Regression Potential]
  
  These changes have been proven safe by 3 years of soak time in the
  Red Hat ecosystem, so I don't see much risk to taking them into Ubuntu.
  They apply cleanly to the most recent versions of focal, bionic, and
  xenial.
  
- The change introduces a new config option in iscsid.conf but the default
- is to do exactly what it used to do. Only users who explicitly change
- this option will get altered behavior, and
+ The change introduces a new config option in iscsid.conf, but the
+ default is to do exactly what it used to do. Only users who explicitly
+ change this option will get altered behavior, and the behavior with the
+ option set is superior for the above-mentioned cloud use cases.
  
  [Other Info]
  
  Red Hat discovered this problem more than 3 years ago and fixed it
  upstream.
  
  https://bugzilla.redhat.com/show_bug.cgi?id=1422941
  
  I had hoped that Debian would eventually pick up the version in which it
  was fixed, but another LTS has gone by without picking up the newer
  upstream version, and this is a critical problem, so I propose
  backporting the fixes.
  
  The 2 patches that need porting are:
  https://github.com/open-iscsi/open-iscsi/pull/40
  https://github.com/open-iscsi/open-iscsi/pull/49

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to open-iscsi in Ubuntu.
https://bugs.launchpad.net/bugs/1877617

Title:
  Automatic scans cause instability for cloud use cases

Status in open-iscsi package in Ubuntu:
  New

Bug description:
  [Impact]

  When iSCSI storage is used underneath cloud applications such as
  OpenStack or Kubernetes, the automatic bus scan on login causes
  problems. The scan registers SCSI disks in the kernel that never get
  cleaned up. When those disks are eventually deleted on the server, I/O
  errors begin to accumulate. These errors slow down the whole SCSI
  subsystem, spam the kernel log, and cause timeouts at higher levels, to
  the point that users must reboot the node to get back to a usable
  state.

  [Test Case]

  ################

  # To demonstrate this problem, I create a VM running Ubuntu 20.04.0

  # Install both iSCSI initiator and target on this host
  sudo apt-get -y install open-iscsi targetcli-fb

  # Start the services
  sudo systemctl start iscsid.service targetclid.service

  # Create a randomly generated target IQN
  TARGET_IQN=$(iscsi-iname)

  # Get the initiator IQN
  INITIATOR_IQN=$(sudo awk -F = '/InitiatorName=/ {print $2}' /etc/iscsi/initiatorname.iscsi)

  # Set up an iSCSI target and target portal, and grant access to ourselves
  sudo targetcli /iscsi create $TARGET_IQN
  sudo targetcli /iscsi/$TARGET_IQN/tpg1/acls create $INITIATOR_IQN

  # Create two 1GiB LUNs backed by files, and expose them through the target portal
  sudo targetcli /backstores/fileio create lun1 /lun1 1G
  sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns create /backstores/fileio/lun1 1
  sudo targetcli /backstores/fileio create lun2 /lun2 1G
  sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns create /backstores/fileio/lun2 2

  # Truncate the kernel log so we can see messages after this point only
  sudo dmesg -C

  # Register the local iSCSI target with our initiator, and log in
  sudo iscsiadm -m node -p 127.0.0.1 -T $TARGET_IQN -o new
  sudo iscsiadm -m node -p 127.0.0.1 -T $TARGET_IQN --login

  # Get the list of disks from the iSCSI session, and stash it in a variable
  DISKS=$(sudo iscsiadm -m session -P3 | awk '/Attached scsi disk/ {print $4}')

  # Print the list
  echo $DISKS

  # Note that there are two disks found already (the two LUNs we created
  # above) despite the fact that we only just logged in.

  # Now delete a LUN from the target
  sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns delete lun2
  sudo targetcli /backstores/fileio delete lun2

  # Attempt to read each of the disks
  for DISK in $DISKS ; do sudo blkid /dev/$DISK || true ; done

  # Look at the kernel log
  dmesg

  # Notice I/O errors related to the disk that the kernel remembers

  ################
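
  The disk-listing step used throughout this test case can be wrapped in
  a small function so that it is easy to rerun after each change. This is
  only a sketch: the function name attached_disks is hypothetical, and it
  assumes that "iscsiadm -m session -P3" prints one "Attached scsi disk
  <name>" line per disk, which is what the awk pattern above relies on.

  ```shell
  # Hypothetical helper (not part of open-iscsi): read output in the style of
  # `iscsiadm -m session -P3` on stdin and print each attached disk name,
  # one per line. It uses the same awk pattern as the test case.
  attached_disks() {
    awk '/Attached scsi disk/ {print $4}'
  }

  # Example use on a live system (requires root and an active session):
  #   sudo iscsiadm -m session -P3 | attached_disks
  ```

  With automatic scanning, this should list both disks right after
  login. An empty result means no disks are registered for the session.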

  # Now to demonstrate how this problem is fixed, I create a new Ubuntu
  # 20.04.0 VM

  
  # Add PPA with modified version of open-iscsi
  sudo add-apt-repository -y ppa:bswartz/open-iscsi
  sudo apt-get update

  # Install both iSCSI initiator and target on this host
  sudo apt-get -y install open-iscsi targetcli-fb

  # Start the services
  sudo systemctl start iscsid.service targetclid.service

  # Set the scan option to "manual"
  sudo sed -i 's/^\(node.session.scan\).*/\1 = manual/' /etc/iscsi/iscsid.conf
  sudo systemctl restart iscsid.service

  # Create a randomly generated target IQN
  TARGET_IQN=$(iscsi-iname)

  # Get the initiator IQN
  INITIATOR_IQN=$(sudo awk -F = '/InitiatorName=/ {print $2}' /etc/iscsi/initiatorname.iscsi)

  # Set up an iSCSI target and target portal, and grant access to ourselves
  sudo targetcli /iscsi create $TARGET_IQN
  sudo targetcli /iscsi/$TARGET_IQN/tpg1/acls create $INITIATOR_IQN

  # Create two 1GiB LUNs backed by files, and expose them through the target portal
  sudo targetcli /backstores/fileio create lun1 /lun1 1G
  sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns create /backstores/fileio/lun1 1
  sudo targetcli /backstores/fileio create lun2 /lun2 1G
  sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns create /backstores/fileio/lun2 2

  # Truncate the kernel log so we can see messages after this point only
  sudo dmesg -C

  # Register the local iSCSI target with our initiator, and log in
  sudo iscsiadm -m node -p 127.0.0.1 -T $TARGET_IQN -o new
  sudo iscsiadm -m node -p 127.0.0.1 -T $TARGET_IQN --login

  # Get the list of disks from the iSCSI session, and stash it in a variable
  DISKS=$(sudo iscsiadm -m session -P3 | awk '/Attached scsi disk/ {print $4}')

  # Print the list
  echo $DISKS

  # Note that the list is empty!

  # Get the iSCSI host
  SCSI_HOST=$(ls /sys/class/iscsi_host)

  # Specifically scan the one disk we want
  sudo sh -c "echo '0 0 1' > /sys/class/scsi_host/$SCSI_HOST/scan"

  # Get the list of disks from the iSCSI session, and stash it in a variable
  DISKS=$(sudo iscsiadm -m session -P3 | awk '/Attached scsi disk/ {print $4}')

  # Print the list
  echo $DISKS

  # This time notice there's exactly one disk

  # Now delete the other LUN from the target
  sudo targetcli /iscsi/$TARGET_IQN/tpg1/luns delete lun2
  sudo targetcli /backstores/fileio delete lun2

  # Attempt to read each of the disks
  for DISK in $DISKS ; do sudo blkid /dev/$DISK || true ; done

  # Look at the kernel log
  dmesg

  # No errors in the log

  ################
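
  The manual scan step writes three fields to the SCSI host's scan file:
  channel, target ID, and LUN. The '0 0 1' used above therefore requests
  LUN 1 on channel 0, target 0. A rough sketch of that step as a reusable
  function follows. The name scan_one_lun and its argument order are
  hypothetical, and the function must run as root to write to sysfs.

  ```shell
  # Hypothetical helper (not part of open-iscsi): request a scan of a single
  # channel/target/LUN triple by writing it to a SCSI host "scan" file.
  # Arguments: scan file path, channel, target ID, LUN.
  scan_one_lun() {
    scan_file=$1
    channel=$2
    target=$3
    lun=$4
    printf '%s %s %s\n' "$channel" "$target" "$lun" > "$scan_file"
  }

  # Example use from a root shell, with SCSI_HOST set as in the test case:
  #   scan_one_lun "/sys/class/scsi_host/$SCSI_HOST/scan" 0 0 1
  ```

  Passing the scan file path as an argument keeps the function easy to
  try against an ordinary file before pointing it at sysfs.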

  [Regression Potential]

  These changes have been proven safe by 3 years of soak time in the
  Red Hat ecosystem, so I don't see much risk to taking them into Ubuntu.
  They apply cleanly to the most recent versions of focal, bionic, and
  xenial.

  The change introduces a new config option in iscsid.conf, but the
  default is to do exactly what it used to do. Only users who explicitly
  change this option will get altered behavior, and the behavior with the
  option set is superior for the above-mentioned cloud use cases.
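
  As an illustration of how a deployment might opt in, the sed command
  from the test case can be wrapped so that it also handles a config file
  where the option is missing. This is a sketch under assumptions: the
  function name set_session_scan is hypothetical, the accepted values
  "auto" and "manual" are taken to be the only valid ones, and "sed -i"
  is assumed to be GNU sed as shipped in Ubuntu.

  ```shell
  # Hypothetical helper (not part of open-iscsi): set node.session.scan in an
  # iscsid.conf style file. Arguments: config file path, mode.
  set_session_scan() {
    conf=$1
    mode=$2
    # Assumption: only these two values are meaningful for this option.
    case $mode in
      auto|manual) ;;
      *) echo "unsupported scan mode: $mode" >&2; return 1 ;;
    esac
    if grep -q '^node\.session\.scan' "$conf"; then
      # Rewrite the existing line, as the sed command in the test case does.
      sed -i "s/^\(node\.session\.scan\).*/\1 = $mode/" "$conf"
    else
      # Option not present: append it.
      printf 'node.session.scan = %s\n' "$mode" >> "$conf"
    fi
  }

  # Example use from a root shell, followed by a restart of iscsid:
  #   set_session_scan /etc/iscsi/iscsid.conf manual
  #   systemctl restart iscsid.service
  ```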

  [Other Info]

  Red Hat discovered this problem more than 3 years ago and fixed it
  upstream.

  https://bugzilla.redhat.com/show_bug.cgi?id=1422941

  I had hoped that Debian would eventually pick up the version in which
  it was fixed, but another LTS has gone by without picking up the newer
  upstream version, and this is a critical problem, so I propose
  backporting the fixes.

  The 2 patches that need porting are:
  https://github.com/open-iscsi/open-iscsi/pull/40
  https://github.com/open-iscsi/open-iscsi/pull/49

