[Bug 1567602] Comment bridged from LTC Bugzilla

bugproxy bugproxy at us.ibm.com
Fri Apr 15 17:20:47 UTC 2016


------- Comment From thorsten.diehl at de.ibm.com 2016-04-15 13:18 EDT-------
Meanwhile, with kernel 4.4.0-18 and installler version 447, the situation is different:
While the problem originally occured in installer, is happens now, when the target system is being IPLed.

The "switch" (to avoid the word "reason") is the kernel module scsi_dh_alua.
When scsi_dh_alua is loaded, the auto LUN scanning upon activating a zfcp device with NPIV LUNs causes that not all LUNs are detected. I assume, that not the kernel module itself is the problem, but t seems to trigger the problem.

A very easy way to reproduce this is as follows:
have a system installed with kernel 4.4.0-18 (almost GA). Have two NPIV enabled zfcp adaptors (on different PCHIDs) that can access the same (few) LUNs each. Then:

root at s8330005:~# lscss |grep 1732/03
0.0.1905 0.0.0012  1732/03 1731/03      80  80  ff   60000000 00000000
0.0.1945 0.0.0013  1732/03 1731/03      80  80  ff   61000000 00000000
root at s8330005:~# lsmod |grep scsi_dh
root at s8330005:~# chccwdev -e 1905
Setting device 0.0.1905 online
Done
root at s8330005:~# lszfcp -D
0.0.1905/0x50050763070845e3/0x4082405300000000 0:0:0:1079197826
0.0.1905/0x50050763070845e3/0x4083405300000000 0:0:0:1079197827
0.0.1905/0x50050763070845e3/0x4084405300000000 0:0:0:1079197828
0.0.1905/0x50050763070845e3/0x4085405300000000 0:0:0:1079197829
root at s8330005:~# lsmod |grep scsi_dh
root at s8330005:~# modprobe scsi_dh_alua
root at s8330005:~# chccwdev -e 1945
Setting device 0.0.1945 online
Done
root at s8330005:~# lszfcp -D
0.0.1905/0x50050763070845e3/0x4082405300000000 0:0:0:1079197826
0.0.1905/0x50050763070845e3/0x4083405300000000 0:0:0:1079197827
0.0.1905/0x50050763070845e3/0x4084405300000000 0:0:0:1079197828
0.0.1905/0x50050763070845e3/0x4085405300000000 0:0:0:1079197829
0.0.1945/0x50050763071845e3/0x4082405300000000 1:0:0:1079197826
/sbin/lszfcp: line 244: /sys/bus/ccw/drivers/zfcp/0.0.1945/host1/rport-1:0-0/target1:0:0/1:0:0:1079197827//hba_id: No such file or directory
/sbin/lszfcp: line 245: /sys/bus/ccw/drivers/zfcp/0.0.1945/host1/rport-1:0-0/target1:0:0/1:0:0:1079197827//wwpn: No such file or directory
/sbin/lszfcp: line 246: /sys/bus/ccw/drivers/zfcp/0.0.1945/host1/rport-1:0-0/target1:0:0/1:0:0:1079197827//fcp_lun: No such file or directory
0.0.1945/0x50050763071845e3/0x4082405300000000 1:0:0:1079197827
root at s8330005:~#

When a zfcp attached SCSI LUN is used for installation, (almost?) all scsi_dh modules are loaded upon activation of the zfcp device. This causes some LUNs not to be detected.
If I get acces to that system an blacklist the scsi_dh_alua module, the IPL works as expected ( i.e. all the LUNS are detected properly).

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to debian-installer in Ubuntu.
https://bugs.launchpad.net/bugs/1567602

Title:
  FCP devices are not detected correctly nor deterministically

Status in Ubuntu on IBM z Systems:
  New
Status in debian-installer package in Ubuntu:
  New

Bug description:
  Scenario: 
  Using Installer 432.
  No DASD devices, just two FCP CHPIDs with two LUNs each (configured for NPIV), provided via parmfile.
  I expect the installer to probe and detect the LUNs automatically once I enable the FCP CHPIDs.

  Repeated five times, I got different results each time. Detected LUNs
  vary between 2 and 4. One time 3 LUNs appear on SCSI1 and one on SCSI2
  which looks especially odd to me.

  Subsequent steps to create partitions and file systems in the
  installer fail.

  Following is one of the five cases with more details:
  After enabling CHPID 192b:
  ~ # cat /proc/scsi/scsi 
  Attached devices:
  Host: scsi0 Channel: 00 Id: 00 Lun: 1077493890
    Vendor: IBM      Model: 2107900          Rev: 1.27
    Type:   Direct-Access                    ANSI  SCSI revision: 05

  dmesg showing:
  [  221.738816] qdio: 0.0.192b ZFCP on SC f using AI:1 QEBSM:1 PRI:1 TDD:1 SIGA: W A 
  [  226.895139] scsi 0:0:0:1077493890: Direct-Access     IBM      2107900          1.27 PQ: 0 ANSI: 5
  [  226.895816] sd 0:0:0:1077493890: alua: supports implicit TPGS
  [  226.896673] sd 0:0:0:1077493890: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [  226.897825] sd 0:0:0:1077493890: [sda] Write Protect is off
  [  226.897827] sd 0:0:0:1077493890: [sda] Mode Sense: ed 00 00 08
  [  226.898145] sd 0:0:0:1077493890: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  [  226.902571] sd 0:0:0:1077493890: [sda] Attached SCSI disk
  [  287.117057] sd 0:0:0:1077493890: alua: evpd inquiry failed with 30000
  [  287.117062] sd 0:0:0:1077493890: alua: Attach failed (-22)
  [  287.117064] sd 0:0:0:1077493890: failed to add device handler: -22
  [  287.147303] scsi 0:0:0:0: Unexpected response from lun 1077493890 while scanning, scan aborted

  As second step, I additionally enable CHPID 196b:
  ~ # cat /proc/scsi/scsi 
  Attached devices:
  Host: scsi0 Channel: 00 Id: 00 Lun: 1077493890
    Vendor: IBM      Model: 2107900          Rev: 1.27
    Type:   Direct-Access                    ANSI  SCSI revision: 05
  Host: scsi1 Channel: 00 Id: 00 Lun: 1077493890
    Vendor: IBM      Model: 2107900          Rev: 1.27
    Type:   Direct-Access                    ANSI  SCSI revision: 05

  dmesg showing:
  [  384.277394] scsi host1: zfcp
  [  384.286516] qdio: 0.0.196b ZFCP on SC 10 using AI:1 QEBSM:1 PRI:1 TDD:1 SIGA: W A 
  [  385.377511] scsi 1:0:0:1077493890: Direct-Access     IBM      2107900          1.27 PQ: 0 ANSI: 5
  [  385.378120] sd 1:0:0:1077493890: alua: supports implicit TPGS
  [  385.378781] sd 1:0:0:1077493890: [sdb] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [  385.380097] sd 1:0:0:1077493890: [sdb] Write Protect is off
  [  385.380099] sd 1:0:0:1077493890: [sdb] Mode Sense: ed 00 00 08
  [  385.380408] sd 1:0:0:1077493890: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  [  385.384969] sd 1:0:0:1077493890: [sdb] Attached SCSI disk
  [  446.117041] sd 1:0:0:1077493890: alua: evpd inquiry failed with 30000
  [  446.117046] sd 1:0:0:1077493890: alua: Attach failed (-22)
  [  446.117048] sd 1:0:0:1077493890: failed to add device handler: -22
  [  446.147158] scsi 1:0:0:0: Unexpected response from lun 1077493890 while scanning, scan aborted

  Next, the installer warns that no disk drives were detected.

  I checked this on a system (installer 445) with two NPIV enabled FCP
  devices that have 4 LUNS each. I only get the first of four LUNs per
  adaptor:

  Apr  6 12:57:08 anna[2607]: DEBUG: retrieving zipl-installer 0.0.33ubuntu2
  Apr  6 12:57:08 anna[2607]: 2016-04-06 12:57:08 URL:http://ports.ubuntu.com//pool/main/z/zipl-installer/zipl-installer_0.0.33ubuntu2_s390x.udeb [4994/4994] -> "/var/cache/anna/zipl-installer_0.0.33ubuntu2_s390x.udeb" [1]
  Apr  6 12:57:09 main-menu[384]: INFO: Menu item 'driver-injection-disk-detect' selected
  Apr  6 12:57:10 main-menu[384]: INFO: Menu item 'user-setup-udeb' selected
  Apr  6 12:57:10 main-menu[384]: INFO: Menu item 'clock-setup' selected
  Apr  6 12:57:10 main-menu[384]: INFO: Menu item 's390-dasd' selected
  Apr  6 12:57:10 s390-dasd[6877]: INFO: s390-dasd: no channel found
  Apr  6 12:57:10 main-menu[384]: INFO: Menu item 's390-zfcp' selected
  Apr  6 12:57:10 s390-zfcp[6891]: DEBUG: DETECT: The zfcp device driver is not loaded
  Apr  6 12:57:10 s390-zfcp[6891]: WARNING **: Could not open directory: /sys/bus/ccw/drivers/zfcp/: No such file or directory
  Apr  6 12:57:11 main-menu[384]: INFO: Menu item 'disk-detect' selected
  Apr  6 12:57:11 anna-install: Installing mmc-modules
  Apr  6 12:57:11 disk-detect: insmod /lib/modules/4.4.0-17-generic/kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko 
  Apr  6 12:57:11 disk-detect: insmod /lib/modules/4.4.0-17-generic/kernel/drivers/scsi/device_handler/scsi_dh_alua.ko 
  Apr  6 12:57:11 disk-detect: insmod /lib/modules/4.4.0-17-generic/kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko 
  Apr  6 12:57:11 disk-detect: insmod /lib/modules/4.4.0-17-generic/kernel/drivers/scsi/device_handler/scsi_dh_emc.ko 
  Apr  6 12:57:11 kernel: [   40.509881] rdac: device handler registered
  Apr  6 12:57:11 kernel: [   40.513219] alua: device handler registered
  Apr  6 12:57:11 kernel: [   40.515517] hp_sw: device handler registered
  Apr  6 12:57:11 kernel: [   40.517572] emc: device handler registered
  Apr  6 12:57:11 net/hw-detect.hotplug: Detected hotpluggable network interface encf5f0
  Apr  6 12:57:11 net/hw-detect.hotplug: Detected hotpluggable network interface lo
  Apr  6 12:57:11 apt-install: Queueing package udev for later installation
  Apr  6 12:57:11 apt-install: Queueing package pciutils for later installation
  Apr  6 12:57:12 check-missing-firmware: looking at dmesg for the first time
  Apr  6 12:57:12 check-missing-firmware: saving timestamp for a later use: [   40.517572]
  Apr  6 12:57:12 check-missing-firmware: /dev/.udev/firmware-missing does not exist, skipping
  Apr  6 12:57:12 check-missing-firmware: /run/udev/firmware-missing does not exist, skipping
  Apr  6 12:57:12 check-missing-firmware: no missing firmware in loaded kernel modules
  Apr  6 12:58:29 main-menu[384]: (process:6908): unknown udeb mmc-modules
  Apr  6 12:58:29 main-menu[384]: INFO: Menu item 'disk-detect' succeeded but requested to be left unconfigured.
  Apr  6 12:58:31 main-menu[384]: INFO: Menu item 's390-zfcp' selected
  Apr  6 12:58:31 s390-zfcp[7752]: DEBUG: DETECT: Added FCP device: 0.0.1904: online=0 npiv=0
  Apr  6 12:58:31 s390-zfcp[7752]: DEBUG: DETECT: Added FCP device: 0.0.1944: online=0 npiv=0
  Apr  6 12:58:31 s390-zfcp[7752]: DEBUG: DETECT: Automatic LUN scanning is enabled
  Apr  6 12:58:31 s390-zfcp[7752]: DEBUG: PRESEED: No preseed data available
  Apr  6 12:58:40 s390-zfcp[7752]: DEBUG: SELECT: Using FCP device 0.0.1904
  Apr  6 12:58:40 kernel: [  129.591146] scsi host0: scsi_eh_0: sleeping
  Apr  6 12:58:40 kernel: [  129.591163] scsi host0: zfcp
  Apr  6 12:58:40 kernel: [  129.598187] qdio: 0.0.1904 ZFCP on SC f using AI:1 QEBSM:1 PRI:1 TDD:1 SIGA: W A 
  Apr  6 12:58:41 s390-zfcp[7752]: DEBUG: ENABLE: Activated FCP device 0.0.1904 (npiv=1)
  Apr  6 12:58:41 kernel: [  130.873257] scsi 0:0:0:0: scsi scan: INQUIRY pass 1 length 36
  Apr  6 12:58:41 kernel: [  130.873581] scsi 0:0:0:0: scsi scan: INQUIRY successful with code 0x0
  Apr  6 12:58:41 kernel: [  130.873593] scsi 0:0:0:0: scsi scan: INQUIRY pass 2 length 164
  Apr  6 12:58:41 kernel: [  130.873802] scsi 0:0:0:0: scsi scan: INQUIRY successful with code 0x0
  Apr  6 12:58:41 kernel: [  130.873807] scsi 0:0:0:0: scsi scan: peripheral device type of 31, no device added
  Apr  6 12:58:41 kernel: [  130.874148] scsi 0:0:0:0: scsi scan: Sending REPORT LUNS to (try 0)
  Apr  6 12:58:41 kernel: [  130.875003] scsi 0:0:0:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
  Apr  6 12:58:41 kernel: [  130.875005] scsi 0:0:0:0: scsi scan: REPORT LUN scan
  Apr  6 12:58:41 kernel: [  130.875176] scsi 0:0:0:1074937986: scsi scan: INQUIRY pass 1 length 36
  Apr  6 12:58:41 kernel: [  130.875395] scsi 0:0:0:1074937986: scsi scan: INQUIRY successful with code 0x0
  Apr  6 12:58:41 kernel: [  130.875399] scsi 0:0:0:1074937986: scsi scan: INQUIRY pass 2 length 164
  Apr  6 12:58:41 kernel: [  130.875608] scsi 0:0:0:1074937986: scsi scan: INQUIRY successful with code 0x0
  Apr  6 12:58:41 kernel: [  130.875613] scsi 0:0:0:1074937986: Direct-Access     IBM      2107900          1.27 PQ: 0 ANSI: 5
  Apr  6 12:58:41 kernel: [  130.876181] sd 0:0:0:1074937986: alua: supports implicit TPGS
  Apr  6 12:58:41 kernel: [  130.876323] sd 0:0:0:1074937986: tag#1 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
  Apr  6 12:58:41 kernel: [  130.876326] sd 0:0:0:1074937986: tag#1 CDB: Test Unit Ready 00 00 00 00 00 00
  Apr  6 12:58:41 kernel: [  130.876328] sd 0:0:0:1074937986: tag#1 Sense Key : Unit Attention [current] 
  Apr  6 12:58:41 kernel: [  130.876333] sd 0:0:0:1074937986: tag#1 Add. Sense: Power on, reset, or bus device reset occurred
  Apr  6 12:58:41 kernel: [  130.876722] sd 0:0:0:1074937986: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  Apr  6 12:58:41 kernel: [  130.877759] sd 0:0:0:1074937986: [sda] Write Protect is off
  Apr  6 12:58:41 kernel: [  130.877761] sd 0:0:0:1074937986: [sda] Mode Sense: ed 00 00 08
  Apr  6 12:58:41 kernel: [  130.878085] sd 0:0:0:1074937986: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  Apr  6 12:58:41 kernel: [  130.878295] sd 0:0:0:1074937986: tag#1 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK
  Apr  6 12:58:41 kernel: [  130.878297] sd 0:0:0:1074937986: tag#1 CDB: Report supported operation codes a3 0c 01 12 00 00 00 00 02 00 00 00
  Apr  6 12:58:41 kernel: [  130.878298] sd 0:0:0:1074937986: tag#1 Sense Key : Illegal Request [current] 
  Apr  6 12:58:41 kernel: [  130.878300] sd 0:0:0:1074937986: tag#1 Add. Sense: Invalid field in cdb
  Apr  6 12:58:41 kernel: [  130.882710] sd 0:0:0:1074937986: [sda] Attached SCSI disk
  Apr  6 12:58:43 s390-zfcp[7752]: DEBUG: POPULATE LUN TREE: 0.0.1904:0x50050763070845e3:0x4082401200000000 [0:0:0:1074937986]
  Apr  6 12:58:43 s390-zfcp[7752]: DEBUG: WRITE CONFIG: /etc/sysconfig/hardware/config-ccw-0.0.1904
  Apr  6 12:58:47 s390-zfcp[7752]: DEBUG: SELECT: Using FCP device 0.0.1944
  Apr  6 12:58:47 kernel: [  136.279385] scsi host1: scsi_eh_1: sleeping
  Apr  6 12:58:47 kernel: [  136.279396] scsi host1: zfcp
  Apr  6 12:58:47 kernel: [  136.290784] qdio: 0.0.1944 ZFCP on SC 10 using AI:1 QEBSM:1 PRI:1 TDD:1 SIGA: W A 
  Apr  6 12:58:52 s390-zfcp[7752]: DEBUG: ENABLE: Activated FCP device 0.0.1944 (npiv=1)
  Apr  6 12:58:52 kernel: [  141.278081] scsi 1:0:0:0: scsi scan: INQUIRY pass 1 length 36
  Apr  6 12:58:52 kernel: [  141.278691] scsi 1:0:0:0: scsi scan: INQUIRY successful with code 0x0
  Apr  6 12:58:52 kernel: [  141.278698] scsi 1:0:0:0: scsi scan: INQUIRY pass 2 length 164
  Apr  6 12:58:52 kernel: [  141.278910] scsi 1:0:0:0: scsi scan: INQUIRY successful with code 0x0
  Apr  6 12:58:52 kernel: [  141.278915] scsi 1:0:0:0: scsi scan: peripheral device type of 31, no device added
  Apr  6 12:58:52 kernel: [  141.279221] scsi 1:0:0:0: scsi scan: Sending REPORT LUNS to (try 0)
  Apr  6 12:58:52 kernel: [  141.280115] scsi 1:0:0:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
  Apr  6 12:58:52 kernel: [  141.280119] scsi 1:0:0:0: scsi scan: REPORT LUN scan
  Apr  6 12:58:52 kernel: [  141.280296] scsi 1:0:0:1074937986: scsi scan: INQUIRY pass 1 length 36
  Apr  6 12:58:52 kernel: [  141.280507] scsi 1:0:0:1074937986: scsi scan: INQUIRY successful with code 0x0
  Apr  6 12:58:52 kernel: [  141.280511] scsi 1:0:0:1074937986: scsi scan: INQUIRY pass 2 length 164
  Apr  6 12:58:52 kernel: [  141.280693] scsi 1:0:0:1074937986: scsi scan: INQUIRY successful with code 0x0
  Apr  6 12:58:52 kernel: [  141.280698] scsi 1:0:0:1074937986: Direct-Access     IBM      2107900          1.27 PQ: 0 ANSI: 5
  Apr  6 12:58:52 kernel: [  141.281519] sd 1:0:0:1074937986: alua: supports implicit TPGS
  Apr  6 12:58:52 kernel: [  141.281798] sd 1:0:0:1074937986: tag#1 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
  Apr  6 12:58:52 kernel: [  141.281801] sd 1:0:0:1074937986: tag#1 CDB: Test Unit Ready 00 00 00 00 00 00
  Apr  6 12:58:52 kernel: [  141.281802] sd 1:0:0:1074937986: tag#1 Sense Key : Unit Attention [current] 
  Apr  6 12:58:52 kernel: [  141.281804] sd 1:0:0:1074937986: tag#1 Add. Sense: Power on, reset, or bus device reset occurred
  Apr  6 12:58:52 kernel: [  141.282130] sd 1:0:0:1074937986: [sdb] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  Apr  6 12:58:52 kernel: [  141.283353] sd 1:0:0:1074937986: [sdb] Write Protect is off
  Apr  6 12:58:52 kernel: [  141.283355] sd 1:0:0:1074937986: [sdb] Mode Sense: ed 00 00 08
  Apr  6 12:58:52 kernel: [  141.283637] sd 1:0:0:1074937986: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  Apr  6 12:58:52 kernel: [  141.283758] sd 1:0:0:1074937986: tag#1 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK
  Apr  6 12:58:52 kernel: [  141.283760] sd 1:0:0:1074937986: tag#1 CDB: Report supported operation codes a3 0c 01 12 00 00 00 00 02 00 00 00
  Apr  6 12:58:52 kernel: [  141.283761] sd 1:0:0:1074937986: tag#1 Sense Key : Illegal Request [current] 
  Apr  6 12:58:52 kernel: [  141.283763] sd 1:0:0:1074937986: tag#1 Add. Sense: Invalid field in cdb
  Apr  6 12:58:52 kernel: [  141.288209] sd 1:0:0:1074937986: [sdb] Attached SCSI disk
  Apr  6 12:58:53 s390-zfcp[7752]: DEBUG: POPULATE LUN TREE: 0.0.1944:0x50050763071845e3:0x4082401200000000 [1:0:0:1074937986]
  Apr  6 12:58:53 s390-zfcp[7752]: DEBUG: WRITE CONFIG: /etc/sysconfig/hardware/config-ccw-0.0.1944

  
  It looks like the LUN scan is being terminated.
  Kernel parms with: scsi_mod.scsi_logging_level=4605 zfcp.dbflevel=6 zfcp.dbfsize=100

  Re-testing with installer version 445 + udeb packages from 2016-04-07
  s390-dasd 0.0.36ubuntu1, s390-zfcp 1.0.2ubuntu1, multipath-udeb
  0.5.0+git1.656f8865-5ubuntu2, disk-detect 1.117ubuntu1

  Some observations:
  1. With one path only, the scanning/detection of all LUNs worked deterministically and correctly (including the final re-IPL after successful installation).
  2. With two paths, but without the parmfile option 'disk-detect/multipath/enable=true', the LUNs were all detected, but once for every path. I did not want to continue installation that way.
  3. With two paths and with the option 'disk-detect/multipath/enable=true', the multipath program was loaded and multipathing was set up correctly. ...
  4. ... installation completed fine then...
  5. .... but the final re-IPL after successful installation failed. Attaching last console messages of the IPL: 
  6. I could also see a difference in filesystem creation time for multipath and single path: 

  with multipathing (2 paths)
  Apr  7 15:27:33 partman: mke2fs 1.42.13 (17-May-2015)
  Apr  7 15:28:34 kernel: [  559.624851] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: errors=remount-ro

  without multipathing
  Apr  7 15:48:17 partman: mke2fs 1.42.13 (17-May-2015)
  Apr  7 15:48:50 kernel: [  413.789054] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: errors=remount-ro

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1567602/+subscriptions



More information about the foundations-bugs mailing list