[Bug 1778844] Re: nvme multipath does not report path relationships

Thadeu Lima de Souza Cascardo 1778844 at bugs.launchpad.net
Mon Apr 22 15:03:42 UTC 2019


@vorlon, test case updated.

Thanks.
Cascardo.

** Description changed:

  [Impact]
  An initramfs created with MODULES=dep, or a kdump initrd, won't boot a system whose root filesystem is on a multipath NVMe device.
  
  [Test case]
  Systems with NVMe multipath were able to boot with the created initramfs. It was also tested on systems with non-multipath NVMe and on systems without NVMe.
  
+ To verify the fix, set the MODULES option to dep in
+ /etc/initramfs-tools/initramfs.conf, recreate the initramfs, reboot, and
+ check that the system boots correctly. This should not break systems
+ with non-NVMe disks or with non-multipath NVMe disks, and it should now
+ work on multipath NVMe systems.
+ 
+ sed -i '/MODULES=/s,=.*,=dep,' /etc/initramfs-tools/initramfs.conf
+ update-initramfs -u -k all
+ reboot
+ 
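The sed step above edits the live configuration in place. A minimal sketch of the same change wrapped in a function, so it can be tried on a copy of the file first; the function name `set_modules_dep` and its optional path argument are assumptions for illustration and are not part of initramfs-tools:

```shell
# Hypothetical helper: set MODULES=dep in an initramfs.conf-style file.
# The path argument defaults to the real configuration file.
set_modules_dep() {
    conf="${1:-/etc/initramfs-tools/initramfs.conf}"
    # Rewrite only uncommented MODULES= assignments; keep a .bak copy.
    sed -i.bak 's/^MODULES=.*/MODULES=dep/' "$conf"
    # Succeed only if the setting is now present.
    grep -qx 'MODULES=dep' "$conf"
}

# Usage on the real system (as root), followed by the steps from the test case:
#   set_modules_dep && update-initramfs -u -k all && reboot
```

Anchoring the pattern at the start of the line leaves comment lines that mention MODULES untouched, and the .bak copy makes it easy to restore the previous setting after the test.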
  [Regression potential]
  A system could fail to boot if the generated initramfs is broken. The change should only add modules, which is safer than removing modules or making any other change. In any case, it was tested to boot on multipath NVMe, non-multipath NVMe, and non-NVMe systems.
  
- 
  -----------------
- 
  
  Problem Description:
  ===================
  After triggering a crash, kdump does not work and the system drops into the initramfs shell.
  
  Steps to re-create:
  ==================
  
  > woo is installed with the ubuntu180401 kernel
  
  root@woo:~# uname -a
  Linux woo 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 17:59:00 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
  root@woo:~#
  
  > crashkernel value as below
  
  root@woo:~# free -h
                total        used        free      shared  buff/cache   available
  Mem:           503G        2.0G        501G         13M        279M        499G
  Swap:          2.0G          0B        2.0G
  
  root@woo:~# cat /proc/cmdline
  root=UUID=45bb7eb2-4c61-425d-8bf9-4e6f16829ddb ro splash quiet crashkernel=8192M
  
  >  kdump status
  
  root@woo:~# kdump-config status
  current state   : ready to kdump
  
  root@woo:~# kdump-config show
  DUMP_MODE:        kdump
  USE_KDUMP:        1
  KDUMP_SYSCTL:     kernel.panic_on_oops=1
  KDUMP_COREDIR:    /var/crash
  crashkernel addr:
     /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.15.0-23-generic
  kdump initrd:
     /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.15.0-23-generic
  current state:    ready to kdump
  
  kexec command:
    /sbin/kexec -p --command-line="root=UUID=45bb7eb2-4c61-425d-8bf9-4e6f16829ddb ro splash quiet nr_cpus=1 systemd.unit=kdump-tools.service irqpoll noirqdistrib nousb" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
  
  root@woo:~# dmesg | grep Reser
  [    0.000000] Reserving 8192MB of memory at 128MB for crashkernel (System RAM: 524288MB)
  [    0.000000] cma: Reserved 26224 MiB at 0x0000203995000000
  [    3.545490] Copyright (C) 2017-2018 Broadcom. All Rights Reserved. The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
  
  > Triggered crash
  
  root@woo:~# echo 1 > /proc/sys/kernel/sysrq
  root@woo:~# echo c > /proc/sysrq-trigger
  [   73.056308] sysrq: SysRq : Trigger a crash
  [   73.056357] Unable to handle kernel paging request for data at address 0x00000000
  [   73.056459] Faulting instruction address: 0xc0000000007f24c8
  [   73.056543] Oops: Kernel access of bad area, sig: 11 [#1]
  [   73.056609] LE SMP NR_CPUS=2048 NUMA PowerNV
  [   73.056668] Modules linked in: rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) esp6_offload esp6 esp4_offload esp4 xfrm_algo mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc idt_89hpesx vmx_crypto crct10dif_vpmsum ofpart cmdlinepart ipmi_powernv ipmi_devintf at24 powernv_flash ipmi_msghandler ibmpowernv mtd opal_prd uio_pdrv_genirq uio nfsd auth_rpcgss nfs_acl lockd sch_fq_codel grace sunrpc knem(OE) ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq mlx5_ib(OE) ib_core(OE) nouveau lpfc ast i2c_algo_bit ttm mlx5_core(OE) drm_kms_helper mlxfw(OE) nvmet_fc devlink syscopyarea nvmet mlx_compat(OE) sysfillrect cxl nvme_fc sysimgblt fb_sys_fops nvme_fabrics nvme ahci crc32c_vpmsum drm scsi_transport_fc
  [   73.057601]  tg3 libahci nvme_core pnv_php
  [   73.057652] CPU: 44 PID: 4626 Comm: bash Tainted: G           OE    4.15.0-23-generic #25-Ubuntu
  [   73.057767] NIP:  c0000000007f24c8 LR: c0000000007f3568 CTR: c0000000007f24a0
  [   73.057868] REGS: c000003f8269f9f0 TRAP: 0300   Tainted: G           OE     (4.15.0-23-generic)
  [   73.057986] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28222222  XER: 20040000
  [   73.058099] CFAR: c0000000007f3564 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
  [   73.058099] GPR00: c0000000007f3568 c000003f8269fc70 c0000000016eaf00 0000000000000063
  [   73.058099] GPR04: c000003fef47ce18 c000003fef494368 9000000000009033 0000000031da0058
  [   73.058099] GPR08: 0000000000000007 0000000000000001 0000000000000000 9000000000001003
  [   73.058099] GPR12: c0000000007f24a0 c000000007a2e400 00000e4fa497c900 0000000000000000
  [   73.058099] GPR16: 00000e4f79cc94b0 00000e4f79d567e0 00000e4f79d88204 00000e4f79d56818
  [   73.058099] GPR20: 00000e4f79d8d5d8 0000000000000001 0000000000000000 00007ffffefce644
  [   73.058099] GPR24: 00007ffffefce640 00000e4f79d8afb4 c0000000015e9aa8 0000000000000002
  [   73.058099] GPR28: 0000000000000063 0000000000000004 c000000001572b1c c0000000015e9e68
  [   73.059060] NIP [c0000000007f24c8] sysrq_handle_crash+0x28/0x30
  [   73.059142] LR [c0000000007f3568] __handle_sysrq+0xf8/0x2c0
  [   73.059215] Call Trace:
  [   73.059254] [c000003f8269fc70] [c0000000007f3548] __handle_sysrq+0xd8/0x2c0 (unreliable)
  [   73.059358] [c000003f8269fd10] [c0000000007f3d74] write_sysrq_trigger+0x64/0x90
  [   73.059456] [c000003f8269fd40] [c000000000481248] proc_reg_write+0x88/0xd0
  [   73.059543] [c000003f8269fd70] [c0000000003d43fc] __vfs_write+0x3c/0x70
  [   73.059627] [c000003f8269fd90] [c0000000003d4658] vfs_write+0xd8/0x220
  [   73.059716] [c000003f8269fde0] [c0000000003d4978] SyS_write+0x68/0x110
  [   73.059809] [c000003f8269fe30] [c00000000000b284] system_call+0x58/0x6c
  [   73.059896] Instruction dump:
  [   73.059940] 4bfff9f1 4bfffe50 3c4c00f0 38428a60 7c0802a6 60000000 39200001 3d42001c
  [   73.060040] 394a6db0 912a0000 7c0004ac 39400000 <992a0000> 4e800020 3c4c00f0 38428a30
  [   73.060159] ---[ end trace e116d2421d2f59a5 ]---
  [   74.067059]
  [   74.067172] Sending IPI to other CPUs
  [   75.851509] IPI complete
  [  202.275317797,5] OPAL: Switch to big-endian OS
  [  207.151277658,5] OPAL: Switch to little-endian OS
  [  232.159296542,3] PHB#0033[8:3]: CRESET: Unexpected slot state 00000102, resetting...
  [   78.164472] kexec: Starting switchover sequence.
  [    1.412463] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
  [    1.412468] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
  [    1.481335] vio vio: uevent: failed to send synthetic uevent
  [    2.534732] nouveau 0004:04:00.0: unknown chipset (140000a1)
  [    2.534847] nouveau 0004:05:00.0: unknown chipset (140000a1)
  [    2.534967] nouveau 0035:03:00.0: unknown chipset (140000a1)
  [    2.535144] nouveau 0035:04:00.0: unknown chipset (140000a1)
  
  Gave up waiting for root file system device.  Common problems:
   - Boot args (cat /proc/cmdline)
     - Check rootdelay= (did the system wait long enough?)
   - Missing modules (cat /proc/modules; ls /dev)
  ALERT!  UUID=45bb7eb2-4c61-425d-8bf9-4e6f16829ddb does not exist.  Dropping to a shell!
  
  BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu3) built-in shell (ash)
  Enter 'help' for a list of built-in commands.
  
  (initramfs)
  (initramfs)
  
  == Comment: #1 - INDIRA P. JOGA <> - 2018-06-21 01:19:41 ==
  Attached woo console logs for kdump issue
  
  == Comment: #4 - INDIRA P. JOGA <> - 2018-06-26 01:51:47 ==
  I triggered a crash and the system stops here:
  
  [ 1032.259696471,3] PHB#0033[8:3]: CRESET: Unexpected slot state 00000102, resetting...
  omplete
  [  823.882048] kexec: Starting switchover sequence.
  [    1.154056] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
  [    1.154060] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
  [    1.222719] vio vio: uevent: failed to send synthetic uevent
  [    2.212065] nouveau 0004:04:00.0: unknown chipset (140000a1)
  [    2.214995] nouveau 0004:05:00.0: unknown chipset (140000a1)
  [    2.215259] nouveau 0035:03:00.0: unknown chipset (140000a1)
  [    2.215408] nouveau 0035:04:00.0: unknown chipset (140000a1)
  Gave up waiting for root file system device.  Common problems:
   - Boot args (cat /proc/cmdline)
     - Check rootdelay= (did the system wait long enough?)
   - Missing modules (cat /proc/modules; ls /dev)
  ALERT!  UUID=45bb7eb2-4c61-425d-8bf9-4e6f16829ddb does not exist.  Dropping to a shell!
  
  BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu3) built-in shell (ash)
  Enter 'help' for a list of built-in commands.
  
  (initramfs)
  
  == Comment: #5 - INDIRA P. JOGA <> - 2018-06-26 02:40:56 ==
  NOTE:
  
  An NVMe disk is used as the root disk here.
  
  == Comment: #8 - Hari Krishna Bathini <> - 2018-06-26 06:06:13 ==
  The dump target (/var/crash) is on an NVMe device (which is also the
  root disk), but the kdump initrd is not being built with the nvme
  driver modules. As a result, the NVMe disk is not found and the kdump
  kernel drops to the initramfs shell. Using the default initrd, which
  includes the nvme driver modules, the dump was captured successfully.

  Can someone from Canonical take a look at this and comment on why the
  nvme modules are not included in the kdump initrd even though the
  NVMe device is the root disk?
  
  Thanks
  Hari
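
The diagnosis above (a kdump initrd built without the NVMe driver modules) can be checked from the running system by listing the initrd contents. A minimal sketch, assuming `lsinitramfs` from initramfs-tools is available; the helper name `missing_nvme_modules` is hypothetical, and the module list (nvme, nvme-core) is taken from the "Modules linked in" line in the log and may need extending for other setups:

```shell
# Hypothetical helper: read an initrd file listing on stdin (one path per
# line, as printed by lsinitramfs) and print each expected NVMe module that
# is absent. Returns non-zero if any module is missing.
missing_nvme_modules() {
    listing=$(cat)
    rc=0
    for mod in nvme nvme-core; do
        # Match e.g. .../drivers/nvme/host/nvme-core.ko (optionally compressed).
        if ! printf '%s\n' "$listing" | grep -Eq "/${mod}\.ko(\..*)?$"; then
            echo "missing: ${mod}"
            rc=1
        fi
    done
    return "$rc"
}

# Usage against the kdump initrd from the report:
#   lsinitramfs /var/lib/kdump/initrd.img | missing_nvme_modules
```

Running the same check against the default initrd in /boot gives a quick comparison between the image that boots and the one that does not.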

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1778844

Title:
  nvme multipath does not report path relationships

Status in The Ubuntu-power-systems project:
  In Progress
Status in initramfs-tools package in Ubuntu:
  Fix Released
Status in initramfs-tools source package in Bionic:
  Incomplete
Status in initramfs-tools source package in Cosmic:
  In Progress
Status in initramfs-tools source package in Disco:
  Fix Released
