[Bug 1658733] Re: Ubuntu 16.04.2KVM:kdump fails to mount root file system when noirqdistrib is missing as dump kernel parameter

Brian Murray brian at ubuntu.com
Thu Dec 14 21:05:24 UTC 2017


Hello bugproxy, or anyone else affected,

Accepted makedumpfile into artful-proposed. The package will build now
and be available at
https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.1-2ubuntu0.1 in
a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and change the tag from
verification-needed-artful to verification-done-artful. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-artful. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance!

** Changed in: makedumpfile (Ubuntu Artful)
       Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-artful

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1658733

Title:
  Ubuntu 16.04.2KVM:kdump fails to mount root file system when
  noirqdistrib is missing as dump kernel parameter

Status in The Ubuntu-power-systems project:
  Confirmed
Status in kexec-tools package in Ubuntu:
  Invalid
Status in makedumpfile package in Ubuntu:
  Fix Released
Status in kexec-tools source package in Trusty:
  New
Status in makedumpfile source package in Trusty:
  New
Status in kexec-tools source package in Xenial:
  New
Status in makedumpfile source package in Xenial:
  In Progress
Status in kexec-tools source package in Zesty:
  New
Status in makedumpfile source package in Zesty:
  New
Status in kexec-tools source package in Artful:
  Invalid
Status in makedumpfile source package in Artful:
  Fix Committed
Status in kexec-tools source package in Bionic:
  Invalid
Status in makedumpfile source package in Bionic:
  Fix Released

Bug description:
  [Impact]
  On Power Systems, the kdump kernel can miss some interrupts, and dumping the crash will then fail. Adding the noirqdistrib kernel parameter to the kdump kernel command line fixes this.

  [Test Case]
  Setting up kdump to target a virtio-scsi device on a Power System.

  [Regression Potential]
  The parameter could be interpreted differently on a different platform, causing kdump to fail. However, it has been verified that no other platform uses such a parameter. If another parameter were incorrectly removed by the patch, kdump could fail on other systems.


  == Comment: #0 - Richard M. Scheller - 2016-12-14 16:50:26 ==

  ---Problem Description---

  On a KVM guest installed to a multipath root device, the kdump kernel
  fails to mount the root file system.  This error does not occur in a
  similar guest installed to a single path device.

  Full console output of the kdump failure is attached.  These messages
  from the output may be relevant:

  Begin: Loading multipath modules ... Success: loaded module dm-multipath.
  done.
  Begin: Loading multipath hardware handlers ... Failure: failed to load module scsi_dh_alua.
  Failure: failed to load module scsi_dh_rdac.
  Failure: failed to load module scsi_dh_emc.
  done.
  Begin: Starting multipathd ... done.

  ---uname output---
  Linux dotg9 4.8.0-32-generic #34~16.04.1-Ubuntu SMP Tue Dec 13 17:01:57 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = 8247-22L Ubuntu 16.04.1 KVM guest

  ---Steps to Reproduce---
  - Install Ubuntu 16.04.1 to a multipath target disk
  - Install kdump-tools package
  - Configure kexec-tools to reserve sufficient RAM for the kdump kernel to load (I use 512MB) in /etc/default/grub.d/kexec-tools.cfg
  - Run update-grub
  - Reboot
  - Initiate a system crash using "echo c > /proc/sysrq-trigger"
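
  As a rough sketch of the configuration step above, assuming the stock
  Ubuntu 16.04 layout in which /etc/default/grub.d/kexec-tools.cfg carries
  the crashkernel= setting, the reservation line might be generated as
  follows. The helper name crashkernel_cfg_line is hypothetical, and the
  commented commands are meant to be run as root on a disposable test
  guest only.

```shell
# Sketch only: prints the line one would place in
# /etc/default/grub.d/kexec-tools.cfg (assumed stock Ubuntu 16.04 layout).
crashkernel_cfg_line() {
    # $1: reservation size, e.g. 512M (the value used in this report)
    printf 'GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=%s"\n' "$1"
}

crashkernel_cfg_line 512M

# Remaining steps from the list above, as root (deliberately not run here):
#   update-grub && reboot
#   kdump-config show              # confirm the crash kernel is loaded
#   echo c > /proc/sysrq-trigger   # crashes the guest immediately
```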

  == Comment: #12 - Richard M. Scheller - 2016-12-20 20:37:45 ==
  Here is the log level 8 kdump console log requested in comment 10.

  == Comment: #21 - Richard M. Scheller - 2017-01-06 11:04:17 ==
  (In reply to comment #19)
  > Hi, I logged in to dotkvm and I couldn't find the guest dotg9. Also, although I
  > found a dotg9.xml in /kte/xml/ it doesn't look like it uses multipath (it
  > uses .img files which I didn't find as disks).
  >
  > Could you please recreate the guest for further debug?

  Yes, I recreated the guest with its correct multipath lun
  configuration.  I have also attached the guest XML to this bug.

  > Besides that could you please let us know:
  >  - is the multipath the system's root? I mean / is installed/mounted on the
  > multipath device?

  Yes, the guest has only one disk.  That disk is actually a LUN from a
  fiber channel storage device with two paths on the host side.  I have
  passed through both paths to the guest, so the multipath nature of the
  target disk is known to the guest.

  In other words, the guest sees a multipath device and is using it as a
  multipath device.  The root file system is called
  /dev/mapper/mpatha-part2 on the guest.

  >  - how did you attach the device to the guest?

  Each FC LUN path on the host is mapped to a virtio-scsi controller on
  the guest using LUN passthrough.  (See the guest XML for details on
  this.)

  == Comment: #22 - Mauro Sergio Martins Rodrigues  - 2017-01-11 09:31:38 ==
  I managed to get kdump to mount the rootfs and perform its tasks by setting KDUMP_CMDLINE_APPEND="nr_cpus=4" in /etc/default/kdump-tools; see http://pastebin.hursley.ibm.com/8239

  I'm still investigating the reason behind this behavior.

  Thanks,

  --
  maurosr

  == Comment: #23 - Mauricio Faria De Oliveira  - 2017-01-11 11:56:40 ==
  Mauro,

  (In reply to comment #22)
  > I managed to get kdump to mount rootfs and perform its tasks by setting
  > KDUMP_CMDLINE_APPEND="nr_cpus=4" parameter in /etc/default/kdump-tools see
  > http://pastebin.hursley.ibm.com/8239
  >
  > I'm still investigating to figure out what is the reason behind this
  > behavior.
  >
  > Thanks,
  >
  > --
  > maurosr

  That would smell like an out of memory condition that is alleviated
  with a smaller number of CPUs allowed for the kernel (so the amount of
  memory associated with per-CPU stuff is less in total).

  Per the bug description, the memory reserved for the crashkernel is
  512MB:

  (In reply to comment #23)
  > - Configure kexec-tools to reserve sufficient RAM for the kdump kernel to
  > load (I use 512MB) in /etc/default/grub.d/kexec-tools.cfg

  That seems low for Power guests/systems.
  In theory it doesn't seem so, but the reality is that, _for some reason(s)_, we require a lot of memory to load and boot a kernel/initramfs (either on boot or kdump).

  When working w/ kdump and Ubuntu, I usually set the crashkernel
  allocated size right away to 4GB to avoid problems.

  Since this is a smaller sized guest, obviously we'd want to use less
  than that, but more than 512 MB given the evidence observed.
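
  When sizing the reservation, it can help to compare what was requested
  on the kernel command line with what the running kernel actually
  reserved. The following is a minimal sketch; the helper name
  crashkernel_param is hypothetical, and it assumes the kernel exposes
  /proc/cmdline and /sys/kernel/kexec_crash_size.

```shell
# Extract the crashkernel= value from a kernel command line string.
crashkernel_param() {
    # $1: a kernel command line, e.g. the contents of /proc/cmdline
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Requested reservation on the running kernel, if any.
if [ -r /proc/cmdline ]; then
    echo "requested: $(crashkernel_param "$(cat /proc/cmdline)")"
fi

# Bytes actually reserved for the crash kernel.
if [ -r /sys/kernel/kexec_crash_size ]; then
    echo "reserved bytes: $(cat /sys/kernel/kexec_crash_size)"
fi
```

  An empty "requested" value or a reserved size of 0 would suggest the
  crashkernel= setting never reached the booted kernel.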

  Hope this helps

  == Comment: #28 - Mauro Sergio Martins Rodrigues - 2017-01-13 10:12:28 ==

  > I think in theory it doesn't seem so, but the reality is that _for some
  > reason(s)_ we require just too much memory to load and boot a
  > kernel/initramfs (either on boot or kdump).

  For the record, as you already know, I've raised the memory to 1024 MB
  and it didn't help.

  > Per yesterday's conversations, this had to do with IRQ distribution
  > and the nr_cpus kernel parameter, and seemed to affect multipath only
  > by chance, usually failing/hanging the guest at kdump at other parts
  > / way earlier in boot (at virtio-scsi disk probe phase too).

  Yes, that's right. So it looks like there are a couple of things going
  on here. The first and simpler:

  According to kdump's documentation at
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/kdump/kdump.txt#n365
  noirqdistrib is a required parameter for using kdump on the ppc64
  architecture, and indeed it solves the issue, including the case when
  nr_cpus=1 (which was failing in all my attempts until I used
  noirqdistrib).

  So I believe a patch to Ubuntu's kdump package that sets this parameter
  for ppc builds may solve this for good, and that will be my focus for
  now.
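
  A minimal sketch of that idea, as it might look for a local workaround
  in /etc/default/kdump-tools: append noirqdistrib to the kdump kernel
  parameters only on ppc64 machines and leave other platforms alone. The
  helper name append_noirqdistrib is hypothetical, and the actual
  packaging fix may be implemented differently.

```shell
# Sketch only: add noirqdistrib to the kdump kernel parameters on ppc64.
append_noirqdistrib() {
    # $1: machine name as reported by `uname -m`
    # $2: current value of KDUMP_CMDLINE_APPEND (may be empty)
    case "$1" in
        ppc64|ppc64le)
            case " $2 " in
                *" noirqdistrib "*) printf '%s\n' "$2" ;;        # already present
                *) printf '%s\n' "${2:+$2 }noirqdistrib" ;;      # append it
            esac
            ;;
        *) printf '%s\n' "$2" ;;                                 # other platforms untouched
    esac
}

KDUMP_CMDLINE_APPEND=$(append_noirqdistrib "$(uname -m)" "nr_cpus=1")
echo "KDUMP_CMDLINE_APPEND=\"$KDUMP_CMDLINE_APPEND\""
```

  After changing the value in /etc/default/kdump-tools, the crash kernel
  has to be reloaded (for example with kdump-config unload followed by
  kdump-config load) before the new parameters take effect.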

  Nevertheless I will keep investigating why this issue is happening
  only w/ multipath devices.

  == Comment: #29 - Mauricio Faria De Oliveira - 2017-01-16 04:22:00 ==
  (In reply to comment #28)
  > > I think in theory it doesn't seem so, but the reality is that _for some reason(s)_ we require just too much memory to load and boot a kernel/initramfs (either on boot or kdump).
  >
  > For the record, as you already know, I've raised the memory to 1024 MB and it
  > didn't help.

  Definitely.
  What I've observed is that more than 2GB (yes..) was required on some systems I checked on at the time.
  Since you've identified the IRQ distribution aspect of this issue, the crashkernel memory size might not be completely related to this problem, and for this system, the configured sizes happen to work well.

  > Nevertheless I will keep investigating why this issue is happening only w/
  > multipath devices.

  Based on the IRQ distribution aspect, the most reasonable suspicion
  I've thought of is...

  In our testing, several times I observed the kernel initialization to
  hang in the probe stage of the virtio-scsi disks, and the IO request
  (probably for the partition table read operation) would time out
  (signaled by a 'tag abort' message).

  If we suppose that these initial IO requests completed correctly (say,
  these initial IRQs happened to be assigned to the CPU that was
  online, and thus were delivered/handled correctly) BUT the IO
  requests issued by multipath (for disk/path identification) failed
  (i.e., their IRQs happened to be assigned to a CPU that was offline,
  so the requests would time out), then multipath would never get a
  response back, thus not initializing the individual paths and the
  respective multipath device.

  So the /dev/mapper/mpathX device is not created, and the problem is
  observed.
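
  One way to probe this suspicion from a shell in the kdump kernel (if
  one can be reached) is to compare the CPUs each virtio IRQ may be
  delivered to with the set of online CPUs. The following is a sketch
  under that assumption; the helper name virtio_irqs is hypothetical,
  and it relies only on /proc/interrupts, /proc/irq/N/smp_affinity_list
  and /sys/devices/system/cpu/online.

```shell
# Print the IRQ numbers whose interrupts-table line mentions virtio.
virtio_irqs() {
    # $1: path to an interrupts table (defaults to /proc/interrupts)
    awk '/virtio/ { sub(":", "", $1); print $1 }' "${1:-/proc/interrupts}"
}

if [ -r /proc/interrupts ]; then
    for irq in $(virtio_irqs); do
        # CPUs this IRQ is allowed to be delivered to
        printf 'irq %s -> cpus %s\n' "$irq" \
            "$(cat "/proc/irq/$irq/smp_affinity_list" 2>/dev/null)"
    done
    echo "online cpus: $(cat /sys/devices/system/cpu/online 2>/dev/null)"
fi
```

  An IRQ whose affinity list contains no online CPU would be consistent
  with the timeouts described above, though it would not prove the cause.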

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1658733/+subscriptions


