[Bug 1681909] Re: kdump is not captured in remote host when kdump over ssh is configured

Guilherme G. Piccoli 1681909 at bugs.launchpad.net
Fri Aug 16 19:21:17 UTC 2019


After updating the package in eoan-proposed with the fix for ppc64 (thanks Eric!), I manually tested that version and it works fine: it was able to collect the kernel crash dump.
The version I've tested is:

$ rmadison makedumpfile | grep eoan-proposed
 makedumpfile | 1:1.6.6-2ubuntu1         | eoan-proposed    | source, amd64, arm64, armhf, i386, ppc64el, s390x

Even with my positive test results, the autopkgtest is broken and keeps
failing for ppc64, according to:
http://autopkgtest.ubuntu.com/packages/m/makedumpfile/eoan/ppc64el

From the above results page, we can see 2 important things:
a) The last time it succeeded was: "makedumpfile/1:1.6.5-1ubuntu2 2019-06-20".
But the ppc64 test was in fact skipped in that run, which is the reason it counted as a success.

b) Even the current released version for eoan, 1:1.6.5-1ubuntu2, is
failing according to the autopkgtest that ran on:
"makedumpfile/1:1.6.5-1ubuntu2 2019-07-25". This test was triggered due
to makedumpfile being a reverse dependency of "file".

Also, I tried to replicate the test on a local powerpc64 server,
using autopkgtest. My command line was:

https://pastebin.ubuntu.com/p/y4KyRvJVmz/

I toggled 2 parameters, giving 4 test results:

1) With "--apt-pocket=proposed=src:makedumpfile" (to test -proposed version) and "--nova-reboot":
http://paste.ubuntu.com/p/Z8dr6ssF2J/

2) With "--apt-pocket=proposed=src:makedumpfile" (to test -proposed version) only:
http://paste.ubuntu.com/p/qxDRy5xPSZ/

3) Without both parameters above (testing the released version):
http://paste.ubuntu.com/p/cYdMrWHPsR/

4) Only with "--nova-reboot":
http://paste.ubuntu.com/p/xYC4KG9DZr/

In all cases the test failed, with "Broken Pipe" in a late part of the test.
During the failure, I could still SSH into the testbed, so something
is clearly wrong with the test.

I'd hereby like to ask for an exemption from this test, marking it as "badtest" on ppc64.
Cascardo, do you agree? I think we could release this package before the Eoan freeze. The package is sane (based on manual tests made by a colleague and me), and we shouldn't block on a test that was always skipped.

We still plan to continue investigating the test failure, until we can fix it.
Thanks,


Guilherme

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1681909

Title:
  kdump is not captured in remote host when kdump over ssh is configured

Status in The Ubuntu-power-systems project:
  In Progress
Status in makedumpfile package in Ubuntu:
  Fix Committed
Status in makedumpfile source package in Xenial:
  Won't Fix
Status in makedumpfile source package in Bionic:
  In Progress
Status in makedumpfile source package in Cosmic:
  Won't Fix
Status in makedumpfile source package in Disco:
  In Progress
Status in makedumpfile source package in Eoan:
  Fix Committed

Bug description:
  [Impact]

  * Kdump over network (like NFS mount or SSH dump) relies on the
  network-online target from systemd. Even so, some NICs report
  "Link Up" state before they are ready to transmit packets. This bad
  behavior is probably caused by NIC firmware delays and is usually not
  fixable from drivers. Some adapters known to act like this are bnx2x,
  tg3 and ixgbe.

  * Kdump is a mechanism that may be a last resort to debug complex or
  hard-to-reproduce issues, so it's worthwhile to increase its
  reliability and resilience. We propose a workaround for this issue on
  network dumps by adding a retry/delay mechanism: if it's a network
  dump, kdump will retry a few times, sleeping between attempts, to
  cover the case of NICs that aren't ready yet but will soon be able to
  transmit packets.

  * Although first reported by IBM in PowerPC arch, the scope for this
  issue is the NIC, and it was later reported in x86 arch too.

  [Test case]

  Usually it's difficult to reproduce this issue naturally in a deterministic way, but there is an artificial test case in comment #24 of this LP bug.
  Also, one reporter on this bug managed to reproduce the problem consistently, and the problem no longer occurred with our solution applied.

  [Regression potential]

  There's no clear regression potential here, since it's just a retry/delay mechanism. Some potential problems may come from coding errors in the script.
  The delay between attempts is only 3 seconds per iteration, so it shouldn't block kdump progress for long at any one time.
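
  For illustration only, the retry/delay idea described above could be
  sketched in shell roughly as follows. This is not the code from the
  actual fix: the function name retry_with_delay, its argument order and
  the attempt count in the usage comment are assumptions made for this
  sketch. Only the 3-second delay comes from the description above.

```shell
#!/bin/sh
# Hypothetical helper (not from the makedumpfile/kdump-tools sources):
# run a command up to MAX_ATTEMPTS times, sleeping DELAY seconds
# between failed attempts.
#
# Usage: retry_with_delay MAX_ATTEMPTS DELAY COMMAND [ARGS...]
retry_with_delay() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        # Give up once the attempt budget is used.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example call (assumed values: 5 attempts, 3 seconds apart; the
# SSH_TARGET variable is a placeholder for this sketch):
#   retry_with_delay 5 3 ssh -o BatchMode=yes "$SSH_TARGET" true
```

  With a 3-second delay, the worst-case extra wait is roughly the delay
  multiplied by the number of attempts minus one, so a small attempt
  count keeps the added latency bounded.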

  [Other information]

  Salsa Debian commit:
  https://salsa.debian.org/debian/makedumpfile/commit/d63ba95337988be1eac8c8c76d90825ff5c6d17f

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1681909/+subscriptions


