[focal:linux-azure, bionic:linux-azure-4.15][PATCH 0/5] Fix kdump Over Network

Kelsey Skunberg kelsey.skunberg at canonical.com
Wed Oct 7 21:16:35 UTC 2020


BugLink: https://bugs.launchpad.net/bugs/1883261

[Impact]

Microsoft would like to request two kdump-related fixes in all releases
supported on Azure. The two commits are:

c81992e7f4aa1 ("PCI: hv: Retry PCI bus D0 entry on invalid device
state")
83cc3508ffaa6 ("PCI: hv: Fix the PCI HyperV probe failure path
to release resource properly")

These are in the virtual PCI driver for Hyper-V. The customer visible
symptom is that the network is not functional in the kdump kernel, so
the dump file must be stored on the local disk and cannot be written
over the network.

The problem only occurs when Accelerated Networking is enabled. It’s a
relatively obscure scenario, which is why the problem has not surfaced
before now. But we have an important customer who wants the
“dump-file-over-the-network” functionality to work.

For bionic/linux-azure-4.15, the following additional patch needs to be
backported first to allow the requested patches to apply cleanly:

a8e37506e79a ("PCI: hv: Reorganize the code in preparation of
hibernation")

[Test Case]

- Apply requested patches and boot into updated kernel
- Verify Accelerated Networking is enabled
- Set up kdump
- Configure kdump to use SSH
- Test the crash dump mechanism and verify the kernel crash dump appears
  on the selected remote server

Further details on setting up and testing kdump can be found here:
https://ubuntu.com/server/docs/kernel-crash-dump

[Regression Potential]

Patches are only targeted to Azure kernels.

Patches are designed to release allocated resources remaining after
error cases in hv_pci_probe() or after PCI devices are not shut down
properly. If those resources are still not correctly released, then
entering D0 state in the kdump kernel could continue to fail.
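
For reviewers less familiar with the retry idea, here is a minimal
user-space sketch of a "retry D0 entry once after releasing stale
resources" pattern. It is not the driver code: struct fake_host,
fake_enter_d0(), fake_release_resources() and probe_with_retry() are
hypothetical names invented for illustration, and using -EPROTO to
signal the invalid device state is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the host side; not a real Hyper-V structure. */
struct fake_host {
	bool stale_resources;	/* resources left held by the crashed kernel */
	int d0_attempts;	/* number of D0 entry requests seen so far */
};

/* Stand-in for the D0 entry request: fails while stale resources remain. */
static int fake_enter_d0(struct fake_host *h)
{
	h->d0_attempts++;
	return h->stale_resources ? -EPROTO : 0;
}

/* Stand-in for asking the host to release what the old kernel held. */
static int fake_release_resources(struct fake_host *h)
{
	h->stale_resources = false;
	return 0;
}

/* Try D0 entry; on an invalid-state error, release and retry exactly once. */
static int probe_with_retry(struct fake_host *h)
{
	bool retry_allowed = true;
	int ret;

retry:
	ret = fake_enter_d0(h);
	if (ret == -EPROTO && retry_allowed) {
		retry_allowed = false;	/* never loop more than once */
		ret = fake_release_resources(h);
		if (ret == 0)
			goto retry;
	}
	return ret;
}
```

In this model, a host with stale_resources set succeeds on the second
D0 attempt, and a clean host succeeds on the first. If the release step
itself fails, the error is returned without a further attempt.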

Regressions could appear as errors when freeing resources, or as a
continued failure to enter D0 state in the kdump kernel even after all
resources have been released.

Build & boot tested. Verified kdump works as intended over SSH after
patches are applied.

Both 5.4 and 4.15 test kernels were sent to Microsoft. Microsoft signed
off on both kernels and verified that they resolve the problem.


Changes for Bionic/linux-azure-4.15:


Dexuan Cui (1):
  PCI: hv: Reorganize the code in preparation of hibernation

Wei Hu (2):
  PCI: hv: Fix the PCI HyperV probe failure path to release resource
    properly
  PCI: hv: Retry PCI bus D0 entry on invalid device state

 drivers/pci/host/pci-hyperv.c | 101 +++++++++++++++++++++++++++-------
 1 file changed, 81 insertions(+), 20 deletions(-)


Changes for Focal/linux-azure:

Wei Hu (2):
  PCI: hv: Fix the PCI HyperV probe failure path to release resource
    properly
  PCI: hv: Retry PCI bus D0 entry on invalid device state

 drivers/pci/controller/pci-hyperv.c | 60 ++++++++++++++++++++++++++---
 1 file changed, 54 insertions(+), 6 deletions(-)

--
2.25.1
