ACK: [focal:linux-azure, bionic:linux-azure-4.15][PATCH 0/5] Fix kdump Over Network

Colin Ian King colin.king at canonical.com
Thu Oct 8 11:28:23 UTC 2020


On 07/10/2020 22:16, Kelsey Skunberg wrote:
> BugLink: https://bugs.launchpad.net/bugs/1883261
> 
> [Impact]
> 
> Microsoft would like to request two kdump related fixes in all releases
> supported on Azure. The two commits are:
> 
> c81992e7f4aa1 ("PCI: hv: Retry PCI bus D0 entry on invalid device
> state")
> 83cc3508ffaa6 ("PCI: hv: Fix the PCI HyperV probe failure path
> to release resource properly")
> 
> These are in the virtual PCI driver for Hyper-V. The customer visible
> symptom is that the network is not functional in the kdump kernel, so
> the dump file must be stored on the local disk and cannot be written
> over the network.
> 
> The problem only occurs when Accelerated Networking is enabled. It’s a
> relatively obscure scenario, which is why the problem has not surfaced
> before now. But we have an important customer who wants the
> “dump-file-over-the-network” functionality to work.
> 
> For bionic/linux-azure-4.15, the following additional patch needs to be
> backported first to allow the requested patches to apply cleanly:
> 
> a8e37506e79a ("PCI: hv: Reorganize the code in preparation of
> hibernation")
> 
> [Test Case]
> 
> - Apply requested patches and boot into updated kernel
> - Verify Accelerated Networking is enabled
> - Set up kdump
> - Configure kdump to use SSH
> - Test the crash dump mechanism and verify the kernel crash dump appears
>   on the selected remote server
> 
> Further details for setting up kdump through testing can be found here:
> https://ubuntu.com/server/docs/kernel-crash-dump
> 
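
As a rough aid for the "configure kdump to use SSH" step, here is a minimal
sketch that checks whether a kdump-tools defaults file names a remote SSH
target. It assumes the Ubuntu kdump-tools convention of shell-style
KEY="value" lines with SSH and SSH_KEY variables. The host name, key path and
helper names below are hypothetical and used only for illustration.

```python
def parse_kdump_defaults(text):
    """Parse shell-style KEY="value" lines, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def remote_dump_target(text):
    """Return the SSH target if one is set (and uncommented), else None."""
    return parse_kdump_defaults(text).get("SSH") or None


# Hypothetical file contents; the values are placeholders, not real hosts.
SAMPLE = """# kdump-tools defaults (illustrative)
USE_KDUMP=1
SSH="ubuntu@dump-server.example"
SSH_KEY="/root/.ssh/kdump_id_rsa"
"""

if __name__ == "__main__":
    print(remote_dump_target(SAMPLE))
```

A None result means the dump would stay on local disk, which is the fallback
behaviour the cover letter describes.
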
> [Regression Potential]
> 
> Patches are only targeted to azure kernels.
> 
> Patches are designed to release allocated resources that remain after
> error cases in hv_pci_probe() or after PCI devices are not shut down
> properly. If those resources are still not correctly released, then
> entering D0 state in the kdump kernel could continue to fail.
> 
> A regression could show up in the freeing of resources, or D0 entry in
> the kdump kernel could still fail even after all resources have been
> released.
> 
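
For readers unfamiliar with the retry approach, the control flow can be
sketched roughly as below. This is a user-space simulation, not the driver
code: FakeHost, probe() and the use of -EPROTO to stand for "invalid device
state" are assumptions made for illustration, and the real change lives in
pci-hyperv.c.

```python
import errno


class FakeHost:
    """Hypothetical stand-in for the host side of the virtual PCI bus."""

    def __init__(self, stale_resources):
        # True models a host still holding resources from the crashed kernel.
        self.stale_resources = stale_resources
        self.calls = []

    def enter_d0(self):
        self.calls.append("enter_d0")
        # Assumption: stale resources make the host reject D0 entry.
        return -errno.EPROTO if self.stale_resources else 0

    def bus_exit(self):
        self.calls.append("bus_exit")
        # Asking the host to release resources clears the stale state.
        self.stale_resources = False
        return 0


def probe(host):
    """Try D0 entry; on an invalid-state error, release and retry once."""
    retry_allowed = True
    while True:
        ret = host.enter_d0()
        if ret == -errno.EPROTO and retry_allowed:
            retry_allowed = False  # never loop more than once
            if host.bus_exit() == 0:
                continue
        return ret


if __name__ == "__main__":
    host = FakeHost(stale_resources=True)
    print(probe(host), host.calls)
```

Limiting the retry to a single attempt keeps a persistently failing host from
stalling the probe, which matches the regression concern that D0 entry could
still fail after the resources are released.
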
> Build & boot tested. Verified kdump works as intended over SSH after
> patches are applied.
> 
> Both 5.4 and 4.15 test kernels were sent to Microsoft. Both kernels
> were signed off on and verified to resolve the problem.
> 
> 
> Changes for Bionic/linux-azure-4.15:
> 
> 
> Dexuan Cui (1):
>   PCI: hv: Reorganize the code in preparation of hibernation
> 
> Wei Hu (2):
>   PCI: hv: Fix the PCI HyperV probe failure path to release resource
>     properly
>   PCI: hv: Retry PCI bus D0 entry on invalid device state
> 
>  drivers/pci/host/pci-hyperv.c | 101 +++++++++++++++++++++++++++-------
>  1 file changed, 81 insertions(+), 20 deletions(-)
> 
> 
> Changes for Focal/linux-azure:
> 
> Wei Hu (2):
>   PCI: hv: Fix the PCI HyperV probe failure path to release resource
>     properly
>   PCI: hv: Retry PCI bus D0 entry on invalid device state
> 
>  drivers/pci/controller/pci-hyperv.c | 60 ++++++++++++++++++++++++++---
>  1 file changed, 54 insertions(+), 6 deletions(-)
> 
> --
> 2.25.1
> 

Thanks Kelsey. The backports look good to me, with a good test case and
results, and I think the regression potential vs benefit looks sane, so:

Acked-by: Colin Ian King <colin.king at canonical.com>


