ACK: [PULL REQUESTS][focal/jammy/lunar linux-azure] [Azure] Fix VM crash/hang issues due to fast VF add/remove events

John Cabaj john.cabaj at canonical.com
Wed Jul 5 19:04:07 UTC 2023


On 6/12/23 4:13 PM, Tim Gardner wrote:
> BugLink: https://bugs.launchpad.net/bugs/2023594
> 
> SRU Justification
> 
> [Impact]
> 
> A Linux guest on Hyper-V/Azure can occasionally crash during early kernel boot because of an unusual host behavior:
> 1. The host assigns a VF to the guest;
> 2. The host immediately unassigns the VF from the guest; // Dexuan: because of race-condition bugs in the Linux vPCI driver, Linux can crash here.
> 3. The host assigns the VF to the guest again.
> 
> Starting in late 2022 (around November), Linux guests on Azure began crashing more frequently because of a host-side update made at that time: a new host/hypervisor feature for handling "correctable memory errors" can generate many successive VF remove/add events, so the race-condition bugs in the Linux vPCI driver surface much more easily. The Hyper-V team is implementing a batching mechanism so that guests receive far fewer VF remove/add events (ETA: June 2023), but in the meantime the Linux race-condition bugs should also be fixed so that guests do not crash even when they receive successive VF remove/add events.
> 
> [Test Plan]
> 
> Microsoft tested
> 
> [Regression potential]
> 
> PCI devices may not get registered, or VMs may crash.
> 
> [Other Info]
> 
> SF: #00349076
> 
> -------------------------------------------------------------------------------
> The following changes since commit d250cc0ce73d5582e5eb073fa948567ec2ef67d5:
> 
>   UBUNTU: Ubuntu-azure-5.4.0-1110.116 (2023-06-02 12:51:11 -0600)
> 
> are available in the Git repository at:
> 
>   git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/focal focal-azure-fix-vm-add-remove-race-condition
> 
> for you to fetch changes up to ac03c5aa4ead40832fcd94d814d28ea8087fb906:
> 
>   UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock (2023-06-12 14:57:33 -0600)
> 
> ----------------------------------------------------------------
> Dexuan Cui (4):
>       UBUNTU: SAUCE: PCI: hv: Fix a race condition bug in hv_pci_query_relations()
>       UBUNTU: SAUCE: PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
>       UBUNTU: SAUCE: PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
>       UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock
> 
>  drivers/pci/controller/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 38 insertions(+), 20 deletions(-)
> -------------------------------------------------------------------------------
> 
> The following changes since commit 98439f7092e414a0534c5687742e2c8309a30204:
> 
>   UBUNTU: Ubuntu-azure-5.15.0-1040.47 (2023-06-01 13:13:05 -0600)
> 
> are available in the Git repository at:
> 
>   git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/jammy jammy-azure-fix-vm-add-remove-race-condition
> 
> for you to fetch changes up to c7caafcc393e6435c8b6b24ecde574f30be95c35:
> 
>   PCI: hv: Use async probing to reduce boot time (2023-06-12 14:58:58 -0600)
> 
> ----------------------------------------------------------------
> Dexuan Cui (6):
>       PCI: hv: Fix a race condition bug in hv_pci_query_relations()
>       PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
>       PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
>       Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
>       PCI: hv: Add a per-bus mutex state_lock
>       PCI: hv: Use async probing to reduce boot time
> 
>  drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
>  1 file changed, 86 insertions(+), 59 deletions(-)
> -------------------------------------------------------------------------------
> 
> The following changes since commit e4972fe7acaf327c1bd496cf2286889bd913c9bf:
> 
>   UBUNTU: Ubuntu-azure-6.2.0-1005.5 (2023-06-01 11:42:19 -0600)
> 
> are available in the Git repository at:
> 
>   git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux-azure/+git/lunar lunar-azure-fix-vm-add-remove-race-condition
> 
> for you to fetch changes up to 7e734194962a51084cba9016f0e5a512805f7d0a:
> 
>   PCI: hv: Use async probing to reduce boot time (2023-06-12 15:01:00 -0600)
> 
> ----------------------------------------------------------------
> Dexuan Cui (6):
>       PCI: hv: Fix a race condition bug in hv_pci_query_relations()
>       PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
>       PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
>       Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
>       PCI: hv: Add a per-bus mutex state_lock
>       PCI: hv: Use async probing to reduce boot time
> 
>  drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
>  1 file changed, 86 insertions(+), 59 deletions(-)
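
For readers following the thread, the idea named in the patch title "Add a per-bus mutex state_lock" can be illustrated with a small user-space model. This is only a sketch under assumptions, not the code in pci-hyperv.c: the class VirtualBus, its methods, and the replay helper are hypothetical, and only the name state_lock is borrowed from the patch title. It shows how holding one per-bus lock across the check, the update, and the bookkeeping keeps rapid add/remove events for the same VF from interleaving.

```python
import threading


class VirtualBus:
    """Toy model of one virtual PCI bus (hypothetical, not driver code)."""

    def __init__(self):
        # One lock per bus; the name mirrors the patch title only.
        self.state_lock = threading.Lock()
        self.present = set()   # VFs currently assigned to the guest
        self.events = []       # ordered log of state changes that took effect

    def add_vf(self, vf):
        # The presence check, the update, and the log entry all happen
        # under the lock, so a concurrent remove cannot slip in between.
        with self.state_lock:
            if vf in self.present:
                return False
            self.present.add(vf)
            self.events.append(("add", vf))
            return True

    def remove_vf(self, vf):
        with self.state_lock:
            if vf not in self.present:
                return False
            self.present.discard(vf)
            self.events.append(("remove", vf))
            return True


def replay(bus, rounds=1000):
    """Fire add and remove events for the same VF from two threads."""

    def adder():
        for _ in range(rounds):
            bus.add_vf("vf0")

    def remover():
        for _ in range(rounds):
            bus.remove_vf("vf0")

    threads = [threading.Thread(target=adder),
               threading.Thread(target=remover)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bus
```

With the lock in place, the event log in this model strictly alternates between "add" and "remove" no matter how the two threads are scheduled, and the final contents of present always agree with the last logged event.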


Acked-by: John Cabaj <john.cabaj at canonical.com>



