[PULL REQUESTS][focal/jammy/lunar linux-azure] [Azure] Fix VM crash/hang issues due to fast VF add/remove events

Tim Gardner tim.gardner at canonical.com
Mon Jun 12 21:13:18 UTC 2023


BugLink: https://bugs.launchpad.net/bugs/2023594

SRU Justification

[Impact]

A Linux guest on Hyper-V/Azure can occasionally crash during early
kernel boot when the host issues this unusual sequence of events:
1. The host assigns a VF to the guest;
2. The host immediately unassigns the VF from the guest; //Dexuan: due
to race condition bugs in the Linux vPCI driver, Linux can crash here.
3. The host assigns the VF to the guest again.

Starting in late 2022 (around Nov 2022), Linux guests on Azure began to
crash more frequently because of a host-side update at that time: a new
host/hypervisor feature for handling "correctable memory errors" can
cause many successive VF remove/add events, so the race condition bugs
in the Linux vPCI driver surface much more easily. The Hyper-V team
is implementing a batching mechanism so that the guest will get far
fewer VF remove/add events (ETA: June 2023), but meanwhile we should also
get the Linux race condition bugs fixed so that Linux guests won't crash
even if they receive successive VF remove/add events.
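
As a rough illustration of the idea behind the "Add a per-bus mutex
state_lock" patch, here is a minimal userspace sketch using pthreads.
It is an analogy only, not the kernel code: struct demo_bus,
demo_handle_event() and demo_run() are hypothetical names invented for
this example, and the real driver's locking scope may differ. The
sketch shows several threads delivering rapid add/remove/add events
while one lock per bus serializes every state change.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a vPCI bus (not the kernel struct). */
enum bus_state { BUS_INIT, BUS_PROBED, BUS_REMOVING };

struct demo_bus {
    pthread_mutex_t state_lock; /* serializes all state changes on this bus */
    enum bus_state state;
    int vf_present;             /* 1 while a VF is assigned */
    long events_handled;
};

static void demo_bus_init(struct demo_bus *bus)
{
    pthread_mutex_init(&bus->state_lock, NULL);
    bus->state = BUS_PROBED;
    bus->vf_present = 0;
    bus->events_handled = 0;
}

/* Handle one host event: add (1) or remove (0) the VF, under the lock. */
static void demo_handle_event(struct demo_bus *bus, int add)
{
    pthread_mutex_lock(&bus->state_lock);
    if (bus->state == BUS_PROBED) { /* ignore events unless the bus is up */
        bus->vf_present = add;
        bus->events_handled++;
    }
    pthread_mutex_unlock(&bus->state_lock);
}

struct worker_arg {
    struct demo_bus *bus;
    int rounds;
};

/* Each round mimics the sequence from [Impact]: add, remove, add again. */
static void *demo_worker(void *p)
{
    struct worker_arg *arg = p;
    for (int i = 0; i < arg->rounds; i++) {
        demo_handle_event(arg->bus, 1);
        demo_handle_event(arg->bus, 0);
        demo_handle_event(arg->bus, 1);
    }
    return NULL;
}

/* Run nthreads workers concurrently; return the number of events handled. */
static long demo_run(int nthreads, int rounds)
{
    struct demo_bus bus;
    pthread_t tids[16];
    struct worker_arg arg = { &bus, rounds };

    assert(nthreads > 0 && nthreads <= 16);
    demo_bus_init(&bus);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, &arg);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    long handled = bus.events_handled;
    assert(bus.vf_present == 1); /* every worker's last event is an add */
    pthread_mutex_destroy(&bus.state_lock);
    return handled;
}
```

Calling demo_run(4, 1000) should count exactly 4 * 1000 * 3 events,
because the lock makes each state update atomic with respect to the
others. Without the lock, the counter and the VF state could be
updated inconsistently under the same event storm.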

[Test Plan]

Tested by Microsoft.

[Regression potential]

PCI devices may not get registered, or VMs may crash.

[Other Info]

SF: #00349076

-------------------------------------------------------------------------------
The following changes since commit d250cc0ce73d5582e5eb073fa948567ec2ef67d5:

   UBUNTU: Ubuntu-azure-5.4.0-1110.116 (2023-06-02 12:51:11 -0600)

are available in the Git repository at:

   git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/focal focal-azure-fix-vm-add-remove-race-condition

for you to fetch changes up to ac03c5aa4ead40832fcd94d814d28ea8087fb906:

   UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock (2023-06-12 14:57:33 -0600)

----------------------------------------------------------------
Dexuan Cui (4):
       UBUNTU: SAUCE: PCI: hv: Fix a race condition bug in hv_pci_query_relations()
       UBUNTU: SAUCE: PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
       UBUNTU: SAUCE: PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
       UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock

  drivers/pci/controller/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++--------------------
  1 file changed, 38 insertions(+), 20 deletions(-)
-------------------------------------------------------------------------------

The following changes since commit 98439f7092e414a0534c5687742e2c8309a30204:

   UBUNTU: Ubuntu-azure-5.15.0-1040.47 (2023-06-01 13:13:05 -0600)

are available in the Git repository at:

   git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/jammy jammy-azure-fix-vm-add-remove-race-condition

for you to fetch changes up to c7caafcc393e6435c8b6b24ecde574f30be95c35:

   PCI: hv: Use async probing to reduce boot time (2023-06-12 14:58:58 -0600)

----------------------------------------------------------------
Dexuan Cui (6):
       PCI: hv: Fix a race condition bug in hv_pci_query_relations()
       PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
       PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
       Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
       PCI: hv: Add a per-bus mutex state_lock
       PCI: hv: Use async probing to reduce boot time

  drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
  1 file changed, 86 insertions(+), 59 deletions(-)
-------------------------------------------------------------------------------

The following changes since commit e4972fe7acaf327c1bd496cf2286889bd913c9bf:

   UBUNTU: Ubuntu-azure-6.2.0-1005.5 (2023-06-01 11:42:19 -0600)

are available in the Git repository at:

   git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux-azure/+git/lunar lunar-azure-fix-vm-add-remove-race-condition

for you to fetch changes up to 7e734194962a51084cba9016f0e5a512805f7d0a:

   PCI: hv: Use async probing to reduce boot time (2023-06-12 15:01:00 -0600)

----------------------------------------------------------------
Dexuan Cui (6):
       PCI: hv: Fix a race condition bug in hv_pci_query_relations()
       PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
       PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
       Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
       PCI: hv: Add a per-bus mutex state_lock
       PCI: hv: Use async probing to reduce boot time

  drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
  1 file changed, 86 insertions(+), 59 deletions(-)
-- 
-----------
Tim Gardner
Canonical, Inc


