[PULL REQUESTS][focal/jammy/lunar linux-azure] [Azure] Fix VM crash/hang issues due to fast VF add/remove events
Tim Gardner
tim.gardner at canonical.com
Mon Jun 12 21:13:18 UTC 2023
BugLink: https://bugs.launchpad.net/bugs/2023594
SRU Justification
[Impact]
A Linux guest on Hyper-V/Azure can occasionally crash during early
kernel boot because of the following host behavior:
1. The host assigns a VF to the guest;
2. The host immediately unassigns the VF from the guest; //Dexuan: due
to race condition bugs in the Linux vPCI driver, Linux can crash.
3. The host assigns the VF to the guest again.
Since late 2022 (around Nov 2022), Linux guests on Azure have been
crashing more frequently because of a host-side update made at that
time: a new host/hypervisor feature for handling "correctable memory
errors" can cause many successive VF remove/add events, so the race
condition bugs in the Linux vPCI driver surface much more easily. The
Hyper-V team is implementing a batching mechanism so that the guest
receives far fewer VF remove/add events (ETA: June 2023), but in the
meantime the Linux race condition bugs should also be fixed so that
Linux guests do not crash even if they receive successive VF remove/add
events.
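For readers unfamiliar with this class of bug, the following is a
minimal user-space sketch of the idea behind a per-bus state lock. It
is not the driver code: struct demo_bus, demo_vf_add(),
demo_vf_remove() and demo_run_storm() are hypothetical names invented
for illustration, and a pthread mutex stands in for the kernel mutex.
It only shows that when every add/remove handler takes the same
per-bus lock, a burst of back-to-back events cannot leave the shared
state half-updated.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ROUNDS 100000

/* Hypothetical stand-in for a vPCI bus; NOT the real hv_pcibus_device. */
struct demo_bus {
	pthread_mutex_t state_lock; /* serializes all add/remove handling */
	bool vf_present;            /* is a VF currently assigned? */
	int resources;              /* stand-in for per-device state (0 or 1) */
	int errors;                 /* count of inconsistent states observed */
};

void demo_bus_init(struct demo_bus *bus)
{
	pthread_mutex_init(&bus->state_lock, NULL);
	bus->vf_present = false;
	bus->resources = 0;
	bus->errors = 0;
}

/* Handle a "VF added" event; a duplicate add is ignored. */
void demo_vf_add(struct demo_bus *bus)
{
	pthread_mutex_lock(&bus->state_lock);
	if (!bus->vf_present) {
		bus->resources = 1;
		bus->vf_present = true;
	}
	pthread_mutex_unlock(&bus->state_lock);
}

/* Handle a "VF removed" event; a remove with no VF present is ignored. */
void demo_vf_remove(struct demo_bus *bus)
{
	pthread_mutex_lock(&bus->state_lock);
	if (bus->vf_present) {
		/* Under the lock, a present VF always has its resources set up. */
		if (bus->resources != 1)
			bus->errors++;
		bus->resources = 0;
		bus->vf_present = false;
	}
	pthread_mutex_unlock(&bus->state_lock);
}

static void *demo_add_worker(void *arg)
{
	struct demo_bus *bus = arg;

	for (int i = 0; i < DEMO_ROUNDS; i++)
		demo_vf_add(bus);
	return NULL;
}

static void *demo_remove_worker(void *arg)
{
	struct demo_bus *bus = arg;

	for (int i = 0; i < DEMO_ROUNDS; i++)
		demo_vf_remove(bus);
	return NULL;
}

/*
 * Fire many concurrent add/remove events at one bus.
 * Returns 0 if the state stayed consistent, -1 if a thread could not be
 * started, and a positive count otherwise.
 */
int demo_run_storm(void)
{
	struct demo_bus bus;
	pthread_t adder, remover;
	int bad;

	demo_bus_init(&bus);
	if (pthread_create(&adder, NULL, demo_add_worker, &bus) != 0)
		return -1;
	if (pthread_create(&remover, NULL, demo_remove_worker, &bus) != 0) {
		pthread_join(adder, NULL);
		return -1;
	}
	pthread_join(adder, NULL);
	pthread_join(remover, NULL);

	demo_vf_remove(&bus); /* settle into the "no VF" state */
	bad = bus.errors + (bus.vf_present ? 1 : 0) + bus.resources;
	pthread_mutex_destroy(&bus.state_lock);
	return bad;
}
```

Calling demo_run_storm() from a small test program (built with
-pthread) should return 0. The patches in this pull request are the
authoritative description of what the driver actually locks and where.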
[Test Plan]
Tested by Microsoft.
[Regression potential]
PCI devices may not get registered, or VMs may crash.
[Other Info]
SF: #00349076
-------------------------------------------------------------------------------
The following changes since commit d250cc0ce73d5582e5eb073fa948567ec2ef67d5:
UBUNTU: Ubuntu-azure-5.4.0-1110.116 (2023-06-02 12:51:11 -0600)
are available in the Git repository at:
git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/focal focal-azure-fix-vm-add-remove-race-condition
for you to fetch changes up to ac03c5aa4ead40832fcd94d814d28ea8087fb906:
UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock (2023-06-12 14:57:33 -0600)
----------------------------------------------------------------
Dexuan Cui (4):
UBUNTU: SAUCE: PCI: hv: Fix a race condition bug in hv_pci_query_relations()
UBUNTU: SAUCE: PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
UBUNTU: SAUCE: PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock
drivers/pci/controller/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 38 insertions(+), 20 deletions(-)
-------------------------------------------------------------------------------
The following changes since commit 98439f7092e414a0534c5687742e2c8309a30204:
UBUNTU: Ubuntu-azure-5.15.0-1040.47 (2023-06-01 13:13:05 -0600)
are available in the Git repository at:
git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/jammy jammy-azure-fix-vm-add-remove-race-condition
for you to fetch changes up to c7caafcc393e6435c8b6b24ecde574f30be95c35:
PCI: hv: Use async probing to reduce boot time (2023-06-12 14:58:58 -0600)
----------------------------------------------------------------
Dexuan Cui (6):
PCI: hv: Fix a race condition bug in hv_pci_query_relations()
PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
PCI: hv: Add a per-bus mutex state_lock
PCI: hv: Use async probing to reduce boot time
drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
1 file changed, 86 insertions(+), 59 deletions(-)
-------------------------------------------------------------------------------
The following changes since commit e4972fe7acaf327c1bd496cf2286889bd913c9bf:
UBUNTU: Ubuntu-azure-6.2.0-1005.5 (2023-06-01 11:42:19 -0600)
are available in the Git repository at:
git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux-azure/+git/lunar lunar-azure-fix-vm-add-remove-race-condition
for you to fetch changes up to 7e734194962a51084cba9016f0e5a512805f7d0a:
PCI: hv: Use async probing to reduce boot time (2023-06-12 15:01:00 -0600)
----------------------------------------------------------------
Dexuan Cui (6):
PCI: hv: Fix a race condition bug in hv_pci_query_relations()
PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
PCI: hv: Add a per-bus mutex state_lock
PCI: hv: Use async probing to reduce boot time
drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
1 file changed, 86 insertions(+), 59 deletions(-)
--
-----------
Tim Gardner
Canonical, Inc