ACK: [PULL REQUESTS][focal/jammy/lunar linux-azure] [Azure] Fix VM crash/hang issues due to fast VF add/remove events
Philip Cox
philip.cox at canonical.com
Tue Jun 13 16:10:42 UTC 2023
On Mon, 2023-06-12 at 14:13 -0700, Tim Gardner wrote:
> BugLink: https://bugs.launchpad.net/bugs/2023594
>
> SRU Justification
>
> [Impact]
>
> A Linux guest on Hyper-V/Azure can occasionally crash during early
> Linux kernel boot due to a strange host behavior:
> 1. The host assigns a VF to the guest;
> 2. The host immediately unassigns the VF from the guest; //Dexuan: due
> to race condition bugs in the Linux vPCI driver, Linux can crash.
> 3. The host assigns the VF to the guest again.
>
> Starting in late 2022 (around Nov 2022), Linux guests on Azure started
> to crash more frequently due to a host-side update at that time: a new
> host/hypervisor feature for handling "correctable memory errors" can
> cause many successive VF remove/add events, so the race condition bugs
> in the Linux vPCI driver can surface much more easily. The Hyper-V team
> is implementing a batching mechanism so that the guest will get far
> fewer VF remove/add events (ETA: June 2023), but meanwhile we should
> also get the Linux race condition bugs fixed so that Linux guests won't
> crash even if they receive successive VF remove/add events.
>
> [Test Plan]
>
> Microsoft tested
>
> [Regression potential]
>
> PCI devices may not get registered, or VMs may crash.
>
> [Other Info]
>
> SF: #00349076
>
> -------------------------------------------------------------------------------
> The following changes since commit d250cc0ce73d5582e5eb073fa948567ec2ef67d5:
>
> UBUNTU: Ubuntu-azure-5.4.0-1110.116 (2023-06-02 12:51:11 -0600)
>
> are available in the Git repository at:
>
> git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/focal focal-azure-fix-vm-add-remove-race-condition
>
> for you to fetch changes up to ac03c5aa4ead40832fcd94d814d28ea8087fb906:
>
> UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock (2023-06-12 14:57:33 -0600)
>
> ----------------------------------------------------------------
> Dexuan Cui (4):
> UBUNTU: SAUCE: PCI: hv: Fix a race condition bug in hv_pci_query_relations()
> UBUNTU: SAUCE: PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
> UBUNTU: SAUCE: PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
> UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock
>
> drivers/pci/controller/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++--------------------
> 1 file changed, 38 insertions(+), 20 deletions(-)
> -------------------------------------------------------------------------------
>
> The following changes since commit 98439f7092e414a0534c5687742e2c8309a30204:
>
> UBUNTU: Ubuntu-azure-5.15.0-1040.47 (2023-06-01 13:13:05 -0600)
>
> are available in the Git repository at:
>
> git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/jammy jammy-azure-fix-vm-add-remove-race-condition
>
> for you to fetch changes up to c7caafcc393e6435c8b6b24ecde574f30be95c35:
>
> PCI: hv: Use async probing to reduce boot time (2023-06-12 14:58:58 -0600)
>
> ----------------------------------------------------------------
> Dexuan Cui (6):
> PCI: hv: Fix a race condition bug in hv_pci_query_relations()
> PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
> PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
> Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
> PCI: hv: Add a per-bus mutex state_lock
> PCI: hv: Use async probing to reduce boot time
>
> drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
> 1 file changed, 86 insertions(+), 59 deletions(-)
> -------------------------------------------------------------------------------
>
> The following changes since commit e4972fe7acaf327c1bd496cf2286889bd913c9bf:
>
> UBUNTU: Ubuntu-azure-6.2.0-1005.5 (2023-06-01 11:42:19 -0600)
>
> are available in the Git repository at:
>
> git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux-azure/+git/lunar lunar-azure-fix-vm-add-remove-race-condition
>
> for you to fetch changes up to 7e734194962a51084cba9016f0e5a512805f7d0a:
>
> PCI: hv: Use async probing to reduce boot time (2023-06-12 15:01:00 -0600)
>
> ----------------------------------------------------------------
> Dexuan Cui (6):
> PCI: hv: Fix a race condition bug in hv_pci_query_relations()
> PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
> PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
> Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
> PCI: hv: Add a per-bus mutex state_lock
> PCI: hv: Use async probing to reduce boot time
>
> drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
> 1 file changed, 86 insertions(+), 59 deletions(-)
> --
> -----------
> Tim Gardner
> Canonical, Inc
>
--
Acked-by: Philip Cox <philip.cox at canonical.com>