ACK: [PULL REQUESTS][focal/jammy/lunar linux-azure] [Azure] Fix VM crash/hang issues due to fast VF add/remove events

Philip Cox philip.cox at canonical.com
Tue Jun 13 16:10:42 UTC 2023


On Mon, 2023-06-12 at 14:13 -0700, Tim Gardner wrote:
> BugLink: https://bugs.launchpad.net/bugs/2023594
> 
> SRU Justification
> 
> [Impact]
> 
> A Linux guest on Hyper-V/Azure can occasionally crash during early Linux
> kernel boot due to a strange host behavior:
> 1. The host assigns a VF to the guest;
> 2. The host immediately unassigns the VF from the guest; //Dexuan: due
> to a race condition bug in the Linux vPCI driver, Linux can crash.
> 3. The host assigns the VF to the guest again.
> 
> Starting in late 2022 (around Nov 2022), Linux guests on Azure started to
> crash more frequently due to a host-side update at that time: a new
> host/hypervisor feature for handling "correctable memory errors" can
> cause many successive VF remove/add events, so the race condition bugs
> in the Linux vPCI driver can surface much more easily. The Hyper-V team
> is implementing a batching mechanism so that the guest will get far
> fewer VF remove/add events (ETA: June 2023), but meanwhile we should also
> get the Linux race condition bugs fixed so that Linux guests won't crash
> even if they receive the successive VF remove/add events.
> 
> [Test Plan]
> 
> Microsoft tested
> 
> [Regression potential]
> 
> PCI devices may not get registered, or VMs may crash.
> 
> [Other Info]
> 
> SF: #00349076
> 
> -------------------------------------------------------------------------------
> The following changes since commit d250cc0ce73d5582e5eb073fa948567ec2ef67d5:
> 
>    UBUNTU: Ubuntu-azure-5.4.0-1110.116 (2023-06-02 12:51:11 -0600)
> 
> are available in the Git repository at:
> 
>    git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/focal focal-azure-fix-vm-add-remove-race-condition
> 
> for you to fetch changes up to ac03c5aa4ead40832fcd94d814d28ea8087fb906:
> 
>    UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock (2023-06-12 14:57:33 -0600)
> 
> ----------------------------------------------------------------
> Dexuan Cui (4):
>        UBUNTU: SAUCE: PCI: hv: Fix a race condition bug in hv_pci_query_relations()
>        UBUNTU: SAUCE: PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
>        UBUNTU: SAUCE: PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
>        UBUNTU: SAUCE: PCI: hv: Add a per-bus mutex state_lock
> 
>   drivers/pci/controller/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++--------------------
>   1 file changed, 38 insertions(+), 20 deletions(-)
> -------------------------------------------------------------------------------
> 
> The following changes since commit 98439f7092e414a0534c5687742e2c8309a30204:
> 
>    UBUNTU: Ubuntu-azure-5.15.0-1040.47 (2023-06-01 13:13:05 -0600)
> 
> are available in the Git repository at:
> 
>    git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux/+git/jammy jammy-azure-fix-vm-add-remove-race-condition
> 
> for you to fetch changes up to c7caafcc393e6435c8b6b24ecde574f30be95c35:
> 
>    PCI: hv: Use async probing to reduce boot time (2023-06-12 14:58:58 -0600)
> 
> ----------------------------------------------------------------
> Dexuan Cui (6):
>        PCI: hv: Fix a race condition bug in hv_pci_query_relations()
>        PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
>        PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
>        Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
>        PCI: hv: Add a per-bus mutex state_lock
>        PCI: hv: Use async probing to reduce boot time
> 
>   drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
>   1 file changed, 86 insertions(+), 59 deletions(-)
> -------------------------------------------------------------------------------
> 
> The following changes since commit e4972fe7acaf327c1bd496cf2286889bd913c9bf:
> 
>    UBUNTU: Ubuntu-azure-6.2.0-1005.5 (2023-06-01 11:42:19 -0600)
> 
> are available in the Git repository at:
> 
>    git://git.launchpad.net/~timg-tpi/ubuntu/+source/linux-azure/+git/lunar lunar-azure-fix-vm-add-remove-race-condition
> 
> for you to fetch changes up to 7e734194962a51084cba9016f0e5a512805f7d0a:
> 
>    PCI: hv: Use async probing to reduce boot time (2023-06-12 15:01:00 -0600)
> 
> ----------------------------------------------------------------
> Dexuan Cui (6):
>        PCI: hv: Fix a race condition bug in hv_pci_query_relations()
>        PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
>        PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
>        Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
>        PCI: hv: Add a per-bus mutex state_lock
>        PCI: hv: Use async probing to reduce boot time
> 
>   drivers/pci/controller/pci-hyperv.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
>   1 file changed, 86 insertions(+), 59 deletions(-)
> -- 
> -----------
> Tim Gardner
> Canonical, Inc
> 

-- 
Acked-by: Philip Cox <philip.cox at canonical.com>
