APPLIED: [SRU][j,l/linux-azure][PATCH 0/1] Fix kernel panic when removing GPU
Tim Gardner
tim.gardner at canonical.com
Thu Dec 7 19:35:35 UTC 2023
On 12/6/23 10:36 AM, Ioanna Alifieraki wrote:
> BugLink: https://bugs.launchpad.net/bugs/2042568
>
> SRU Justification
>
> [Description]
>
> On an Azure VM with a Tesla GPU, it was noticed that removing the
> GPU from the PCI bus crashes the VM. If the NVIDIA drivers are
> loaded, the machine does not crash immediately. Instead, the removal
> process hangs and the machine crashes on reboot.
>
> This is related to bug [1].
> The bug reported in [1] concerns a different driver, but the root
> cause is the same. It is still being investigated whether this is a
> bug in the PCI subsystem or in how various drivers use it.
>
> For this case, we have identified that reverting commit [2] prevents
> the kernel crashes.
>
> Azure has requested that this commit be reverted, at least for the
> time being. The commit is not upstream, so it only needs to be
> reverted from the Ubuntu kernels.
>
> [Test Case]
>
> On an Azure VM with a GPU:
>
> # echo '1' > /sys/bus/pci/devices/0001:00:00.0/remove
>
> where '0001:00:00.0' is the PCI address of the GPU.
> The VM will crash.
>
> [Where things could go wrong]
>
> The commit to be reverted was included in a patchset addressing bugs
> https://bugs.launchpad.net/bugs/2023071 and
> https://bugs.launchpad.net/bugs/2023594
>
> However, this commit only reduces boot time, so reverting it should
> not introduce any regressions. The only expected side effect is an
> increase in boot time.
>
> [Other]
>
> Only the Ubuntu Azure kernels are affected:
>
> - Jammy 5.15
> - Lunar 6.2
>
> Focal is also affected, since it uses the 5.15 kernel.
> This commit does not appear in the Mantic 6.5 kernel.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
> [2] https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/jammy/commit/?id=75af0c10b370
>
>
>
Applied to j/l linux-azure:master-next. Thanks.
-rtg
--
-----------
Tim Gardner
Canonical, Inc