ACK: [SRU][j, l/linux-azure][PATCH 0/1] Fix kernel panic when removing GPU

Tim Gardner tim.gardner at canonical.com
Thu Dec 7 19:23:57 UTC 2023


On 12/6/23 10:36, Ioanna Alifieraki wrote:
> BugLink: https://bugs.launchpad.net/bugs/2042568
> 
> SRU Justification
> 
> [Description]
> 
> On an Azure VM with a Tesla GPU, removing the GPU from the PCI bus
> crashes the VM. If the NVIDIA drivers are loaded, the machine does
> not crash right away. Instead, the removal hangs and the machine
> crashes on reboot.
> 
> This is related to bug [1].
> The bug reported in [1] concerns another driver, but the root cause
> is the same. It is still under investigation whether this is a bug
> in PCI itself or in how various drivers use PCI.
> 
> For this case, we have identified that reverting commit [2] prevents
> the kernel crashes.
> 
> Azure has requested that this commit be reverted, at least for the
> time being. The commit is not upstream, so it only needs to be
> reverted in the Ubuntu kernels.
> 
> [Test Case]
> 
> On an Azure VM with a GPU:
> 
> # echo '1' > /sys/bus/pci/devices/0001:00:00.0/remove
> 
> where '0001:00:00.0' is the PCI address of the GPU.
> The VM will crash.
> 
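For anyone reproducing this, the address in the test case will differ per VM. A minimal sketch for locating it follows; it is not part of the original submission. It assumes the GPU reports NVIDIA's PCI vendor ID (0x10de) and a display-controller class (0x03xxxx) in sysfs. The function names and the optional root-directory argument are hypothetical, added only so the logic can be exercised against a fake tree. It prints the removal command rather than running it, since running it is what triggers the crash.

```shell
#!/bin/sh
# Sketch only: list PCI addresses of NVIDIA display-class devices.
# The optional first argument (a sysfs-like root) is a hypothetical
# override for testing; the default is the real sysfs location.
find_nvidia_gpus() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip entries that lack the attributes we need.
        [ -r "$dev/vendor" ] && [ -r "$dev/class" ] || continue
        vendor=$(cat "$dev/vendor")
        class=$(cat "$dev/class")
        # 0x10de is NVIDIA's vendor ID; class 0x03xxxx is a display controller.
        case "$vendor:$class" in
            0x10de:0x03*) basename "$dev" ;;
        esac
    done
}

# Dry run: print the removal command for each match instead of executing it.
print_remove_commands() {
    root="${1:-/sys/bus/pci/devices}"
    find_nvidia_gpus "$root" | while read -r addr; do
        echo "echo 1 > $root/$addr/remove"
    done
}
```

Calling print_remove_commands with no argument on the affected VM should print one line per matching device, which can then be run by hand as root.
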
> [Where things could go wrong]
> 
> The commit to be reverted was included in a patchset to address bugs
> https://bugs.launchpad.net/bugs/2023071 and
> https://bugs.launchpad.net/bugs/2023594
> 
> However, this commit only reduces boot time, so reverting it should
> not introduce any regressions. The side effect will be an increase
> in boot time.
> 
> [Other]
> 
> Only Ubuntu Azure kernels are affected:
> 
> - Jammy 5.15
> - Lunar 6.2
> 
> Focal is also affected since it uses the 5.15 kernel.
> This commit does not appear in the Mantic 6.5 kernel.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
> [2] https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/jammy/commit/?id=75af0c10b370
> 
> 
> 
Acked-by: Tim Gardner <tim.gardner at canonical.com>
-- 
-----------
Tim Gardner
Canonical, Inc