ACK/Cmnt: [SRU][j,l/linux-azure][PATCH 0/1] Fix kernel panic when removing GPU

Manuel Diewald manuel.diewald at canonical.com
Wed Dec 6 18:12:31 UTC 2023


On Wed, Dec 06, 2023 at 07:36:34PM +0200, Ioanna Alifieraki wrote:
> BugLink: https://bugs.launchpad.net/bugs/2042568
> 
> SRU Justification
> 
> [Description]
> 
> On an Azure VM with a Tesla GPU, removing the GPU from the PCI bus
> crashes the VM. If the NVIDIA drivers are loaded, the machine does not
> crash immediately; instead, the removal hangs and the machine crashes
> on reboot.
> 
> This is related to bug [1].
> The bug reported in [1] concerns another driver, but the root cause is
> the same. It is still under investigation whether this is a bug in the
> PCI subsystem or in how various drivers use it.
> 
> For this case we have identified that reverting commit [2] prevents
> the kernel crashes.
> 
> Azure has requested that this commit be reverted, at least for the
> time being. The commit is not upstream, so it only needs to be
> reverted from the Ubuntu kernels.
> 
> [Test Case]
> 
> On an Azure VM with a GPU:
> 
> # echo '1' > /sys/bus/pci/devices/0001:00:00.0/remove
> 
> where '0001:00:00.0' is the PCI address of the GPU.
> The VM will crash.
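
The address in the test case is specific to that VM. As a rough sketch
for finding it on another instance, the helper below lists PCI devices
whose sysfs 'vendor' file matches a given ID. It assumes the GPU reports
the NVIDIA vendor ID 0x10de; the function name find_gpu_addrs and its
optional second argument (an alternate device directory, handy for dry
runs) are hypothetical and not part of the SRU.

```shell
#!/bin/sh
# Sketch only: print the PCI addresses of devices with a matching vendor ID.
#   $1  vendor ID to match (assumed default: 0x10de, NVIDIA)
#   $2  device directory (default: /sys/bus/pci/devices)
find_gpu_addrs() {
    vendor_id="${1:-0x10de}"
    pci_root="${2:-/sys/bus/pci/devices}"
    for dev in "$pci_root"/*; do
        # Skip entries without a readable vendor file (also covers an
        # unmatched glob when the directory is empty or missing).
        [ -r "$dev/vendor" ] || continue
        if [ "$(cat "$dev/vendor")" = "$vendor_id" ]; then
            basename "$dev"
        fi
    done
}
```

Calling find_gpu_addrs with no arguments only prints addresses; it does
not write to any 'remove' node. On an affected kernel that write is what
triggers the crash, so it is left as a manual step.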
> 
> [Where things could go wrong]
> 
> The commit to be reverted was included in a patchset to address bugs
> https://bugs.launchpad.net/bugs/2023071 and 
> https://bugs.launchpad.net/bugs/2023594
> 
> However, this commit only reduces boot time, so reverting it should
> not introduce any regressions. The expected side effect is an increase
> in boot time.
> 
> [Other]
> 
> Only Ubuntu Azure kernels are affected:
> 
> - Jammy 5.15
> - Lunar 6.2
> 
> Focal is also affected since it uses the 5.15 kernel.
> The commit does not appear in the Mantic 6.5 kernel.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
> [2] https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/jammy/commit/?id=75af0c10b370
> 

I think it is usually a good idea to include at least a one-liner in
the commit message describing why the commit is being reverted.

Acked-by: Manuel Diewald <manuel.diewald at canonical.com>

-- 
 Manuel


