APPLIED: [SRU][Noble][PATCH 0/1] Removing legacy virtio-pci devices causes kernel panic
Stefan Bader
stefan.bader at canonical.com
Fri Jun 21 13:49:05 UTC 2024
On 18.06.24 07:28, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2067862
>
> [Impact]
>
> If you detach a legacy virtio-pci device from a current Noble system, it will
> cause a null pointer dereference, and panic the system. This is an issue if you
> force noble to use legacy virtio-pci devices, or run noble on very old
> hypervisors that only support legacy virtio-pci devices, e.g. trusty and older.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> CPU: 2 PID: 358 Comm: kworker/u8:3 Kdump: loaded Not tainted 6.8.0-31-generic #31-Ubuntu
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:0x0
> ...
> Call Trace:
> <TASK>
> ? show_regs+0x6d/0x80
> ? __die+0x24/0x80
> ? page_fault_oops+0x99/0x1b0
> ? do_user_addr_fault+0x2ee/0x6b0
> ? exc_page_fault+0x83/0x1b0
> ? asm_exc_page_fault+0x27/0x30
> vp_del_vqs+0x6e/0x2a0
> remove_vq_common+0x166/0x1a0
> virtnet_remove+0x61/0x80
> virtio_dev_remove+0x3f/0xc0
> device_remove+0x40/0x80
> device_release_driver_internal+0x20b/0x270
> device_release_driver+0x12/0x20
> bus_remove_device+0xcb/0x140
> device_del+0x161/0x3e0
> ? pci_bus_generic_read_dev_vendor_id+0x2c/0x1a0
> device_unregister+0x17/0x60
> unregister_virtio_device+0x16/0x40
> virtio_pci_remove+0x43/0xa0
> pci_device_remove+0x36/0xb0
> device_remove+0x40/0x80
> device_release_driver_internal+0x20b/0x270
> device_release_driver+0x12/0x20
> pci_stop_bus_device+0x7a/0xb0
> pci_stop_and_remove_bus_device+0x12/0x30
> disable_slot+0x4f/0xa0
> acpiphp_disable_and_eject_slot+0x1c/0xa0
> hotplug_event+0x11b/0x280
> ? __pfx_acpiphp_hotplug_notify+0x10/0x10
> acpiphp_hotplug_notify+0x27/0x70
> acpi_device_hotplug+0xb6/0x300
> acpi_hotplug_work_fn+0x1e/0x40
> process_one_work+0x16c/0x350
> worker_thread+0x306/0x440
> ? _raw_spin_lock_irqsave+0xe/0x20
> ? __pfx_worker_thread+0x10/0x10
> kthread+0xef/0x120
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x44/0x70
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1b/0x30
> </TASK>
>
> The issue was introduced in:
>
> commit fd27ef6b44bec26915c5b2b22c13856d9f0ba17a
> Author: Feng Liu <feliu at nvidia.com>
> Date: Tue Dec 19 11:32:40 2023 +0200
> Subject: virtio-pci: Introduce admin virtqueue
> Link: https://github.com/torvalds/linux/commit/fd27ef6b44bec26915c5b2b22c13856d9f0ba17a
>
> Modern virtio-pci devices are not affected. If the device is a legacy virtio
> device, the is_avq function pointer is not assigned in the virtio_pci_device
> structure of the legacy virtio device, resulting in a NULL pointer dereference
> when the code calls if (vp_dev->is_avq(vdev, vq->index)).
>
> There is no workaround. If you are affected, then not detaching devices for the
> time being is the only solution.
>
> [Fix]
>
> This was fixed in 6.9-rc1 by:
>
> commit c8fae27d141a32a1624d0d0d5419d94252824498
> From: Li Zhang <zhanglikernel at gmail.com>
> Date: Sat, 16 Mar 2024 13:25:54 +0800
> Subject: virtio-pci: Check if is_avq is NULL
> Link: https://github.com/torvalds/linux/commit/c8fae27d141a32a1624d0d0d5419d94252824498
>
> This is a clean cherry pick to noble. The commit just adds a basic NULL pointer
> check before it dereferences the pointer.
>
> [Testcase]
>
> Start a fresh Noble VM.
>
> Edit the grub kernel command line:
>
> 1) sudo vim /etc/default/grub
> GRUB_CMDLINE_LINUX_DEFAULT="virtio_pci.force_legacy=1"
> 2) sudo update-grub
> 3) sudo reboot
>
> Outside the VM, on the host:
>
> $ qemu-img create -f qcow2 /root/share-device.qcow2 2G
> $ cat >> share-device.xml << EOF
> <disk type='file' device='disk'>
> <driver name='qemu' type='qcow2' cache='writeback' io='threads'/>
> <source file='/root/share-device.qcow2'/>
> <target dev='vdc' bus='virtio'/>
> </disk>
> EOF
> $ sudo -s
> # virsh attach-device noble-test share-device.xml --config --live
> # virsh detach-device noble-test share-device.xml --config --live
>
> A kernel panic should occur.
>
> There is a test kernel available in:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/lp2067862-test
>
> If you install it, the panic should no longer occur.
>
> [Where problems could occur]
>
> We are adding a basic null pointer check right before the pointer is about to be
> used, which is quite low risk.
>
> If a regression were to occur, it would only affect VMs using legacy virtio-pci
> devices, which is not the default. It would potentially have large impacts on
> fleets of very old hypervisors running trusty, precise or lucid, but that is
> very unlikely in this day and age.
>
> [Other Info]
>
> Upstream mailing list discussion and author testcase:
> https://lore.kernel.org/kvm/CACGkMEs1t-ipP7TasHkKNKd=peVEES6Xdw1zSsJkb-bc9Etx9Q@mail.gmail.com/T/#m167335bf7ab09b12fec3bdc5d46a30bc2e26cac7
>
> Li Zhang (1):
> virtio-pci: Check if is_avq is NULL
>
> drivers/virtio/virtio_pci_common.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Applied to noble:linux/master-next. Thanks.
-Stefan
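
For readers of the archive, the failure mode described in the quoted [Impact] and [Fix] sections can be sketched in user-space C. This is only an illustration, assuming a simplified stand-in for the kernel structure: `fake_vp_dev`, `modern_is_avq` and `vq_is_admin` are hypothetical names, and the real callback takes a `struct virtio_device *` as well as the queue index. It is not the kernel code or the applied patch, only the same "check the function pointer before calling it" pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct virtio_pci_device.
 * Only the optional callback is modelled here. */
struct fake_vp_dev {
    /* Assigned for modern devices; left NULL for legacy devices. */
    bool (*is_avq)(unsigned int index);
};

/* Hypothetical modern-device callback: pretend queue 3 is the admin vq. */
static bool modern_is_avq(unsigned int index)
{
    return index == 3;
}

/* Guarded call: test the function pointer before calling through it.
 * Without the first operand of &&, a legacy device (is_avq == NULL)
 * would make the CPU jump to address 0, as in the quoted "RIP: 0010:0x0". */
static bool vq_is_admin(const struct fake_vp_dev *dev, unsigned int index)
{
    assert(dev != NULL);
    return dev->is_avq != NULL && dev->is_avq(index);
}
```

With `is_avq` left NULL, `vq_is_admin` returns false instead of calling through the pointer. With `modern_is_avq` installed, it returns true only for index 3.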