ACK: Re: [SRU][Noble][PATCH 0/1] Removing legacy virtio-pci devices causes kernel panic

Chris Chiu chris.chiu at canonical.com
Tue Jun 18 15:25:15 UTC 2024


On Tue, Jun 18, 2024 at 1:29 PM Matthew Ruffell
<matthew.ruffell at canonical.com> wrote:
>
> BugLink: https://bugs.launchpad.net/bugs/2067862
>
> [Impact]
>
> If you detach a legacy virtio-pci device from a current Noble system, the kernel
> dereferences a NULL pointer and panics. This affects you if you force Noble to
> use legacy virtio-pci devices, or run Noble on very old hypervisors that only
> support legacy virtio-pci devices, e.g. Trusty and older.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> CPU: 2 PID: 358 Comm: kworker/u8:3 Kdump: loaded Not tainted 6.8.0-31-generic #31-Ubuntu
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:0x0
> ...
> Call Trace:
> <TASK>
>  ? show_regs+0x6d/0x80
>  ? __die+0x24/0x80
>  ? page_fault_oops+0x99/0x1b0
>  ? do_user_addr_fault+0x2ee/0x6b0
>  ? exc_page_fault+0x83/0x1b0
>  ? asm_exc_page_fault+0x27/0x30
>  vp_del_vqs+0x6e/0x2a0
>  remove_vq_common+0x166/0x1a0
>  virtnet_remove+0x61/0x80
>  virtio_dev_remove+0x3f/0xc0
>  device_remove+0x40/0x80
>  device_release_driver_internal+0x20b/0x270
>  device_release_driver+0x12/0x20
>  bus_remove_device+0xcb/0x140
>  device_del+0x161/0x3e0
>  ? pci_bus_generic_read_dev_vendor_id+0x2c/0x1a0
>  device_unregister+0x17/0x60
>  unregister_virtio_device+0x16/0x40
>  virtio_pci_remove+0x43/0xa0
>  pci_device_remove+0x36/0xb0
>  device_remove+0x40/0x80
>  device_release_driver_internal+0x20b/0x270
>  device_release_driver+0x12/0x20
>  pci_stop_bus_device+0x7a/0xb0
>  pci_stop_and_remove_bus_device+0x12/0x30
>  disable_slot+0x4f/0xa0
>  acpiphp_disable_and_eject_slot+0x1c/0xa0
>  hotplug_event+0x11b/0x280
>  ? __pfx_acpiphp_hotplug_notify+0x10/0x10
>  acpiphp_hotplug_notify+0x27/0x70
>  acpi_device_hotplug+0xb6/0x300
>  acpi_hotplug_work_fn+0x1e/0x40
>  process_one_work+0x16c/0x350
>  worker_thread+0x306/0x440
>  ? _raw_spin_lock_irqsave+0xe/0x20
>  ? __pfx_worker_thread+0x10/0x10
>  kthread+0xef/0x120
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork+0x44/0x70
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork_asm+0x1b/0x30
> </TASK>
>
> The issue was introduced in:
>
> commit fd27ef6b44bec26915c5b2b22c13856d9f0ba17a
> Author: Feng Liu <feliu at nvidia.com>
> Date:   Tue Dec 19 11:32:40 2023 +0200
> Subject: virtio-pci: Introduce admin virtqueue
> Link: https://github.com/torvalds/linux/commit/fd27ef6b44bec26915c5b2b22c13856d9f0ba17a
>
> Modern virtio-pci devices are not affected. If the device is a legacy virtio
> device, the is_avq function pointer is not assigned in the virtio_pci_device
> structure of the legacy virtio device, resulting in a NULL pointer dereference
> when the code calls if (vp_dev->is_avq(vdev, vq->index)).
>
> There is no workaround. If you are affected, the only mitigation is to avoid
> detaching devices until the fix is installed.
>
> [Fix]
>
> This was fixed in 6.9-rc1 by:
>
> commit c8fae27d141a32a1624d0d0d5419d94252824498
> From: Li Zhang <zhanglikernel at gmail.com>
> Date: Sat, 16 Mar 2024 13:25:54 +0800
> Subject: virtio-pci: Check if is_avq is NULL
> Link: https://github.com/torvalds/linux/commit/c8fae27d141a32a1624d0d0d5419d94252824498
>
> This is a clean cherry pick to noble. The commit just adds a basic NULL pointer
> check before it dereferences the pointer.
>
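
For readers following along, the NULL check described above can be modelled in
user space. This is only a minimal sketch, not the kernel code: the structures
are simplified stand-ins, and modern_is_avq(), skip_admin_vq() and the admin
queue index 2 are hypothetical names and values chosen for illustration. It
assumes the fix has the shape "test the pointer, then call through it".

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption, not the real layout). */
struct virtio_device {
	int id;
};

struct virtio_pci_device {
	struct virtio_device vdev;
	/* Assigned on the modern probe path; left NULL for legacy devices. */
	bool (*is_avq)(struct virtio_device *vdev, unsigned int index);
};

/* Hypothetical modern callback: treats queue index 2 as the admin virtqueue. */
static bool modern_is_avq(struct virtio_device *vdev, unsigned int index)
{
	(void)vdev;
	return index == 2;
}

/*
 * Hypothetical helper: returns true when a queue should be skipped during
 * teardown because it is the admin virtqueue. The pointer is tested before
 * the call, so a legacy device (is_avq == NULL) never jumps to address 0.
 */
static bool skip_admin_vq(struct virtio_pci_device *vp_dev, unsigned int index)
{
	return vp_dev->is_avq && vp_dev->is_avq(&vp_dev->vdev, index);
}
```

With a device whose is_avq is NULL, skip_admin_vq() returns false for every
index rather than calling through the NULL pointer, which is the call that
produced the "RIP: 0010:0x0" seen in the trace.
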
> [Testcase]
>
> Start a fresh Noble VM.
>
> Edit the grub kernel command line:
>
> 1) sudo vim /etc/default/grub
> GRUB_CMDLINE_LINUX_DEFAULT="virtio_pci.force_legacy=1"
> 2) sudo update-grub
> 3) sudo reboot
>
> Outside the VM, on the host:
>
> $ qemu-img create -f qcow2 /root/share-device.qcow2 2G
> $ cat >> share-device.xml << EOF
> <disk type='file' device='disk'>
>     <driver name='qemu' type='qcow2' cache='writeback' io='threads'/>
>     <source file='/root/share-device.qcow2'/>
>     <target dev='vdc' bus='virtio'/>
> </disk>
> EOF
> $ sudo -s
> # virsh attach-device noble-test share-device.xml --config --live
> # virsh detach-device noble-test share-device.xml --config --live
>
> A kernel panic should occur.
>
> There is a test kernel available in:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/lp2067862-test
>
> If you install it, the panic should no longer occur.
>
> [Where problems could occur]
>
> We are adding a basic null pointer check right before the pointer is about to be
> used, which is quite low risk.
>
> If a regression were to occur, it would only affect VMs using legacy virtio-pci
> devices, which is not the default. It would potentially have large impacts on
> fleets of very old hypervisors running trusty, precise or lucid, but that is
> very unlikely in this day and age.
>
> [Other Info]
>
> Upstream mailing list discussion and author testcase:
> https://lore.kernel.org/kvm/CACGkMEs1t-ipP7TasHkKNKd=peVEES6Xdw1zSsJkb-bc9Etx9Q@mail.gmail.com/T/#m167335bf7ab09b12fec3bdc5d46a30bc2e26cac7
>
> Li Zhang (1):
>   virtio-pci: Check if is_avq is NULL
>
>  drivers/virtio/virtio_pci_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --
> 2.40.1
>

Acked-by: Chris Chiu <chris.chiu at canonical.com>

>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team


