[SRU][F][PATCH 0/2] CVE-2024-26891
Koichiro Den
koichiro.den at canonical.com
Fri Sep 6 13:28:29 UTC 2024
[Impact]
iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected
For endpoint devices connected to the system via hotplug-capable ports,
users can request a hot reset of the device by flapping the device's link
through the slot's link control register. In response, as part of the
pciehp_ist() DLLSC interrupt sequence, pciehp unloads the device driver
and then powers the device off. This causes an IOMMU device-TLB
invalidation request (Intel VT-d spec; ATS Invalidation in PCIe spec r6.1)
to be sent to a non-existent target device, followed by an endless loop
retrying that request after an ITE fault is triggered in interrupt context.
The result is continuous hard lockup warnings and a system hang.
Such an issue can be triggered by any kind of regular surprise removal
hotplug operation, such as:
1. Pulling the EP (endpoint device) out directly.
2. Turning off the EP's power.
3. Bringing the link down.
etc.
This patch aims to work for both regular safe removal and surprise
removal. The hot-unplug handling can fix the ATS Invalidation hang by
calling pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() to
check the target device state, which avoids sending a meaningless ATS
Invalidation request to the IOMMU when the device is gone (see
IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1).
For safe removal, the device is not removed until the whole software
handling process is done, so it does not trigger the hard lockup caused
by an overly long ATS Invalidation timeout wait. In the safe removal path,
pciehp_unconfigure_device() checks its 'presence' parameter and does not
set the device state to pci_channel_io_perm_failure, so
pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() returns
false and the function proceeds as before.
For surprise removal, the device state is set to
pci_channel_io_perm_failure in pciehp_unconfigure_device(), meaning the
device is already gone (disconnected). The pci_dev_is_disconnected() call
in devtlb_invalidation_with_pasid() then returns true, and the function
returns early instead of blindly sending an ATS Invalidation request to
the disconnected device. This avoids triggering a further ITE fault, which
would block all invalidation requests from being handled. Furthermore,
retrying the timed-out request could trigger a hard lockup.
safe removal (present) & surprise removal (not present)

pciehp_ist()
  pciehp_handle_presence_or_link_change()
    pciehp_disable_slot()
      remove_board()
        pciehp_unconfigure_device(presence) {
          if (!presence)
            pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
        }
This patch works for regular safe removal and surprise removal of
ATS-capable endpoints on PCIe switch downstream ports.
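
The flow above can be modelled in userspace C as a rough sketch. This is
not the kernel code: the model_* names, the enum and the ats_requests_sent
counter are hypothetical stand-ins, and the model assumes only what the
commit message states, namely that surprise removal marks the device as
permanently failed and that the invalidation path returns early when it
sees that state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these types mimic, but are not, the kernel's. */
enum model_channel_state {
    MODEL_IO_NORMAL = 1,
    MODEL_IO_FROZEN,
    MODEL_IO_PERM_FAILURE, /* stands in for pci_channel_io_perm_failure */
};

struct model_pci_dev {
    enum model_channel_state error_state;
    unsigned int ats_requests_sent; /* hypothetical counter for the demo */
};

/* Models pci_dev_is_disconnected(): true once the device is marked gone. */
static bool model_dev_is_disconnected(const struct model_pci_dev *dev)
{
    assert(dev != NULL);
    return dev->error_state == MODEL_IO_PERM_FAILURE;
}

/*
 * Models pciehp_unconfigure_device(presence): only when the device is no
 * longer present (surprise removal) is every device on the bus marked
 * disconnected. Safe removal leaves the state untouched.
 */
static void model_unconfigure_device(struct model_pci_dev *devs, size_t n,
                                     bool presence)
{
    if (!presence)
        for (size_t i = 0; i < n; i++)
            devs[i].error_state = MODEL_IO_PERM_FAILURE;
}

/*
 * Models the guard added to devtlb_invalidation_with_pasid(): skip the
 * request for a disconnected device. Returns true if a request was issued.
 */
static bool model_devtlb_invalidation(struct model_pci_dev *dev)
{
    if (model_dev_is_disconnected(dev))
        return false; /* device is gone, nothing to invalidate */

    dev->ats_requests_sent++;
    return true;
}
```

In this model, calling model_unconfigure_device() with presence set to
true leaves the invalidation path unchanged, while presence set to false
makes every later model_devtlb_invalidation() call a no-op.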
[Backport]
To backport the main patch, the pci_dev_is_disconnected() helper needs
to be made public. Thus, commit 39714fd73c6 ("PCI: Make
pci_dev_is_disconnected() helper public for other drivers") was
cherry-picked. Additionally, context adjustments were needed due to
missing commit 672cf6df9b8a ("iommu/vt-d: Move Intel IOMMU driver into
subdirectory").
[Fix]
Noble: fixed via stable
Jammy: fixed via stable
Focal: Backport - adjusted contexts due to missing commits, see [Backport]
Bionic: not affected
Xenial: not affected
Trusty: not affected
[Test Case]
Compile and boot tested
[Where problems could occur]
This fix potentially impacts Intel architectures where an IOMMU capable
of SM (scalable mode) address translation is active. An issue with this
fix would result in device-TLB invalidations that never succeed against
an endpoint that no longer exists after its surprise removal, leading to
a hard lockup and system hang.
Ethan Zhao (2):
  PCI: Make pci_dev_is_disconnected() helper public for other drivers
  iommu/vt-d: Don't issue ATS Invalidation request when device is
    disconnected

 drivers/iommu/intel-pasid.c | 3 +++
 drivers/pci/pci.h           | 5 -----
 include/linux/pci.h         | 5 +++++
 3 files changed, 8 insertions(+), 5 deletions(-)
--
2.43.0