APPLIED[U]: Re: [SRU][U/O][N][PATCH 0/1] IOMMU DMA mode changed in kernel config causes massive throughput degradation for PCI-related network workloads (LP: 2071471)
Paolo Pisati
paolo.pisati at canonical.com
Wed Jul 3 15:38:07 UTC 2024
On Wed, Jul 03, 2024 at 11:19:48AM +0200, frank.heimes at canonical.com wrote:
> BugLink: https://bugs.launchpad.net/bugs/2071471
>
> SRU Justification:
>
> [Impact]
>
> * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer"
> (upstream since kernel v6.7-rc1) there was a move (on s390x only)
> to a different dma-iommu implementation.
>
> * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters"
> (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config
> option should now be set to 'yes' by default for s390x.
>
> * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY
> are related to each other CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be
> set to "no" by default, which was done upstream by b2b97a62f055
> "Revert "s390: update defconfigs"".
>
> * These changes are all upstream, but were not picked up by the Ubuntu
> kernel config.
>
> * And not having these config options set properly is causing significant
> PCI-related network throughput degradation (up to -72%).
>
> * This shows for almost all workloads and numbers of connections,
> deteriorating with the number of connections increasing.
>
> * Especially drastic is the drop for a high number of parallel connections
> (50 and 250) and for small and medium-size transactional workloads.
> However, also for streaming-type workloads the degradation is clearly
> visible (up to 48% degradation).
>
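The percentages above read as relative throughput changes of strict mode against lazy mode. A minimal sketch of that arithmetic, assuming this is how the figures were derived; the helper name `degradation_pct` and any numbers fed to it are hypothetical, not measurements from this bug:

```shell
# Hypothetical helper: relative change (in percent) of a measured
# throughput against a baseline throughput, in the same unit.
# Usage: degradation_pct <baseline> <measured>
degradation_pct() {
    awk -v b="$1" -v m="$2" 'BEGIN { printf "%.1f\n", (m - b) / b * 100 }'
}
```

With made-up values, `degradation_pct 1000 280` prints -72.0, i.e. the strict-mode run delivers 72% less than the lazy-mode baseline.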
> [Fix]
>
> * The (upstream accepted) fix is to set
> IOMMU_DEFAULT_DMA_STRICT=n
> and
> IOMMU_DEFAULT_DMA_LAZY=y
> (which is needed for the changed DMA IOMMU implementation since v6.7).
>
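To see which default a given kernel was built with, the config file shipped in /boot can be inspected. A minimal sketch, assuming the usual Ubuntu location /boot/config-$(uname -r); the function name `iommu_dma_mode` is made up for illustration:

```shell
# Hypothetical helper: report the default IOMMU DMA mode recorded in a
# kernel config file. Defaults to the running kernel's config in /boot.
# Usage: iommu_dma_mode [config-file]
iommu_dma_mode() {
    cfg="${1:-/boot/config-$(uname -r)}"
    if grep -q '^CONFIG_IOMMU_DEFAULT_DMA_LAZY=y$' "$cfg"; then
        echo lazy
    elif grep -q '^CONFIG_IOMMU_DEFAULT_DMA_STRICT=y$' "$cfg"; then
        echo strict
    else
        # Neither option is enabled (or the file could not be read).
        echo unknown
    fi
}
```

On an s390x kernel carrying this change, `iommu_dma_mode` should print "lazy"; on an unfixed 6.8 kernel it would be expected to print "strict". Note that this only reflects the build-time default, not a command-line override.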
> [Test Case]
>
> * Set up two Ubuntu Server 24.04 LPARs (with kernel 6.8)
> (one acting as server and one as client)
> that have PCIe-attached RoCE Express devices
> and that are connected to each other.
>
> * Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml:
> <?xml version="1.0"?>
> <profile name="TCP_RR">
>   <group nprocs="250">
>     <transaction iterations="1">
>       <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
>     </transaction>
>     <transaction duration="300">
>       <flowop type="write" options="size=200"/>
>       <flowop type="read" options="size=1000"/>
>     </transaction>
>     <transaction iterations="1">
>       <flowop type="disconnect" />
>     </transaction>
>   </group>
> </profile>
>
> * Install uperf on both systems, client and server.
>
> * Start uperf at server: uperf -s
>
> * Start uperf at client: uperf -vai 5 -m rr1c-200x1000-250.xml
>
> * Switch from strict to lazy mode
> either using the new kernel (or the test build below)
> or using kernel cmd-line parameter iommu.strict=0.
>
> * Restart uperf on server and client, like before.
>
> * Verification will be performed by IBM.
>
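When testing via the command-line parameter rather than a new kernel, it helps to confirm that iommu.strict=0 actually made it onto the booted command line before rerunning uperf. A minimal sketch; the function name `cmdline_iommu_strict` is made up, and taking the last occurrence when the parameter is repeated is an assumption:

```shell
# Hypothetical helper: print the value of iommu.strict found on a kernel
# command line, or "unset" if the parameter is absent.
# Usage: cmdline_iommu_strict [cmdline-string]   (defaults to /proc/cmdline)
cmdline_iommu_strict() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    val=
    # Rely on shell word splitting to walk the space-separated parameters.
    for word in $cmdline; do
        case "$word" in
            iommu.strict=*) val="${word#iommu.strict=}" ;;
        esac
    done
    echo "${val:-unset}"
}
```

Run without arguments on the client and server LPARs; "0" means lazy mode was requested at boot, "unset" means the kernel's built-in default applies.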
> [Regression Potential]
>
> * There is a certain regression potential, since the behavior with
> the two modified kernel config options will change significantly.
>
> * This may solve the (network) throughput issue with PCI devices,
> but may also come with side-effects on other PCIe based devices
> (the old compression adapters or the new NVMe carrier cards).
>
> [Other]
>
> * CCW devices are not affected.
>
> * This is s390x-specific only, hence will not affect any other architecture.
>
> Frank Heimes (1):
> UBUNTU: [Config] Set IOMMU_DEFAULT_DMA_STRICT=n and
> IOMMU_DEFAULT_DMA_LAZY=y for s390x
>
> debian.master/config/annotations | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> --
> 2.43.0
>
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
--
bye,
p.