APPLIED[U]: Re: [SRU][U/O][N][PATCH 0/1] IOMMU DMA mode changed in kernel config causes massive throughput degradation for PCI-related network workloads (LP: 2071471)

Paolo Pisati paolo.pisati at canonical.com
Wed Jul 3 15:38:07 UTC 2024


On Wed, Jul 03, 2024 at 11:19:48AM +0200, frank.heimes at canonical.com wrote:
> BugLink: https://bugs.launchpad.net/bugs/2071471
> 
> SRU Justification:
> 
> [Impact]
> 
>  * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer"
>    (upstream since kernel v6.7-rc1) there was a move (on s390x only)
>    to a different dma-iommu implementation.
> 
>  * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters"
>    (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config
>    option should now be set to 'yes' by default for s390x.
> 
>  * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and CONFIG_IOMMU_DEFAULT_DMA_LAZY
>    are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be
>    set to "no" by default, which was done upstream by b2b97a62f055
>    "Revert "s390: update defconfigs"".
> 
>  * These changes are all upstream, but were not picked up by the Ubuntu
>    kernel config.
> 
>  * Not having these config options set properly causes significant
>    PCI-related network throughput degradation (up to 72%).
> 
>  * This shows for almost all workloads and numbers of connections,
>    and worsens as the number of connections increases.
> 
>  * The drop is especially drastic for a high number of parallel connections
>    (50 and 250) and for small and medium-size transactional workloads.
>    However, the degradation is also clearly visible for streaming-type
>    workloads (up to 48%).
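
The degradation figures in [Impact] are relative changes against a baseline run. A minimal sketch of that arithmetic follows. The helper name pct_change and the sample numbers are hypothetical and are not measurements from the bug report.

```shell
# pct_change BASELINE MEASURED
# Prints the relative change of MEASURED against BASELINE in percent,
# with one decimal place. Negative values mean degradation.
# (Hypothetical helper, for illustration only.)
pct_change() {
    LC_ALL=C awk -v base="$1" -v cur="$2" 'BEGIN {
        if (base == 0) { print "n/a"; exit 1 }
        printf "%.1f\n", (cur - base) / base * 100
    }'
}

# Example with made-up transaction rates (lazy-mode run as baseline,
# strict-mode run as measured value):
#   pct_change 1000 280
```

Taking the lazy-mode result as the baseline makes a drop in strict mode show up as a negative number.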
> 
> [Fix]
> 
>  * The (upstream accepted) fix is to set
>    IOMMU_DEFAULT_DMA_STRICT=n
>    and
>    IOMMU_DEFAULT_DMA_LAZY=y
>    (which is needed for the changed DMA IOMMU implementation since v6.7).
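
To see which default a given kernel was built with, the kernel config file can be inspected. The following is a sketch only. The function name iommu_default_mode is hypothetical, and the /boot/config-$(uname -r) path in the usage comment is the usual Ubuntu location, assumed here.

```shell
# iommu_default_mode CONFIG_FILE
# Prints the build-time IOMMU default domain type found in CONFIG_FILE:
# "lazy", "strict", "passthrough", or "unknown" if none is set to y.
# (Hypothetical helper, for illustration only.)
iommu_default_mode() {
    if grep -q '^CONFIG_IOMMU_DEFAULT_DMA_LAZY=y$' "$1"; then
        echo lazy
    elif grep -q '^CONFIG_IOMMU_DEFAULT_DMA_STRICT=y$' "$1"; then
        echo strict
    elif grep -q '^CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y$' "$1"; then
        echo passthrough
    else
        echo unknown
    fi
}

# Typical use on an installed kernel (path assumed):
#   iommu_default_mode "/boot/config-$(uname -r)"
```

This reports only the build-time default. A kernel command-line parameter such as iommu.strict=0 can still change the mode at boot.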
> 
> [Test Case]
> 
>  * Set up two Ubuntu Server 24.04 LPARs (with kernel 6.8),
>    one acting as server and the other as client,
>    that have PCIe-attached RoCE Express devices
>    and that are connected to each other.
> 
>  * Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml:
>    <?xml version="1.0"?>
>    <profile name="TCP_RR">
>            <group nprocs="250">
>                    <transaction iterations="1">
>                            <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
>                    </transaction>
>                    <transaction duration="300">
>                            <flowop type="write" options="size=200"/>
>                            <flowop type="read" options="size=1000"/>
>                    </transaction>
>                    <transaction iterations="1">
>                            <flowop type="disconnect" />
>                    </transaction>
>            </group>
>    </profile>
> 
>  * Install uperf on both systems, client and server.
> 
>  * Start uperf at server: uperf -s
> 
>  * Start uperf at client: uperf -vai 5 -m rr1c-200x1000-250.xml
> 
>  * Switch from strict to lazy mode,
>    either by using the new kernel (or the test build below)
>    or by using the kernel command-line parameter iommu.strict=0.
> 
>  * Restart uperf on server and client, like before.
> 
>  * Verification will be performed by IBM.
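
The profile from the test case needs the server address filled in for the <remote IP> placeholder. A sketch of a helper that writes the profile with that substitution is shown here. The function name write_uperf_profile is hypothetical, and the output file name follows the workload name given above.

```shell
# write_uperf_profile REMOTE_HOST OUTFILE
# Writes the TCP_RR profile from the test case to OUTFILE, with
# REMOTE_HOST substituted for the <remote IP> placeholder.
# (Hypothetical helper, for illustration only.)
write_uperf_profile() {
    cat > "$2" <<EOF
<?xml version="1.0"?>
<profile name="TCP_RR">
  <group nprocs="250">
    <transaction iterations="1">
      <flowop type="connect" options="remotehost=$1 protocol=tcp tcp_nodelay" />
    </transaction>
    <transaction duration="300">
      <flowop type="write" options="size=200"/>
      <flowop type="read" options="size=1000"/>
    </transaction>
    <transaction iterations="1">
      <flowop type="disconnect" />
    </transaction>
  </group>
</profile>
EOF
}

# On the server LPAR:
#   uperf -s
# On the client LPAR (replace <server IP> with the server's address):
#   write_uperf_profile <server IP> rr1c-200x1000-250.xml
#   uperf -vai 5 -m rr1c-200x1000-250.xml
```

Running the same profile once in strict mode and once in lazy mode gives the two results to compare.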
> 
> [Regression Potential]
> 
>  * There is a certain regression potential, since the behavior with
>    the two modified kernel config options will change significantly.
> 
>  * This may solve the (network) throughput issue with PCI devices,
>    but may also come with side effects on other PCIe-based devices
>    (the old compression adapters or the new NVMe carrier cards).
> 
> [Other]
> 
>  * CCW devices are not affected.
> 
>  * This is s390x-specific, hence will not affect any other architecture.
> 
> Frank Heimes (1):
>   UBUNTU: [Config] Set IOMMU_DEFAULT_DMA_STRICT=n and
>     IOMMU_DEFAULT_DMA_LAZY=y for s390x
> 
>  debian.master/config/annotations | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> -- 
> 2.43.0

-- 
bye,
p.