APPLIED: [SRU][U/O][N][PATCH 0/1] IOMMU DMA mode changed in kernel config causes massive throughput degradation for PCI-related network workloads (LP: 2071471)

Stefan Bader stefan.bader at canonical.com
Thu Jul 4 18:58:15 UTC 2024


On 03.07.24 11:19, frank.heimes at canonical.com wrote:
> BugLink: https://bugs.launchpad.net/bugs/2071471
> 
> SRU Justification:
> 
> [Impact]
> 
>   * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer"
>     (upstream since kernel v6.7-rc1) there was a move (on s390x only)
>     to a different dma-iommu implementation.
> 
>   * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters"
>     (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config
>     option should now be set to 'yes' by default for s390x.
> 
>   * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY
>     are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be
>     set to "no" by default, which was done upstream by b2b97a62f055
>     "Revert "s390: update defconfigs"".
> 
>   * These changes are all upstream, but were not picked up by the Ubuntu
>     kernel config.
> 
>   * And not having these config options set properly is causing significant
>     PCI-related network throughput degradation (up to -72%).
> 
>   * The degradation shows for almost all workloads and numbers of
>     connections, and gets worse as the number of connections increases.
> 
>   * Especially drastic is the drop for a high number of parallel connections
>     (50 and 250) and for small and medium-size transactional workloads.
>     However, also for streaming-type workloads the degradation is clearly
>     visible (up to 48% degradation).
> 
> [Fix]
> 
>   * The (upstream accepted) fix is to set
>     IOMMU_DEFAULT_DMA_STRICT=no
>     and
>     IOMMU_DEFAULT_DMA_LAZY=y
>     (which is needed for the changed DMA IOMMU implementation since v6.7).
> 
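
To check which of the two defaults a given kernel build actually carries, a small helper along these lines may be handy. This is only a minimal sketch and not part of the patch: the function name default_dma_mode is made up for illustration, and the config path /boot/config-<release> is an assumption about where the installed kernel config lives.

```python
import platform
import re


def default_dma_mode(config_text):
    """Return 'lazy', 'strict' or 'unknown' for the given kernel config text."""
    enabled = set()
    for line in config_text.splitlines():
        # Only "=y" lines count; "# CONFIG_... is not set" lines are ignored.
        m = re.match(r"^CONFIG_(IOMMU_DEFAULT_DMA_(?:STRICT|LAZY))=y$", line.strip())
        if m:
            enabled.add(m.group(1))
    if enabled == {"IOMMU_DEFAULT_DMA_LAZY"}:
        return "lazy"
    if enabled == {"IOMMU_DEFAULT_DMA_STRICT"}:
        return "strict"
    return "unknown"


if __name__ == "__main__":
    # Assumed location of the config of the running kernel.
    path = "/boot/config-" + platform.release()
    try:
        with open(path) as f:
            print(path, "->", default_dma_mode(f.read()))
    except OSError as exc:
        print("could not read", path, ":", exc)
```

With the fix applied, an s390x kernel config would be expected to report 'lazy'; without it, 'strict'.
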
> [Test Case]
> 
>   * Set up two Ubuntu Server 24.04 LPARs (with kernel 6.8),
>     one acting as server and one as client,
>     that have PCIe-attached RoCE Express devices
>     and that are connected to each other.
> 
>   * Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml:
>     <?xml version="1.0"?>
>     <profile name="TCP_RR">
>             <group nprocs="250">
>                     <transaction iterations="1">
>                             <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
>                     </transaction>
>                     <transaction duration="300">
>                             <flowop type="write" options="size=200"/>
>                             <flowop type="read" options="size=1000"/>
>                     </transaction>
>                     <transaction iterations="1">
>                             <flowop type="disconnect" />
>                     </transaction>
>             </group>
>     </profile>
> 
>   * Install uperf on both systems, client and server.
> 
>   * Start uperf at server: uperf -s
> 
>   * Start uperf at client: uperf -vai 5 -m rr1c-200x1000-250.xml
> 
>   * Switch from strict to lazy mode
>     either using the new kernel (or the test build below)
>     or using kernel cmd-line parameter iommu.strict=0.
> 
>   * Restart uperf on server and client, like before.
> 
>   * Verification will be performed by IBM.
> 
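
Before re-running uperf it can help to confirm that the switch to lazy mode really took effect. The following is a minimal sketch under a few assumptions: that the platform exposes IOMMU groups under /sys/kernel/iommu_groups, and that the per-group "type" file reads "DMA" for strict and "DMA-FQ" for lazy (flush-queue) mode, as described in the kernel's sysfs ABI documentation. The helper names are hypothetical.

```python
import glob
import os

# Assumption: sysfs "type" values as described in the kernel ABI documentation.
MODE_BY_TYPE = {"DMA": "strict", "DMA-FQ": "lazy"}


def classify_group_type(type_value):
    """Map the content of an iommu group 'type' file to strict/lazy/other."""
    return MODE_BY_TYPE.get(type_value.strip(), "other")


def read_group_modes(root="/sys/kernel/iommu_groups"):
    """Return {group name: mode} for every group found under root."""
    modes = {}
    for path in sorted(glob.glob(os.path.join(root, "*", "type"))):
        group = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            modes[group] = classify_group_type(f.read())
    return modes


if __name__ == "__main__":
    for group, mode in read_group_modes().items():
        print(f"iommu group {group}: {mode}")
```

If the groups of the RoCE devices still report strict after booting with iommu.strict=0 or the new kernel, the second uperf run would not be measuring lazy mode.
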
> [Regression Potential]
> 
>   * There is a certain regression potential, since the behavior with
>     the two modified kernel config options will change significantly.
> 
>   * This may solve the (network) throughput issue with PCI devices,
>     but may also come with side-effects on other PCIe based devices
>     (the old compression adapters or the new NVMe carrier cards).
> 
> [Other]
> 
>   * CCW devices are not affected.
> 
>   * This is s390x-specific only, hence will not affect any other architecture.
> 
> Frank Heimes (1):
>    UBUNTU: [Config] Set IOMMU_DEFAULT_DMA_STRICT=n and
>      IOMMU_DEFAULT_DMA_LAZY=y for s390x
> 
>   debian.master/config/annotations | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 

Applied to noble:linux/master-next. Thanks.

-Stefan
