ACK: [SRU][F:linux-bluefield][PATCH v1 0/1] UBUNTU: SAUCE: Fix OOB handling RX packets in heavy traffic
Tim Gardner
tim.gardner at canonical.com
Mon Mar 21 12:03:06 UTC 2022
Acked-by: Tim Gardner <tim.gardner at canonical.com>
On 3/15/22 10:35, Asmaa Mnebhi wrote:
> BugLink: https://bugs.launchpad.net/bugs/1964984
>
> SRU Justification:
>
> [Impact]
>
> This is reproducible on systems that already carry heavy background
> traffic when the user then issues one of the two docker pulls below:
> docker pull nvcr.io/ea-doca-hbn/hbn/hbn:latest
> OR
> docker pull gitlab-master.nvidia.com:5005/dl/dgx/tritonserver:22.02-py3-qa
>
> The second is a very large container image (17 GB).
>
> When the docker pull runs, the OOB interface stops responding to
> ping, and the pull either stalls for a very long time (more than
> 3 minutes) or times out.
>
> [Fix]
>
> * Update RX_CQE_CI before updating RX_PI, to avoid a race condition in which we wrongly tell the HW that there is space for the WQE.
> * Disable RX DMA while we are handling incoming packets, to avoid overflow.
>
> [Test Case]
>
> * Created a script that loops 200 times and runs one of these docker pulls in each iteration:
> docker pull nvcr.io/ea-doca-hbn/hbn/hbn:latest
> OR
> docker pull gitlab-master.nvidia.com:5005/dl/dgx/tritonserver:22.02-py3-qa
>
> [Regression Potential]
>
> * RX handling could be slower, since RX DMA is now periodically disabled and re-enabled.
> * Although the people who reported the bug have tested this fix, QA still needs to test it thoroughly to confirm the issue is no longer reproducible.
>
--
-----------
Tim Gardner
Canonical, Inc
More information about the kernel-team mailing list