APPLIED: [SRU][F:linux-bluefield][PATCH v1 0/1] UBUNTU: SAUCE: Fix OOB handling RX packets in heavy traffic

Tim Gardner tim.gardner at canonical.com
Fri Apr 1 14:31:24 UTC 2022


Applied to focal/linux-bluefield:master-next. Thanks.

-rtg

On 3/15/22 10:35, Asmaa Mnebhi wrote:
> BugLink: https://bugs.launchpad.net/bugs/1964984
> 
> SRU Justification:
> 
> [Impact]
> 
> The bug is reproducible on systems that already carry heavy background
> traffic when the user additionally issues one of the two docker pulls below:
> docker pull nvcr.io/ea-doca-hbn/hbn/hbn:latest
> OR
> docker pull gitlab-master.nvidia.com:5005/dl/dgx/tritonserver:22.02-py3-qa
> 
> The second one is a very large container (17 GB).
> 
> During the docker pull, the OOB interface stops responding to ping,
> and the pull either stalls for a very long time (more than 3 minutes)
> or times out.
> 
> [Fix]
> 
> * Update the RX_CQE_CI before updating the RX_PI to avoid a race condition
>   in which we wrongly inform the HW that there is space for the WQE.
> * Disable the RX DMA while we are handling incoming packets to avoid overflow.
> 
> [Test Case]
> 
> * Created a script that loops 200 times and runs one of the following
>   docker pulls in each iteration:
> docker pull nvcr.io/ea-doca-hbn/hbn/hbn:latest
> OR
> docker pull gitlab-master.nvidia.com:5005/dl/dgx/tritonserver:22.02-py3-qa
> 
> [Regression Potential]
> 
> * This could result in slower packet handling, since the RX DMA is
>   disabled and re-enabled periodically.
> * Although this fix has been tested by the people who opened the bug,
>   QA needs to test it thoroughly to make sure the bug is no longer
>   reproducible.
> 
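
For reference, the loop described under [Test Case] above could look
roughly like the sketch below. The actual script was not posted, so this
is only an illustration: the function name pull_loop, the PULL_CMD
override, and the summary line are assumptions made here, not part of
the original test.

```shell
#!/bin/sh
# Illustrative sketch of the [Test Case] stress loop; not the original script.
# PULL_CMD is a hypothetical override: it defaults to "docker pull" and can be
# replaced (e.g. PULL_CMD=echo) to dry-run the loop without docker.
PULL_CMD="${PULL_CMD:-docker pull}"

# pull_loop IMAGE [COUNT]
# Runs $PULL_CMD IMAGE COUNT times (default 200, as in the test case),
# prints one summary line, and returns non-zero if any pull failed.
pull_loop() {
    image="$1"
    count="${2:-200}"
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        # $PULL_CMD is deliberately unquoted so "docker pull" splits into two words.
        if ! $PULL_CMD "$image" >/dev/null 2>&1; then
            failures=$((failures + 1))
            echo "iteration $i: pull failed" >&2
        fi
        i=$((i + 1))
    done
    echo "pulls=$count failures=$failures"
    [ "$failures" -eq 0 ]
}
```

It would be invoked as, for example,
"pull_loop nvcr.io/ea-doca-hbn/hbn/hbn:latest". Note that pulling an image
that is already present and up to date transfers very little data; whether
the original script removed the image between iterations is not stated.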

-- 
-----------
Tim Gardner
Canonical, Inc


