ACK: [SRU][F:linux-bluefield][PATCH v1 1/1] UBUNTU: SAUCE: Fix OOB handling RX packets in heavy traffic

Stefan Bader stefan.bader at canonical.com
Fri Mar 18 13:28:03 UTC 2022


On 15.03.22 17:35, Asmaa Mnebhi wrote:
> BugLink: https://bugs.launchpad.net/bugs/1964984
> 
> This is reproducible on systems that already carry heavy background
> traffic when the user additionally issues one of the two docker pulls below:
> docker pull nvcr.io/ea-doca-hbn/hbn/hbn:latest
> OR
> docker pull gitlab-master.nvidia.com:5005/dl/dgx/tritonserver:22.02-py3-qa
> 
> The second one is a very large container (17GB)
> 
> When the docker pull runs, the OOB interface stops responding to ping,
> and the pull either stalls for a very long time (more than 3 minutes)
> or times out.
> 
> The root cause is that RX PI = RX CI. I have verified this by reading
> RX_CQE_PACKET_CI and RX_WQE_PI. It means all the WQEs are full and
> HW has nowhere left to put incoming RX packets.
> 
> I believe there is a race condition after SW receives an RX interrupt
> and the interrupt is disabled: HW still tries to add RX packets to the
> RX WQEs. So we need to stop the RX traffic by disabling the DMA. Also,
> move the read of RX CI before the write of the incremented RX PI to
> MLXBF_GIGE_RX_WQE_PI. Normally RX PI should always be > RX CI.
> I suspect that when entering mlxbf_gige_rx_packet we have, for example:
> MLXBF_GIGE_RX_WQE_PI = 128
> RX_CQE_PACKET_CI = 128
> (128 being the max size of the WQE)
> 
> Then this code will make MLXBF_GIGE_RX_WQE_PI = 129:
> rx_pi++;
> /* Ensure completion of all writes before notifying HW of replenish */
> wmb();
> writeq(rx_pi, priv->base + MLXBF_GIGE_RX_WQE_PI);
> 
> which means HW has one more slot to populate. In that time span, HW
> populates that WQE and increments RX_CQE_PACKET_CI to 129.
> 
> Then this code is subject to a race condition:
> 
> rx_ci = readq(priv->base + MLXBF_GIGE_RX_CQE_PACKET_CI);
> rx_ci_rem = rx_ci % priv->rx_q_entries;
> return rx_pi_rem != rx_ci_rem;
> 
> because rx_pi_rem will be equal to rx_ci_rem, so remaining_pkts will
> be 0 and we will exit mlxbf_gige_poll.
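
The window described above can be reproduced in a small single-threaded model. This is only a sketch under stated assumptions: struct fake_hw, hw_fill_if_room(), poll_old_order() and poll_new_order() are hypothetical names that do not exist in the driver, the hardware is assumed to consume a free WQE the moment a larger PI is published, and the ring size of 128 is taken from the example in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_Q_ENTRIES 128u /* assumed ring size, from the example above */

/* Hypothetical stand-in for the two hardware counters. */
struct fake_hw {
	uint64_t rx_wqe_pi;        /* models MLXBF_GIGE_RX_WQE_PI */
	uint64_t rx_cqe_packet_ci; /* models MLXBF_GIGE_RX_CQE_PACKET_CI */
};

/* Assumption: HW fills one WQE as soon as CI is behind PI. */
static void hw_fill_if_room(struct fake_hw *hw)
{
	assert(hw->rx_cqe_packet_ci <= hw->rx_wqe_pi);
	if (hw->rx_cqe_packet_ci < hw->rx_wqe_pi)
		hw->rx_cqe_packet_ci++;
}

/* Old ordering: publish the new PI first, read CI afterwards. */
static bool poll_old_order(struct fake_hw *hw)
{
	uint64_t rx_pi = hw->rx_wqe_pi;
	uint64_t rx_ci;

	rx_pi++;
	hw->rx_wqe_pi = rx_pi;        /* writeq(rx_pi, ...) in the driver */
	hw_fill_if_room(hw);          /* HW races in before CI is read */
	rx_ci = hw->rx_cqe_packet_ci; /* readq(...) happens too late */

	return (rx_pi % RX_Q_ENTRIES) != (rx_ci % RX_Q_ENTRIES);
}

/* New ordering: snapshot CI before publishing the new PI. */
static bool poll_new_order(struct fake_hw *hw)
{
	uint64_t rx_pi = hw->rx_wqe_pi;
	uint64_t rx_ci = hw->rx_cqe_packet_ci; /* read moved up */

	rx_pi++;
	hw->rx_wqe_pi = rx_pi;
	hw_fill_if_room(hw);          /* HW may still race, but CI is already sampled */

	return (rx_pi % RX_Q_ENTRIES) != (rx_ci % RX_Q_ENTRIES);
}
```

Starting both variants from PI = CI = 128, the old ordering compares 129 % 128 with 129 % 128 and reports no remaining packets, while the new ordering compares 129 % 128 with 128 % 128 and keeps polling.
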
> 
> Signed-off-by: Asmaa Mnebhi <asmaa at nvidia.com>
> Reviewed-by: David Thompson <davthompson at nvidia.com>
> Signed-off-by: Asmaa Mnebhi <asmaa at nvidia.com>
Acked-by: Stefan Bader <stefan.bader at canonical.com>
> 
> ---
>   .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c  |  2 +-
>   .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c    | 13 +++++++++++--
>   2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
> index d0871014d4e9..9ef883b90aee 100644
> --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
> +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
> @@ -20,7 +20,7 @@
>   #include "mlxbf_gige_regs.h"
>   
>   #define DRV_NAME    "mlxbf_gige"
> -#define DRV_VERSION 1.25
> +#define DRV_VERSION 1.26
>   
>   /* This setting defines the version of the ACPI table
>    * content that is compatible with this driver version.
> diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
> index afa3b92a6905..96230763cf6c 100644
> --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
> @@ -266,6 +266,9 @@ static bool mlxbf_gige_rx_packet(struct mlxbf_gige *priv, int *rx_pkts)
>   		priv->stats.rx_truncate_errors++;
>   	}
>   
> +	rx_ci = readq(priv->base + MLXBF_GIGE_RX_CQE_PACKET_CI);
> +	rx_ci_rem = rx_ci % priv->rx_q_entries;
> +
>   	/* Let hardware know we've replenished one buffer */
>   	rx_pi++;
>   
> @@ -278,8 +281,6 @@ static bool mlxbf_gige_rx_packet(struct mlxbf_gige *priv, int *rx_pkts)
>   	rx_pi_rem = rx_pi % priv->rx_q_entries;
>   	if (rx_pi_rem == 0)
>   		priv->valid_polarity ^= 1;
> -	rx_ci = readq(priv->base + MLXBF_GIGE_RX_CQE_PACKET_CI);
> -	rx_ci_rem = rx_ci % priv->rx_q_entries;
>   
>   	if (skb)
>   		netif_receive_skb(skb);
> @@ -299,6 +300,10 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
>   
>   	mlxbf_gige_handle_tx_complete(priv);
>   
> +	data = readq(priv->base + MLXBF_GIGE_RX_DMA);
> +	data &= ~MLXBF_GIGE_RX_DMA_EN;
> +	writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
> +
>   	do {
>   		remaining_pkts = mlxbf_gige_rx_packet(priv, &work_done);
>   	} while (remaining_pkts && work_done < budget);
> @@ -314,6 +319,10 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
>   		data = readq(priv->base + MLXBF_GIGE_INT_MASK);
>   		data &= ~MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET;
>   		writeq(data, priv->base + MLXBF_GIGE_INT_MASK);
> +
> +		data = readq(priv->base + MLXBF_GIGE_RX_DMA);
> +		data |= MLXBF_GIGE_RX_DMA_EN;
> +		writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
>   	}
>   
>   	return work_done;
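
The two RX DMA hunks in the diff above are the same read-modify-write sequence with the enable bit cleared or set. A minimal sketch of that pattern follows, written against a simulated register so it can run outside the kernel. The names fake_rx_dma_reg, fake_readq(), fake_writeq(), FAKE_RX_DMA_EN and rx_dma_set_enabled() are hypothetical, and the bit position used here is an assumption, not taken from mlxbf_gige_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_RX_DMA_EN (1ULL << 0) /* assumed bit position, for illustration */

static uint64_t fake_rx_dma_reg;   /* stand-in for the MMIO register */

/* Stand-ins for readq()/writeq() on priv->base + MLXBF_GIGE_RX_DMA. */
static uint64_t fake_readq(void)
{
	return fake_rx_dma_reg;
}

static void fake_writeq(uint64_t val)
{
	fake_rx_dma_reg = val;
}

/* Toggle only the enable bit; every other bit is written back unchanged. */
static void rx_dma_set_enabled(bool enable)
{
	uint64_t data = fake_readq();

	if (enable)
		data |= FAKE_RX_DMA_EN;
	else
		data &= ~FAKE_RX_DMA_EN;

	fake_writeq(data);
}
```

In the patch, the disable step runs before the receive loop in mlxbf_gige_poll and the enable step runs in the same branch that clears the RX interrupt mask bit.
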
