ACK/cmnt: [PATCH 0/2][SRU][B][C] i40e: Fix DCB and overlapping tx timeout issues

Kleber Souza kleber.souza at canonical.com
Tue Mar 26 11:29:49 UTC 2019


On 3/19/19 4:11 PM, Nivedita Singhvi wrote:
> BugLink: https://bugs.launchpad.net/bugs/1779756
> 
> [Impact]
> The i40e driver can get stalled on tx timeouts. This can happen when
> DCB is enabled on the connected switch. This can also trigger a
> second situation when a tx timeout occurs before the recovery of
> a previous timeout has completed due to CPU load, which is not
> handled correctly. This leads to networking delays, drops and
> application timeouts and hangs. Note that the first tx timeout
> cause is just one of the ways to end up in the second situation.
> 
> This issue was seen on a heavily loaded Kafka broker node running
> the 4.15.0-38-generic kernel on Xenial.
> 
> Symptoms include messages in the kernel log of the form:
> 
> ---
> [4733544.982116] i40e 0000:18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0
> [4733544.982119] i40e 0000:18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6
> ----
> 
> [Fix]
> A test kernel provided in this LP bug, with the following two
> commits applied, has been running successfully for several
> months, and the problem has not been seen again:
> 
> "i40e: prevent overlapping tx_timeout recover"
> Commit: d5585b7b6846a6d0f9517afe57be3843150719da
> 
> "i40e: Fix for Tx timeouts when interface is brought up if
>  DCB is enabled"
> Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee
> 
> * The first commit is already in Disco
> * The second commit is already in Disco and Cosmic
> 
> So Bionic needs both patches and Cosmic needs only the first.
> 
> [Test Case]
> * We are considering the case of both issues above occurring.
> * Seen by reporter on a Kafka broker node with heavy traffic.
> * Not easy to reproduce as it requires something like the
>   following example environment and heavy load:
> 
>   Kernel: 4.15.0-38-generic
>   Network driver: i40e
>         version: 2.1.14-k
>         firmware-version: 6.00 0x800034e6 18.3.6
>   NIC: Intel 40Gb XL710
>   DCB enabled
> 
> [Regression Potential]
> Low, as the first only impacts i40e DCB environments, and has
> been running for several months in production-load testing
> successfully.
> 
> Note: Only the first patch should be applied to Cosmic.
> 
> Alan Brady (1):
>   i40e: prevent overlapping tx_timeout recover
> 
> Martyna Szapar (1):
>   i40e: Fix for Tx timeouts when interface is brought up if DCB is
>     enabled
> 
>  drivers/net/ethernet/intel/i40e/i40e.h      |  1 +
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 20 +++++++++++++-------
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 

The correct cherry pick provenance line is

"(cherry picked from commit ...)"

without the dash "-", as added by "git cherry-pick -x". This can be fixed
when applying.
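
For illustration only, the provenance format could be checked with something
like the sketch below. It assumes the line is exactly what "git cherry-pick -x"
appends (a full 40-digit SHA-1 in parentheses, alone on its line); the helper
names are hypothetical and not part of any existing kernel-team tooling.

```python
import re

# Assumption: the only accepted form is the trailer written by
# "git cherry-pick -x": a full 40-hex-digit SHA-1, in parentheses, alone
# on its line.
STRICT_RE = re.compile(r"^\(cherry picked from commit [0-9a-f]{40}\)$")

# Loose pattern used to find candidate lines, including the dashed
# "cherry-picked" variant that should be rejected.
LOOSE_RE = re.compile(r"cherry[ -]picked from commit", re.IGNORECASE)


def provenance_lines(message):
    """Return the lines of a commit message that look like provenance lines."""
    return [line for line in message.splitlines() if LOOSE_RE.search(line)]


def has_valid_provenance(message):
    """True if at least one provenance line exists and all use the -x format."""
    lines = provenance_lines(message)
    return bool(lines) and all(STRICT_RE.match(line) for line in lines)
```

A message containing "(cherry picked from commit d5585b7b6846a6d0f9517afe57be3843150719da)"
passes this check, while the same line written with "cherry-picked" does not.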

Apart from that it looks good. Clean cherry pick and extensively tested.

Acked-by: Kleber Sacilotto de Souza <kleber.souza at canonical.com>


