ACK: [PATCH 0/2][SRU][B][C] i40e: Fix DCB and overlapping tx timeout issues
khalid.elmously at canonical.com
Wed Mar 27 06:05:38 UTC 2019
On 2019-03-19 20:41:55 , Nivedita Singhvi wrote:
> BugLink: https://bugs.launchpad.net/bugs/1779756
> The i40e driver can get stalled on tx timeouts. This can happen when
> DCB is enabled on the connected switch. It can also trigger a
> second situation: under CPU load, a tx timeout can occur before
> recovery from a previous timeout has completed, and that overlap is
> not handled correctly. This leads to networking delays, drops, and
> application timeouts and hangs. Note that the first tx timeout
> cause is just one of the ways to end up in the second situation.
> This issue was seen on a heavily loaded Kafka broker node running
> the 4.15.0-38-generic kernel on Xenial.
> Symptoms include messages in the kernel log of the form:
> [4733544.982116] i40e 0000:18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0
> [4733544.982119] i40e 0000:18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6
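For anyone checking their own logs for these symptoms, here is a minimal sketch of a parser for the two message forms quoted above. The regular expressions are modelled only on those two sample lines, so other driver versions may well format the messages differently; the function name and the dictionary keys are hypothetical.

```python
import re

# Patterns modelled on the two sample kernel log lines quoted above.
# Assumption: other i40e versions may use a different message layout.
TX_TIMEOUT_RE = re.compile(
    r"i40e (?P<pci>\S+) (?P<ifname>\S+): tx_timeout: "
    r"VSI_seid: (?P<vsi>\d+), Q (?P<queue>\d+), "
    r"NTC: (?P<ntc>0x[0-9a-fA-F]+), HWB: (?P<hwb>0x[0-9a-fA-F]+), "
    r"NTU: (?P<ntu>0x[0-9a-fA-F]+), TAIL: (?P<tail>0x[0-9a-fA-F]+), "
    r"INT: (?P<intr>0x[0-9a-fA-F]+)"
)
RECOVERY_RE = re.compile(
    r"i40e (?P<pci>\S+) (?P<ifname>\S+): tx_timeout recovery level "
    r"(?P<level>\d+), hung_queue (?P<queue>\d+)"
)


def parse_i40e_timeouts(lines):
    """Return one dict per recognised tx_timeout or recovery message."""
    events = []
    for line in lines:
        m = TX_TIMEOUT_RE.search(line)
        if m:
            events.append({
                "kind": "timeout",
                "ifname": m.group("ifname"),
                "queue": int(m.group("queue")),
                "ntc": int(m.group("ntc"), 16),
                "ntu": int(m.group("ntu"), 16),
            })
            continue
        m = RECOVERY_RE.search(line)
        if m:
            events.append({
                "kind": "recovery",
                "ifname": m.group("ifname"),
                "queue": int(m.group("queue")),
                "level": int(m.group("level")),
            })
    return events
```

Feeding it the output of `dmesg` or a saved kernel log, one line per list element, gives a quick count of how often each queue timed out and which recovery levels were reached.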
> With the test kernel provided in this LP bug, which had these
> two commits compiled in, the problem has not been seen again,
> and the system has been running successfully for several months:
> "i40e: prevent overlapping tx_timeout recover"
> Commit: d5585b7b6846a6d0f9517afe57be3843150719da
> "i40e: Fix for Tx timeouts when interface is brought up if
> DCB is enabled"
> Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee
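To illustrate the overlapping-timeout case, here is a toy model of a "recovery pending" guard. It is not the driver's code and does not reproduce the commit's diff; it only assumes, going by the commit title, that a new tx timeout is ignored while recovery from a previous one is still in progress. The class and method names are hypothetical.

```python
import threading


class TimeoutRecovery:
    """Toy model of a 'recovery pending' guard; not the i40e driver code."""

    def __init__(self):
        # The lock stands in for an atomic "recovery pending" state flag.
        self._pending = threading.Lock()
        self.level = 0      # escalating recovery level
        self.skipped = 0    # timeouts ignored because recovery was pending

    def on_tx_timeout(self):
        # A non-blocking acquire behaves like an atomic test-and-set:
        # if recovery is already pending, bail out instead of escalating.
        if not self._pending.acquire(blocking=False):
            self.skipped += 1
            return False
        self.level += 1
        return True

    def on_rebuild_done(self):
        # Recovery finished: allow later timeouts to start a new recovery.
        self._pending.release()
```

Without the guard, a second timeout arriving before the first recovery completes would bump the recovery level again without the earlier level ever having run. With it, the second call returns early and the level only advances once per completed recovery.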
> * The first commit is already in Disco
> * The second commit is already in Disco, Cosmic
> So Bionic needs both patches and Cosmic only needs the first.
> [Test Case]
> * We are considering the case of both issues above occurring.
> * Seen by reporter on a Kafka broker node with heavy traffic.
> * Not easy to reproduce as it requires something like the
> following example environment and heavy load:
> Kernel: 4.15.0-38-generic
> Network driver: i40e
> version: 2.1.14-k
> firmware-version: 6.00 0x800034e6 18.3.6
> NIC: Intel 40Gb XL710
> DCB enabled
> [Regression Potential]
> Low, as the first only impacts i40e DCB environments, and has
> been running for several months in production-load testing.
> Note: Only the first patch should be applied to Cosmic.
> Alan Brady (1):
> i40e: prevent overlapping tx_timeout recover
> Martyna Szapar (1):
> i40e: Fix for Tx timeouts when interface is brought up if DCB is
> enabled
> drivers/net/ethernet/intel/i40e/i40e.h | 1 +
> drivers/net/ethernet/intel/i40e/i40e_main.c | 20 +++++++++++++-------
> 2 files changed, 14 insertions(+), 7 deletions(-)