ACK/Cmnt: [PATCH 0/1] [SRU][X/B] i40e PF reset due to incorrect MDD event

Guilherme G. Piccoli gpiccoli at canonical.com
Thu Mar 4 20:15:26 UTC 2021


On 04/03/2021 16:51, Heitor Alves de Siqueira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1772675
> 
> [Impact]
> The i40e driver sometimes triggers a "malicious device" event that the firmware
> detects. The firmware then resets the NIC, interrupting the network connection,
> which can cause further problems, e.g. if the interface is in a bond. At the
> very least, the reset causes a temporary interruption in network traffic.
> 
> [Fix]
> In the case of MDD events issued for the PF, they are usually the result of a
> misconfigured TX descriptor and not due to "bad" actions in the VFs. We don't
> need to issue a reset to the whole NIC; TX hang checks should handle those if
> necessary.
> 
> [Test Case]
> The bug is unfortunately difficult to reproduce, as there's no detailed
> documentation on how the i40e firmware detects and raises MDDs. We have seen
> reports of this happening in Xenial and Bionic, for workloads stressing i40e
> bonds in LACP mode.
> A reproduction is easy to detect, as network traffic will be interrupted and
> the system logs will contain a message like:
> i40e 0000:02:00.1: TX driver issue detected, PF reset issued
> 
> [Regression Potential]
> Since we're removing resets for the NIC, regressions could show up as issues in
> connectivity after the MDD events are raised. If the firmware expects the whole
> NIC to reset, we could see TX/RX hangs and general unresponsiveness in
> networking. The potential for this should however be fairly low, as this patch
> has been present since kernel 5.2 and hasn't seen any fixes or regressions
> upstream. Basic smoke tests also showed that the driver continues working as
> expected.
> 
> Carolyn Wyborny (1):
>   i40e: change behavior on PF in response to MDD event
> 
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
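
For anyone verifying the test case, here is a minimal sketch of how the log
message quoted above could be spotted automatically. It is not part of the
patch: the function name and the regular expression are illustrative
assumptions derived only from the sample log line in the cover letter, and
the PCI address will differ per system.

```python
import re

# Pattern built from the sample message in the test case. Only the PCI
# address (domain:bus:device.function) is expected to vary. This is an
# assumption based on that single example line.
MDD_RESET_RE = re.compile(
    r"i40e (?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"TX driver issue detected, PF reset issued"
)


def find_pf_resets(lines):
    """Return the PCI address from every log line reporting a PF reset."""
    hits = []
    for line in lines:
        match = MDD_RESET_RE.search(line)
        if match:
            hits.append(match.group("pci"))
    return hits
```

The function takes any iterable of log lines, so it can be fed the saved
output of dmesg or a kernel log file. An empty result on an affected workload
would suggest the PF reset is no longer being issued.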


Thanks for the fix, Heitor! It's very simple and clean, so:
Acked-by: Guilherme G. Piccoli <gpiccoli at canonical.com>

Out of curiosity, you marked both as backported patches. Was that just for
context adjustments?
Cheers,


Guilherme


