ACK/Cmnt: [PATCH 0/1] [SRU][X/B] i40e PF reset due to incorrect MDD event
Heitor Alves de Siqueira
halves at canonical.com
Thu Mar 4 20:41:42 UTC 2021
Thanks for the ack, Guilherme!
The upstream patch didn't cherry-pick cleanly on Xenial, so I did have to make
some context adjustments there. For Bionic I think it needs only offset
adjustments, so it could be cherry-picked directly. Would you like me to
resubmit that one with a "cherry-picked from" tag?
Cheers,
Heitor
On Thu, Mar 4, 2021 at 5:15 PM Guilherme G. Piccoli
<gpiccoli at canonical.com> wrote:
>
> On 04/03/2021 16:51, Heitor Alves de Siqueira wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1772675
> >
> > [Impact]
> > The i40e driver sometimes triggers a Malicious Driver Detection (MDD) event
> > in the firmware, which responds by resetting the NIC. The reset causes at
> > least a temporary interruption in network traffic, and can lead to further
> > problems, e.g. if the interface is part of a bond.
> >
> > [Fix]
> > In the case of MDD events issued for the PF, they are usually the result of a
> > misconfigured TX descriptor and not due to "bad" actions in the VFs. We don't
> > need to reset the whole NIC; the TX hang checks should handle those cases if
> > necessary.
> >
> > [Test Case]
> > The bug is unfortunately difficult to reproduce, as there's no detailed
> > documentation on how the i40e firmware detects and raises MDDs. We have seen
> > reports of this happening in Xenial and Bionic, for workloads stressing i40e
> > bonds in LACP mode.
> > A successful reproduction is easy to detect, as the network traffic will be
> > interrupted and the system logs will contain a message like:
> > i40e 0000:02:00.1: TX driver issue detected, PF reset issued
> >
> > [Regression Potential]
> > Since we're removing resets for the NIC, regressions could show up as issues in
> > connectivity after the MDD events are raised. If the firmware expects the whole
> > NIC to reset, we could see TX/RX hangs and general unresponsiveness in
> > networking. The potential for this should however be fairly low, as this patch
> > has been present since kernel 5.2 and hasn't seen any fixes or regressions
> > upstream. Basic smoke tests also showed that the driver continues working as
> > expected.
> >
> > Carolyn Wyborny (1):
> > i40e: change behavior on PF in response to MDD event
> >
> > drivers/net/ethernet/intel/i40e/i40e_main.c | 12 ++----------
> > 1 file changed, 2 insertions(+), 10 deletions(-)
> >
>
>
> Thanks for the fix Heitor! It's very simple/clean, so:
> Acked-by: Guilherme G. Piccoli <gpiccoli at canonical.com>
>
> Out of curiosity, you marked both as backported patches - was it just
> context adjustments?
> Cheers,
>
>
> Guilherme