ACK/Cmnt: [PATCH 0/1] [SRU][X/B] i40e PF reset due to incorrect MDD event

Guilherme Piccoli gpiccoli at canonical.com
Thu Mar 4 20:52:13 UTC 2021


Thanks Heitor, I don't see a reason for that. If you needed to change
the patch (even offsets), I guess "backported" is totally fine, makes
more sense than "cherry-pick" IMHO heheh

Cheers!

On Thu, Mar 4, 2021 at 5:41 PM Heitor Alves de Siqueira
<halves at canonical.com> wrote:
>
> Thanks for the ack, Guilherme!
>
> The upstream patch didn't cherry-pick cleanly on Xenial, so I had to do some
> context adjustments indeed. I think for Bionic it needs only offset adjustments,
> but it could be cherry-picked directly. Would you like me to resubmit that one
> with a "cherry-picked from" tag?
>
> Cheers,
> Heitor
>
>
> On Thu, Mar 4, 2021 at 5:15 PM Guilherme G. Piccoli
> <gpiccoli at canonical.com> wrote:
> >
> > On 04/03/2021 16:51, Heitor Alves de Siqueira wrote:
> > > BugLink: https://bugs.launchpad.net/bugs/1772675
> > >
> > > [Impact]
> > > > The i40e driver sometimes triggers a Malicious Driver Detection (MDD) event
> > > > in the firmware, which responds by resetting the NIC. The reset causes at
> > > > least a temporary interruption in network traffic, and can cause further
> > > > problems, e.g. if the interface is part of a bond.
> > >
> > > [Fix]
> > > > MDD events issued for the PF are usually the result of a misconfigured TX
> > > > descriptor, not of "bad" actions in the VFs. We don't need to reset the
> > > > whole NIC for these; TX hang checks should handle them if necessary.
> > >
> > > [Test Case]
> > > The bug is unfortunately difficult to reproduce, as there's no detailed
> > > documentation on how the i40e firmware detects and raises MDDs. We have seen
> > > reports of this happening in Xenial and Bionic, for workloads stressing i40e
> > > bonds in LACP mode.
> > > > A successful reproduction is easy to detect, as the network traffic will be
> > > > interrupted and the system logs will contain a message like:
> > > i40e 0000:02:00.1: TX driver issue detected, PF reset issued
> > >
> > > [Regression Potential]
> > > Since we're removing resets for the NIC, regressions could show up as issues in
> > > connectivity after the MDD events are raised. If the firmware expects the whole
> > > NIC to reset, we could see TX/RX hangs and general unresponsiveness in
> > > networking. The potential for this should however be fairly low, as this patch
> > > has been present since kernel 5.2 and hasn't seen any fixes or regressions
> > > upstream. Basic smoke tests also showed that the driver continues working as
> > > expected.
> > >
> > > Carolyn Wyborny (1):
> > >   i40e: change behavior on PF in response to MDD event
> > >
> > >  drivers/net/ethernet/intel/i40e/i40e_main.c | 12 ++----------
> > >  1 file changed, 2 insertions(+), 10 deletions(-)
> > >
> >
> >
> > Thanks for the fix Heitor! It's very simple/clean, so:
> > Acked-by: Guilherme G. Piccoli <gpiccoli at canonical.com>
> >
> > Out of curiosity, you marked both as backported patches - was it just
> > context adjustments?
> > Cheers,
> >
> >
> > Guilherme


