[PATCH 0/1][Groovy] Backport mlx5e fix for tunnel offload

Tim Gardner tim.gardner at canonical.com
Fri Apr 9 12:31:56 UTC 2021


Note that this patch was previously submitted (and applied) to Focal.
Even though this driver likely has no commercial cloud consumers, it's
worthwhile being complete.

Bionic is not affected since GENEVE is not supported.
Hirsute and linux-oem-5.10 already have the patch via stable.

[SRU Justification]

BugLink: https://bugs.launchpad.net/bugs/1921769

We've discovered an issue on Ubuntu 20.04: when it is used with Kubernetes
CNIs that perform offloading using Geneve, the kernel panics on Azure
instances with accelerated networking, with the following errors:

[ 307.561223] mlx5_core 0001:00:02.0 enP1s1: Error cqe on cqn 0x200, ci 0x3d4, sqn 0x2c5, opcode 0xd, syndrome 0x2, vendor syndrome 0x68
[ 307.573864] mlx5_core 0001:00:02.0 enP1s1: ERR CQE on SQ: 0x2c5
[ 307.764902] mlx5_core 0001:00:02.0 enP1s1: Error cqe on cqn 0x200, ci 0x3d7, sqn 0x2c5, opcode 0xd, syndrome 0x2, vendor syndrome 0x68
[ 307.777332] mlx5_core 0001:00:02.0 enP1s1: ERR CQE on SQ: 0x2c5
[ 322.814393] mlx5_core 0001:00:02.0 enP1s1: Error cqe on cqn 0x218, ci 0x1a7, sqn 0x2bd, opcode 0xd, syndrome 0x2, vendor syndrome 0x68
[ 322.826685] mlx5_core 0001:00:02.0 enP1s1: ERR CQE on SQ: 0x2bd
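
For triage, the "Error cqe" lines above can be matched mechanically. What
follows is only a minimal sketch, assuming the dmesg format shown in this
report. The function name, field names and regular expression are illustrative
and are not part of the patch or the mlx5 driver.

```python
import re

# Pattern modeled on the sample log lines in this report (an assumption;
# other kernel versions may format the message differently).
ERR_CQE = re.compile(
    r"mlx5_core (?P<pci>\S+) (?P<ifname>\S+): Error cqe on cqn (?P<cqn>0x[0-9a-f]+), "
    r"ci (?P<ci>0x[0-9a-f]+), sqn (?P<sqn>0x[0-9a-f]+), "
    r"opcode (?P<opcode>0x[0-9a-f]+), syndrome (?P<syndrome>0x[0-9a-f]+), "
    r"vendor syndrome (?P<vendor_syndrome>0x[0-9a-f]+)"
)


def find_error_cqes(log_text):
    """Return one dict of parsed fields per 'Error cqe' line in log_text."""
    return [m.groupdict() for m in ERR_CQE.finditer(log_text)]
```

Feeding it the text of `dmesg` from an affected instance would list the
interface and send queue number (sqn) for each reported error.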

NVIDIA fixed this issue in https://github.com/torvalds/linux/commit/5ccc0ecda9e8a67add654d93d7e0ac4346c0fa22,
so we're looking to have this backported to at least the linux-azure package.

[Test Plan]
https://bugs.launchpad.net/ubuntu/bionic/+source/linux-azure/+bug/1921769/comments/6
(waiting on response, but now I've seen a 2nd request in LP#1922472)

[Where problems could occur]
Packets destined for offload acceleration could get dropped.

[Other Info]
Released in stable branches:
linux-5.10.y
linux-5.11.y





