NAK: [PATCH 0/1][SRU][J] Intel E810 transmit hang with bonding enabled
Tim Gardner
tim.gardner at canonical.com
Mon Jan 8 15:04:53 UTC 2024
On 1/5/24 4:03 AM, Robert Malz wrote:
> BugLink: https://bugs.launchpad.net/bugs/2036239
>
> [Impact]
> * The issue causes a transmit hang on E810 ports with bonding enabled.
> * Based on the provided logs, the TX hang can last as long as a couple of minutes, but in most scenarios the network recovers after the ice driver performs a PF reset (TX hang handler routine).
> * Originally, the issue was observed during Tempest tests on a newly created OpenStack cluster, resulting in a lack of certification.
> [Fix]
> * Initially, Intel engineers proposed a workaround that disables LAG initialization [1].
> This change has been tested in an environment where reproduction is easily achieved.
> After multiple iterations, no reproduction has been observed.
> * Shortly after, Intel proposed a patch [2] that disables LAG initialization if the NVM does not expose the proper capabilities.
> [Test Plan]
> * To reproduce the issue, a cluster of more than 20 nodes with Ceph-based storage was used. The problem could sometimes manifest while the cluster was being deployed, or after deployment during the Tempest test run.
> * The issue could appear on a random node, making reproduction hard to achieve.
> * Multiple stress tests on a single host with a similar configuration did not trigger a reproduction.
> [Where problems could occur]
> * All ice drivers with ice_lag_event_handler registered can expose the issue. This handler is not implemented in 20.04.
> * CVL4.2 and older NVM images for E810 do not expose SRIOV LAG capabilities (CVL4.3 was not checked), meaning that an NVM with this capability will be released at some point.
> Although the issue is potentially caused by using features without proper FW support [2], we want to take a closer look once NVMs with proper support are introduced.
>
> [1] - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036239/comments/40
> [2] - https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20231211/038588.html 4d50fcdc2476eef94c14c6761073af5667bb43b6
>
> Dave Ertman (1):
> [SRU][J][PATCH 1/1] ice: alter feature support check for SRIOV and LAG
>
> drivers/net/ethernet/intel/ice/ice_adminq_cmd.h | 3 +++
> drivers/net/ethernet/intel/ice/ice_common.c | 8 ++++++++
> drivers/net/ethernet/intel/ice/ice_lag.c | 3 +++
> drivers/net/ethernet/intel/ice/ice_type.h | 2 ++
> 4 files changed, 16 insertions(+)
>
This appears to be a duplicate with no explanation.
--
-----------
Tim Gardner
Canonical, Inc