ACK/Cmnt: [PATCH v2 0/3][SRU][J] Intel E810 transmit hang with bonding enabled

Andrei Gherzan andrei.gherzan at canonical.com
Fri Jan 12 10:50:30 UTC 2024


On 24/01/11 04:03PM, Robert Malz wrote:
> BugLink: https://bugs.launchpad.net/bugs/2036239
> 
> [Impact]
>    * The issue causes a transmit hang on E810 ports with bonding enabled.
>    * Based on the provided logs, the TX hang can last as long as a couple of minutes, but in most scenarios the network recovers after the ice driver performs a PF reset (TX hang handler routine).
>    * Originally, the issue was observed during Tempest tests on a newly created OpenStack cluster, which prevented certification.
> [Fix]
>   * Initially, Intel engineers proposed a workaround that disables LAG initialization [1].
>     This change was tested in an environment where the issue reproduces easily.
>     After multiple iterations, no reproduction was observed.
>   * Shortly after, Intel proposed a patch [2] that disables LAG initialization if the NVM does not expose the proper capabilities.
> [Test Plan]
>   * To reproduce the issue, a cluster of more than 20 nodes with Ceph-based storage was used. The problem could sometimes manifest while deploying the cluster, or after deployment during the Tempest test run.
>   * The issue could appear on a random node, making reproduction hard to achieve.
>   * Multiple stress tests on a single host with a similar configuration did not trigger a reproduction.
> [Where problems could occur]
>   * All ice drivers with ice_lag_event_handler registered can expose the issue. This handler is not implemented in 20.04.
>   * CVL4.2 and older NVM images for E810 do not expose SRIOV LAG capabilities (CVL4.3 wasn't checked), meaning that an NVM with this capability will be released at some point.
>     Although the issue is potentially caused by using features without proper FW support [2], we want to take a closer look once NVMs with proper support are introduced.
> [Other Information]
>   * The fix for the described issue is implemented in patch 4d50fcdc2476eef94c14c6761073af5667bb43b6; however, to apply it,
>     a couple of additional changes have to be backported.
>     Upstream patch 40b247608bc50b5c046dfb1073c0ee7f57769c86 has been backported to add wrapper functions used to store feature capability.
>     Upstream patch bb52f42acef6ac317ee298d39909ce17bbaddb82 defines and initializes the proper capabilities.
> 
>   [1] - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036239/comments/40
>   [2] - https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20231211/038588.html 4d50fcdc2476eef94c14c6761073af5667bb43b6
> 
> 
> Anirudh Venkataramanan (1):
>   ice: Add feature bitmap, helpers and a check for DSCP
> 
> Dave Ertman (2):
>   ice: Add driver support for firmware changes for LAG
>   ice: alter feature support check for SRIOV and LAG
> 
>  drivers/net/ethernet/intel/ice/ice.h          |  8 ++++
>  .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  3 ++
>  drivers/net/ethernet/intel/ice/ice_common.c   |  8 ++++
>  drivers/net/ethernet/intel/ice/ice_lag.c      | 48 ++++++++++---------
>  drivers/net/ethernet/intel/ice/ice_lib.c      | 47 ++++++++++++++++++
>  drivers/net/ethernet/intel/ice/ice_lib.h      |  3 ++
>  drivers/net/ethernet/intel/ice/ice_main.c     |  2 +
>  drivers/net/ethernet/intel/ice/ice_type.h     |  2 +
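
For reference, the approach described in the cover letter (store feature
support in a bitmap, then skip LAG initialization when the NVM does not
expose the capability) can be sketched roughly as below. This is a minimal
userspace illustration only, not the code from the series: struct dev,
enum feat, feat_set(), feat_clear(), feat_supported(), init_features()
and lag_init() are all hypothetical names, and the fw_sriov_lag_cap flag
is an assumed stand-in for whatever the firmware capability discovery
reports.

```c
#include <stdbool.h>

/* Hypothetical feature identifiers; NOT the ice driver's own enum. */
enum feat {
	FEAT_DSCP,
	FEAT_SRIOV_LAG,
	FEAT_MAX
};

/* Hypothetical device state used only for this sketch. */
struct dev {
	bool fw_sriov_lag_cap;	/* assumed: filled in from NVM/FW capabilities */
	unsigned long features;	/* one bit per enum feat */
	bool lag_initialized;
};

/* Mark a feature as supported in the bitmap. */
static void feat_set(struct dev *d, enum feat f)
{
	if (f < FEAT_MAX)
		d->features |= 1UL << f;
}

/* Mark a feature as unsupported in the bitmap. */
static void feat_clear(struct dev *d, enum feat f)
{
	if (f < FEAT_MAX)
		d->features &= ~(1UL << f);
}

/* Query the bitmap; out-of-range features are never supported. */
static bool feat_supported(const struct dev *d, enum feat f)
{
	return f < FEAT_MAX && (d->features & (1UL << f)) != 0;
}

/* Derive the LAG feature bit from the (assumed) firmware capability. */
static void init_features(struct dev *d)
{
	if (d->fw_sriov_lag_cap)
		feat_set(d, FEAT_SRIOV_LAG);
	else
		feat_clear(d, FEAT_SRIOV_LAG);
}

/*
 * Skip LAG setup entirely when the feature bit is not set. Returning 0
 * in both cases means a missing capability is not treated as an error;
 * LAG simply stays disabled.
 */
static int lag_init(struct dev *d)
{
	if (!feat_supported(d, FEAT_SRIOV_LAG)) {
		d->lag_initialized = false;
		return 0;
	}

	d->lag_initialized = true;
	return 0;
}
```

With this shape, a device whose NVM lacks the capability goes through
init_features() and lag_init() without ever enabling LAG, which matches
the behaviour the [Fix] section describes for current NVM images.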

The bug report should include the target series. Could you update it
accordingly?

Also, I would have benefited from a v2 description in the cover letter.

Acked-by: Andrei Gherzan <andrei.gherzan at canonical.com>

-- 
Andrei Gherzan