ACK/Cmnt: [SRU B/D][SRU X][Unstable][PATCH 0/1] bnx2x: avoid 100% cpu utilization from ptp routine
stefan.bader at canonical.com
Wed Jul 10 07:41:51 UTC 2019
On 03.07.19 21:17, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1832082
> * The PTP feature in bnx2x driver is implemented in a way that if the NIC
> firmware takes some time to perform the timestamping - which is observed as a
> bad register read in bnx2x_ptp_task() - then the ptp worker function will
> reschedule itself indefinitely until the value read from the register is
> meaningful. With that behavior, if a userspace tool requests a badly configured
> RX filter from bnx2x (or if the NIC firmware has any other timestamping issue),
> the function bnx2x_ptp_task() will be rescheduled forever and cause unbounded
> resource consumption. This manifests as a kworker thread consuming 100% of CPU.
> * The dmesg log will show the following message, reporting that other packets
> are skipped by the timestamping routine because one packet is stuck in timestamping:
> "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
> outstanding packet to timestamp, this packet will not be timestamped"
> Also, by using ftrace the user can see that the function bnx2x_ptp_task() is being
> called a lot, and by enabling the bnx2x PTP debugging log (ethtool -s <iface> msglvl
> 16777216) it's possible to observe the following message flooding the kernel log:
> "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp yet"
> * The patch proposed in this SRU request is accepted upstream and is available
> currently (2019-07-03) in David Miller's linux-net tree:
> Besides fixing the issue, it also adds an ethtool statistic to account for
> PTP errors and reduces message flooding in case of errors.
> [Test case]
> Reproducing the problem is not difficult; we've used chrony in Bionic to trigger
> the problem. The steps are:
> a) Install chrony on Bionic on a system with a working NIC managed by bnx2x;
> b) Edit the chrony configuration and add "hwtimestamp *" to the top of its conf file;
> c) Restart the chrony service.
> Check dmesg for the "[...]single outstanding packet" message and the overall CPU
> workload using a tool like "top" to observe a kthread consuming 100% of CPU.
> [Regression potential]
> The patch scope is restricted to the bnx2x PTP handler, and the patch was validated
> by the driver maintainer. If there's any possibility of regressions, we believe the
> worst would be an issue affecting packet timestamping, without touching the
> regular xmit path of the driver.
> Guilherme G. Piccoli (1):
> bnx2x: Prevent ptp_task to be rescheduled indefinitely
> .../net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 5 ++-
> .../ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 4 ++-
> .../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 33 ++++++++++++++-----
> .../net/ethernet/broadcom/bnx2x/bnx2x_stats.h | 3 ++
> 4 files changed, 34 insertions(+), 11 deletions(-)
Cosmic goes EOL before the next cycle, so NACK for Cosmic; for the rest:
Acked-by: Stefan Bader <stefan.bader at canonical.com>