[SRU B/C/D][SRU X][Unstable][PATCH 0/1] bnx2x: avoid 100% cpu utilization from ptp routine

Guilherme G. Piccoli gpiccoli at canonical.com
Wed Jul 3 19:17:52 UTC 2019


BugLink: https://bugs.launchpad.net/bugs/1832082

[Impact]

* The PTP feature in the bnx2x driver is implemented in such a way that if the
NIC firmware takes some time to perform the timestamping - which is observed as
a bad register read in bnx2x_ptp_task() - then the ptp worker function will
reschedule itself indefinitely until the value read from the register is
meaningful. With that behavior, if a userspace tool requests a badly configured
RX filter from bnx2x (or if the NIC firmware has any other timestamping issue),
bnx2x_ptp_task() will be rescheduled forever and cause unbounded resource
consumption. This manifests as a kworker thread consuming 100% of CPU.
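
The difference between rescheduling forever and giving up after a bounded
number of attempts can be sketched in plain user-space C. This is only an
illustration under stated assumptions, not the code from the patch: struct
fake_reg, fake_reg_valid() and poll_tx_timestamp() are hypothetical names, and
the retry limit and doubling backoff are chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the NIC register. It reports a valid
 * timestamp only after `ready_after` reads; a negative value models
 * firmware that never completes the timestamping. */
struct fake_reg {
    int reads;
    int ready_after;
};

static bool fake_reg_valid(struct fake_reg *r)
{
    r->reads++;
    return r->ready_after >= 0 && r->reads > r->ready_after;
}

/* Bounded polling: read the register at most max_tries times instead of
 * rescheduling until it becomes valid. *waited_ms accumulates the delay
 * that would be slept between attempts (1, 2, 4, ... ms).
 * Returns true if a valid timestamp was seen, false if we gave up. */
static bool poll_tx_timestamp(struct fake_reg *r, int max_tries,
                              unsigned *waited_ms)
{
    assert(max_tries > 0 && max_tries < 32);

    *waited_ms = 0;
    for (int i = 0; i < max_tries; i++) {
        if (fake_reg_valid(r))
            return true;
        *waited_ms += 1u << i;  /* a real worker would sleep here */
    }
    /* Give up: the caller would drop the pending packet and count
     * the failure rather than retry indefinitely. */
    return false;
}
```

With a register that never becomes valid, poll_tx_timestamp() returns false
after max_tries reads, so the work done per stuck packet is bounded instead of
pinning a CPU.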

* The dmesg log will show the following message, indicating that other packets
are being skipped by the timestamp routine because a packet is stuck in the
timestamping "pipeline":

"bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
outstanding packet to timestamp, this packet will not be timestamped"

Also, using ftrace, the user can see that bnx2x_ptp_task() is being called
very frequently, and by enabling the bnx2x PTP debugging log (ethtool -s <iface>
msglvl 16777216) it's possible to observe the following message flooding the
kernel log:

"bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp yet"
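
The decimal msglvl value used above is easier to read as a bitmask. The small
C sketch below only does the arithmetic: it shows that 16777216 is a single
bit (bit 24, i.e. 0x1000000). The helper name single_bit_index() is
hypothetical and is not part of ethtool or the driver.

```c
#include <assert.h>
#include <stdint.h>

/* The msglvl value quoted in the text is exactly one bit: 1 << 24. */
static_assert(16777216u == 0x1000000u, "msglvl value is 0x1000000");

/* Returns the index of the only set bit in mask, or -1 if mask has
 * zero bits or more than one bit set. */
static int single_bit_index(uint32_t mask)
{
    if (mask == 0 || (mask & (mask - 1)) != 0)
        return -1;

    int idx = 0;
    while ((mask >>= 1) != 0)
        idx++;
    return idx;
}
```

Calling single_bit_index(16777216u) yields 24, so the command enables exactly
one message-level bit rather than a combination of flags.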

* The patch proposed in this SRU request has been accepted upstream and is
currently (2019-07-03) available in David Miller's linux-net tree:
git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
Besides fixing the issue, it also adds an ethtool statistic for accounting the
ptp errors and reduces message flooding in case of errors.


[Test case]

Reproducing the problem is not difficult; we've used chrony in Bionic to trigger
the problem. The steps are:

a) Install chrony on Bionic on a system with a working NIC managed by bnx2x;

b) Edit the chrony configuration and add "hwtimestamp *" to the top of its conf
file;

c) Restart the chrony service.

Then check dmesg for the "[...]single outstanding packet" message, and check the
overall CPU load with a tool like "top" to observe a kthread consuming 100% of
CPU.


[Regression potential]

The patch scope is restricted to the bnx2x ptp handler, and it was validated by
the driver maintainer. If there's any possibility of regressions, we believe the
worst case would be an issue affecting packet timestamping, without disturbing
the regular xmit path of the driver.

Guilherme G. Piccoli (1):
  bnx2x: Prevent ptp_task to be rescheduled indefinitely

 .../net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   |  5 ++-
 .../ethernet/broadcom/bnx2x/bnx2x_ethtool.c   |  4 ++-
 .../net/ethernet/broadcom/bnx2x/bnx2x_main.c  | 33 ++++++++++++++-----
 .../net/ethernet/broadcom/bnx2x/bnx2x_stats.h |  3 ++
 4 files changed, 34 insertions(+), 11 deletions(-)

-- 
2.22.0



