APPLIED[E]: [SRU B/C/D][SRU X][Unstable][PATCH 0/1] bnx2x: avoid 100% cpu utilization from ptp routine
Seth Forshee
seth.forshee at canonical.com
Wed Jul 17 16:09:10 UTC 2019
On Wed, Jul 03, 2019 at 04:17:52PM -0300, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1832082
>
> [Impact]
>
> * The PTP feature in the bnx2x driver is implemented in a way that if the NIC
> firmware takes some time to perform the timestamping - which is observed as a
> bad register read in bnx2x_ptp_task() - then the ptp worker function will
> reschedule itself indefinitely until the value read from the register is
> meaningful. With that behavior, if a userspace tool requests a badly configured
> RX filter from bnx2x (or if the NIC firmware has any other timestamping issue),
> the function bnx2x_ptp_task() will be rescheduled forever and cause unbounded
> resource consumption. This manifests as a kworker thread consuming 100% of CPU.
>
> * The dmesg log will show the following message, indicating that other packets
> are being skipped by the timestamp routine because a packet is stuck in the
> timestamping "pipeline":
>
> "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
> outstanding packet to timestamp, this packet will not be timestamped"
>
> Also, by using ftrace the user can notice that the function bnx2x_ptp_task() is
> being called a lot, and by enabling the bnx2x PTP debugging log (ethtool -s
> <iface> msglvl 16777216) it's possible to observe the following message flooding
> the kernel log:
>
> "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp yet"
>
> * The patch proposed in this SRU request has been accepted upstream and is
> currently (2019-07-03) available in David Miller's linux-net tree:
> git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
> Besides fixing the issue, it also adds an ethtool statistic to account for
> ptp errors and reduces message flooding in case of errors.
>
>
> [Test case]
>
> Reproducing the problem is not difficult; we've used chrony in Bionic to trigger
> the problem. The steps are:
>
> a) Install chrony on Bionic on a system with a working NIC managed by bnx2x;
>
> b) Edit the chrony configuration and add "hwtimestamp *" to the top of its conf
> file;
>
> c) Restart the chrony service.
>
> Check dmesg for the "[...]single outstanding packet" message and the overall CPU
> workload using a tool like "top" to observe a kthread consuming 100% of CPU.
>
>
> [Regression potential]
>
> The patch scope is restricted to the bnx2x ptp handler, and was validated by the
> driver maintainer. If there's any possibility of regressions, we believe the
> worst would be an issue affecting packet timestamping, without affecting the
> regular xmit path of the driver.
Applied to eoan/master-next, thanks!