[B/D][PATCH 0/1] Fix for qede driver causing 100% CPU load
Guilherme G. Piccoli
gpiccoli at canonical.com
Wed Dec 18 18:57:50 UTC 2019
* The PTP feature in qede driver is implemented in a way that if the NIC
firmware takes some time to perform the timestamping then the PTP worker
function will reschedule itself indefinitely until the value read from a
device register is meaningful. With that behavior, if a userspace tool
requests a badly configured TX/RX filter (or if the NIC firmware has any other
issue in timestamping), the function qede_ptp_task() will reschedule itself
forever and cause unbounded resource consumption. This manifests as a
kworker thread consuming 100% of CPU.
* The dmesg log will show a message like this:
"qede_ptp_tx_ts:533(eno3)]Timestamping in progress"
Also, by using perf, the user can observe a stack like the following:
- 44.76% 0.00% kworker/16:5 [kernel.kallsyms]
- 44.74% worker_thread
- 44.57% process_one_work
- 42.67% qede_ptp_task
- 38.86% qed_ptp_hw_read_tx_ts
- 3.03% queue_work_on
- 2.06% __queue_work
- 0.68% get_work_pool
- 0.61% radix_tree_lookup
* The patch proposed in this SRU request refactors the PTP worker in qede by
adding a time limit, after which the task doesn't reschedule itself anymore,
failing the timestamp procedure:
9adebac37e7d ("qede: Handle infinite driver spinning for Tx timestamp.")
Besides fixing the issue, it also adds an ethtool statistic that accounts
for the PTP errors.
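The bounded-retry idea described above can be sketched in plain userspace C.
This is only an illustration of the approach, not the code from commit
9adebac37e7d: the names ptp_tx_state, ptp_task_step and TS_TIMEOUT_MS, as well
as the 2000 ms budget, are assumptions made for the example and are not taken
from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical time budget; the value used by the real driver is not
 * stated in this message. */
#define TS_TIMEOUT_MS 2000u

/* Hypothetical per-device state for one outstanding Tx timestamp. */
struct ptp_tx_state {
	uint64_t start_ms;   /* time at which the timestamp was requested */
	bool pending;        /* true while a timestamp is still awaited */
	unsigned ts_errors;  /* error counter that could back a statistic */
};

enum ptp_action { PTP_DONE, PTP_RESCHEDULE, PTP_GIVE_UP };

/* One pass of the worker. ts_ready stands in for "the value read from the
 * device register is meaningful". Without the timeout branch, a timestamp
 * that never becomes ready would yield PTP_RESCHEDULE forever. */
static enum ptp_action ptp_task_step(struct ptp_tx_state *st, bool ts_ready,
				     uint64_t now_ms)
{
	assert(st != NULL);

	if (!st->pending)
		return PTP_DONE;

	if (ts_ready) {
		st->pending = false;
		return PTP_DONE;
	}

	if (now_ms - st->start_ms >= TS_TIMEOUT_MS) {
		/* Time limit exceeded: fail the request and stop retrying. */
		st->pending = false;
		st->ts_errors++;
		return PTP_GIVE_UP;
	}

	return PTP_RESCHEDULE;
}
```

A caller would requeue the work only on PTP_RESCHEDULE. Once PTP_GIVE_UP is
returned, the pending request is dropped and the error counter records the
failure, so the worker no longer spins.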
By using chrony in Bionic, the following steps will reproduce the issue:
a) Install chrony on Bionic in a system with a working NIC managed by qede;
b) Edit the chrony configuration and add "hwtimestamp *" to the top of the file;
c) Restart the chrony service.
Check dmesg for the "[...]Timestamping in progress" message and the
overall CPU workload using a tool like "top" to observe a kthread
consuming 100% of CPU.
The patch scope is restricted to the qede PTP handler, and the patch has been
upstream for more than 7 months. If there is any possibility of regressions,
the worst case would be an issue affecting packet timestamping, without
touching the regular xmit path of the driver.
Sudarsana Reddy Kalluru (1):
qede: Handle infinite driver spinning for Tx timestamp.
drivers/net/ethernet/qlogic/qede/qede.h | 2 +
.../net/ethernet/qlogic/qede/qede_ethtool.c | 2 +
drivers/net/ethernet/qlogic/qede/qede_main.c | 4 ++
drivers/net/ethernet/qlogic/qede/qede_ptp.c | 37 +++++++++++++++----
4 files changed, 38 insertions(+), 7 deletions(-)