APPLIED: [B/D][PATCH 0/1] Fix for qede driver causing 100% CPU load
Kleber Souza
kleber.souza at canonical.com
Tue Jan 7 13:08:04 UTC 2020
On 2019-12-18 19:57, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1855409
>
>
> [Impact]
>
> * The PTP feature in the qede driver is implemented in such a way that, if
> the NIC firmware takes some time to perform the timestamping, the PTP worker
> function reschedules itself indefinitely until the value read from a
> device register is meaningful. With that behavior, if a userspace tool
> requests a badly configured TX/RX filter (or if the NIC firmware has any other
> issue in timestamping), the function qede_ptp_task() will reschedule itself
> forever and cause unbounded resource consumption. This manifests as a
> kworker thread consuming 100% of CPU.
>
> * The dmesg log will show a message like this:
> "qede_ptp_tx_ts:533(eno3)]Timestamping in progress"
>
> Also, by using perf, the user can observe a stack like the following:
> - 44.76% 0.00% kworker/16:5 [kernel.kallsyms]
> ret_from_fork
> - kthread
> - 44.74% worker_thread
> - 44.57% process_one_work
> - 42.67% qede_ptp_task
> - 38.86% qed_ptp_hw_read_tx_ts
> qed_rd
> - 3.03% queue_work_on
> - 2.06% __queue_work
> - 0.68% get_work_pool
> - 0.61% radix_tree_lookup
> __radix_tree_lookup
> 0.50% set_work_pool_and_clear_pending
>
> * The patch proposed in this SRU request refactors the PTP worker in qede by
> adding a time limit, after which the task no longer reschedules itself and
> the timestamp procedure fails:
> 9adebac37e7d ("qede: Handle infinite driver spinning for Tx timestamp.")
> http://git.kernel.org/linus/9adebac37e7d
>
> Besides fixing the issue, it also adds an ethtool statistic that accounts
> for the PTP errors.
>
> [Test case]
>
> By using chrony in Bionic, the following steps will reproduce the issue:
>
> a) Install chrony on Bionic in a system with a working NIC managed by qede;
> b) Edit the chrony configuration and add "hwtimestamp *" to the top of its
> conf file;
> c) Restart the chrony service.
>
> Check dmesg for the "[...]Timestamping in progress" message and the
> overall CPU workload using a tool like "top" to observe a kthread
> consuming 100% of CPU.
>
> [Regression potential]
>
> The patch scope is restricted to the qede PTP handler, and the patch has been
> upstream for more than 7 months. If there is any possibility of regression,
> the worst case would be an issue affecting packet timestamping, without
> disturbing the regular xmit path of the driver.
>
>
> Sudarsana Reddy Kalluru (1):
> qede: Handle infinite driver spinning for Tx timestamp.
>
> drivers/net/ethernet/qlogic/qede/qede.h | 2 +
> .../net/ethernet/qlogic/qede/qede_ethtool.c | 2 +
> drivers/net/ethernet/qlogic/qede/qede_main.c | 4 ++
> drivers/net/ethernet/qlogic/qede/qede_ptp.c | 37 +++++++++++++++----
> 4 files changed, 38 insertions(+), 7 deletions(-)
>
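For readers skimming the archive, the bounded-retry idea described in the [Impact] section above can be modelled in plain userspace C. This is only an illustrative sketch, not the code from 9adebac37e7d: the struct, the helper names, the simulated clock and the 2000 ms limit are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's PTP state; NOT the real qede structs. */
struct ptp_sim {
    uint64_t now_ms;           /* simulated clock */
    uint64_t start_ms;         /* when the Tx timestamp request was issued */
    uint64_t ready_at_ms;      /* when the simulated firmware has a valid
                                  timestamp; UINT64_MAX means "never" */
    unsigned int tx_ts_errors; /* failure counter, in the spirit of the new
                                  ethtool statistic */
};

/* Assumed time limit for this sketch only. */
#define PTP_TS_TIMEOUT_MS 2000u

enum ptp_result { PTP_DONE, PTP_RESCHEDULE, PTP_TIMED_OUT };

/* Simulated register read: is the timestamp meaningful yet? */
static bool sim_hw_ts_ready(const struct ptp_sim *p)
{
    return p->now_ms >= p->ready_at_ms;
}

/* One run of the worker. Without the timeout branch, a timestamp that never
 * becomes ready would make this return PTP_RESCHEDULE forever. */
static enum ptp_result ptp_task_step(struct ptp_sim *p)
{
    if (sim_hw_ts_ready(p))
        return PTP_DONE;

    if (p->now_ms - p->start_ms >= PTP_TS_TIMEOUT_MS) {
        p->tx_ts_errors++;     /* account for the failed timestamp */
        return PTP_TIMED_OUT;  /* give up instead of rescheduling */
    }

    return PTP_RESCHEDULE;
}

/* Drive the worker until it stops rescheduling; tick_ms models the delay
 * between two runs. Returns how many times the worker ran. */
static unsigned int ptp_run(struct ptp_sim *p, uint64_t tick_ms,
                            enum ptp_result *out)
{
    unsigned int runs = 0;
    enum ptp_result r;

    do {
        r = ptp_task_step(p);
        runs++;
        if (r == PTP_RESCHEDULE)
            p->now_ms += tick_ms;
    } while (r == PTP_RESCHEDULE);

    *out = r;
    return runs;
}
```

In this model, a timestamp that never becomes ready ends in PTP_TIMED_OUT with the error counter incremented, rather than in an endless reschedule loop.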
Applied to bionic/linux and disco/linux.
Thanks,
Kleber