APPLIED: [B/D][PATCH 0/1] Fix for qede driver causing 100% CPU load

Kleber Souza kleber.souza at canonical.com
Tue Jan 7 13:08:04 UTC 2020


On 2019-12-18 19:57, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1855409
> 
> 
> [Impact]
> 
> * The PTP feature in qede driver is implemented in a way that if the NIC
> firmware takes some time to perform the timestamping then the PTP worker
> function will reschedule itself indefinitely until the value read from a
> device register is meaningful. With that behavior, if a userspace tool
> requests a badly configured TX/RX filter (or if the NIC firmware has any
> other issue in timestamping), the function qede_ptp_task() will reschedule
> itself forever and cause unbounded resource consumption. This manifests as a
> kworker thread consuming 100% of CPU.
> 
> * The dmesg log will show a message like this:
> "qede_ptp_tx_ts:533(eno3)]Timestamping in progress"
> 
> Also, by using perf, the user can observe a stack like the following:
> - 44.76% 0.00% kworker/16:5 [kernel.kallsyms]
>      ret_from_fork
>    - kthread
>       - 44.74% worker_thread
>          - 44.57% process_one_work
>             - 42.67% qede_ptp_task
>                - 38.86% qed_ptp_hw_read_tx_ts
>                     qed_rd
>                - 3.03% queue_work_on
>                   - 2.06% __queue_work
>                      - 0.68% get_work_pool
>                         - 0.61% radix_tree_lookup
>                              __radix_tree_lookup
>               0.50% set_work_pool_and_clear_pending
> 
> * The patch proposed in this SRU request refactors the PTP worker in qede by
> adding a time limit, after which the task no longer reschedules itself and
> the timestamp procedure fails:
> 9adebac37e7d ("qede: Handle infinite driver spinning for Tx timestamp.")
> http://git.kernel.org/linus/9adebac37e7d
> 
> Besides fixing the issue, it also adds an ethtool statistic to account for
> PTP errors.
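
For illustration only, the bounded-retry idea described above can be sketched
in user-space C. This is not the code from commit 9adebac37e7d: the upstream
patch uses a time limit, whereas this sketch swaps in a simple poll count so
that it stays deterministic. Every name here (PTP_MAX_POLLS, struct ptp_sim,
ptp_task_step, ptp_run) is hypothetical, and the firmware is simulated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: give up after this many polls. The real patch uses a
 * time limit instead; a poll count keeps this sketch deterministic. */
#define PTP_MAX_POLLS 1000u

/* Simulated state for one outstanding Tx timestamp request. */
struct ptp_sim {
	unsigned int polls;        /* polls done for the current request */
	unsigned int ready_after;  /* firmware answers after N polls; 0 = never */
	unsigned int tx_ts_errors; /* abandoned requests, like an error statistic */
};

enum ptp_result { PTP_DONE, PTP_RESCHEDULE, PTP_TIMED_OUT };

/* Stand-in for reading the timestamp register: true once the value is valid. */
static bool sim_read_tx_ts(const struct ptp_sim *s, uint64_t *ts)
{
	if (s->ready_after == 0 || s->polls < s->ready_after)
		return false;
	*ts = 12345; /* arbitrary simulated timestamp */
	return true;
}

/* One invocation of the worker: finish, ask to be rescheduled, or give up. */
static enum ptp_result ptp_task_step(struct ptp_sim *s, uint64_t *ts)
{
	s->polls++;
	if (sim_read_tx_ts(s, ts)) {
		s->polls = 0;
		return PTP_DONE;
	}
	if (s->polls >= PTP_MAX_POLLS) {
		/* Limit reached: fail the request instead of spinning forever. */
		s->polls = 0;
		s->tx_ts_errors++;
		return PTP_TIMED_OUT;
	}
	return PTP_RESCHEDULE;
}

/* Drive the worker the way repeated rescheduling would. */
static enum ptp_result ptp_run(struct ptp_sim *s, uint64_t *ts)
{
	enum ptp_result r;

	do {
		r = ptp_task_step(s, ts);
	} while (r == PTP_RESCHEDULE);
	return r;
}
```

With ready_after set to 0 (firmware never answers), ptp_run() returns
PTP_TIMED_OUT and increments tx_ts_errors rather than looping indefinitely;
without the limit check, the same loop would never terminate.
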
> 
> [Test case]
> 
> By using chrony in Bionic, the following steps will reproduce the issue:
> 
> a) Install chrony on Bionic on a system with a working NIC managed by qede;
> b) Edit the chrony configuration and add "hwtimestamp *" to the top of its
> conf file;
> c) Restart the chrony service.
> 
> Check dmesg for the "[...]Timestamping in progress" message and the
> overall CPU workload using a tool like "top" to observe a kthread
> consuming 100% of CPU.
> 
> [Regression potential]
> 
> The patch scope is restricted to the qede PTP handler, and the patch has been
> upstream for more than 7 months. Should a regression occur, the worst case
> would be an issue affecting packet timestamping, without touching the
> regular xmit path of the driver.
> 
> 
> Sudarsana Reddy Kalluru (1):
>   qede: Handle infinite driver spinning for Tx timestamp.
> 
>  drivers/net/ethernet/qlogic/qede/qede.h       |  2 +
>  .../net/ethernet/qlogic/qede/qede_ethtool.c   |  2 +
>  drivers/net/ethernet/qlogic/qede/qede_main.c  |  4 ++
>  drivers/net/ethernet/qlogic/qede/qede_ptp.c   | 37 +++++++++++++++----
>  4 files changed, 38 insertions(+), 7 deletions(-)
> 

Applied to bionic/linux and disco/linux.

Thanks,
Kleber


