[PATCH 0/1][Lunar/linux-azure] Azure: Fix perf regression: remove rx_cqes, tx_cqes counters for MANA
Tim Gardner
tim.gardner at canonical.com
Mon Jun 5 14:48:44 UTC 2023
BugLink: https://bugs.launchpad.net/bugs/2022940
SRU Justification
[Impact]
net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=1919b39fc6eabb9a6f9a51706ff6d03865f5df29
This commit resolves a large performance regression.
More details:
The apc->eth_stats.rx_cqes counter is one per NIC (vport), and it sits
on the frequent, parallel code path of all queues. Reads and writes to
this single shared variable by many threads on different CPUs create a
lot of caching and memory overhead, hence the perf regression. The
count is also not accurate because of the high volume of concurrent r/w.
For example, with an iperf workload of 128 threads and RPS enabled,
we saw a perf regression of 25% with the previous patch that added
the counters. This patch eliminates the regression.
Since the error path of mana_poll_rx_cq() already has warnings,
keeping the counter and converting it to a per-queue variable is not
necessary. So, just remove this counter from this high-frequency
code path.
Also, remove the tx_cqes counter for the same reason. We have
warnings and other counters for errors on that path, and don't need
to count every normally processed cqe.
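To illustrate the contention described above, here is a minimal user-space
sketch. It is not the MANA driver code. It contrasts one counter shared by
every queue with one counter per queue, each on its own cache line. The names
NUM_QUEUES, CQES_PER_QUEUE, CACHE_LINE, struct queue_stats and the run_*()
helpers are hypothetical, and a 64-byte cache line is assumed. The shared
counter uses a C11 atomic so that the total stays exact; that is a
substitution made for this sketch, not something the commit message
describes, but the cache line still bounces between CPUs on every increment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_QUEUES     8       /* hypothetical number of queues/threads */
#define CQES_PER_QUEUE 100000  /* hypothetical completions per queue */
#define CACHE_LINE     64      /* assumed cache line size in bytes */

/* Layout A: a single counter written by every queue. */
static atomic_uint_fast64_t shared_rx_cqes;

/* Layout B: one counter per queue, aligned so no two share a cache line. */
struct queue_stats {
	_Alignas(CACHE_LINE) uint64_t rx_cqes;
};
static struct queue_stats per_queue[NUM_QUEUES];

static void *poll_shared(void *arg)
{
	(void)arg;
	/* Every increment contends for the same cache line. */
	for (int i = 0; i < CQES_PER_QUEUE; i++)
		atomic_fetch_add_explicit(&shared_rx_cqes, 1,
					  memory_order_relaxed);
	return NULL;
}

static void *poll_per_queue(void *arg)
{
	struct queue_stats *qs = arg;

	/* Only this thread writes qs, so a plain increment is enough. */
	for (int i = 0; i < CQES_PER_QUEUE; i++)
		qs->rx_cqes++;
	return NULL;
}

/* Start one thread per queue, wait for all of them to finish. */
static void run_workers(void *(*fn)(void *), int use_per_queue)
{
	pthread_t tid[NUM_QUEUES];

	for (int q = 0; q < NUM_QUEUES; q++) {
		void *arg = use_per_queue ? (void *)&per_queue[q] : NULL;
		int rc = pthread_create(&tid[q], NULL, fn, arg);

		assert(rc == 0);
		(void)rc;
	}
	for (int q = 0; q < NUM_QUEUES; q++)
		pthread_join(tid[q], NULL);
}

/* Returns the total counted through the single shared counter. */
uint64_t run_shared(void)
{
	atomic_store(&shared_rx_cqes, 0);
	run_workers(poll_shared, 0);
	return atomic_load(&shared_rx_cqes);
}

/* Returns the total obtained by summing the per-queue counters. */
uint64_t run_per_queue(void)
{
	uint64_t total = 0;

	for (int q = 0; q < NUM_QUEUES; q++)
		per_queue[q].rx_cqes = 0;
	run_workers(poll_per_queue, 1);
	for (int q = 0; q < NUM_QUEUES; q++)
		total += per_queue[q].rx_cqes;
	return total;
}
```

Both run_shared() and run_per_queue() return NUM_QUEUES * CQES_PER_QUEUE.
Timing the two on a multi-core machine is one way to observe the cost of the
shared layout. The upstream fix takes neither route and simply drops the
counters, since the error paths already emit warnings.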
[Test Plan]
Tested by Microsoft.
[Regression potential]
The removed rx_cqes and tx_cqes counters may be in use by user space programs.
[Other Info]
SF: #00361807