[Bug 1928508] Re: Performance regression on memcpy() calls for AMD Zen
Heitor Alves de Siqueira
1928508 at bugs.launchpad.net
Mon Jun 7 17:49:37 UTC 2021
** Patch added: "lp1928508-focal.debdiff"
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1928508/+attachment/5502960/+files/lp1928508-focal.debdiff
https://bugs.launchpad.net/bugs/1928508
Title:
Performance regression on memcpy() calls for AMD Zen
Status in glibc package in Ubuntu:
Fix Released
Status in glibc source package in Focal:
In Progress
Status in glibc source package in Groovy:
In Progress
Bug description:
[Impact]
On AMD Zen systems, memcpy() calls see a heavy performance regression in Focal and Groovy, due to the way __x86_shared_non_temporal_threshold is calculated.
Before 'glibc-2.33~455', the threshold was calculated taking into
consideration the number of hardware threads in the CPU. On AMD Ryzen
and EPYC systems, this can be counter-productive if the number of
threads is high enough for the last-level caches to "overrun" each
other and cause cache line flushes. The solution is to lower the size
threshold at which memcpy() switches to non-temporal stores, by
removing the number of threads from the equation.
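As a rough illustration of the change described above, here is a simplified sketch. It is not the glibc source: the function and parameter names are made up for this example, and the 3/4 scaling factor reflects my reading of the upstream commit, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified illustration only; NOT the glibc source.
   shared_per_thread: one thread's share of the last-level cache, in bytes.
   threads: number of hardware threads sharing that cache.
   All names here are hypothetical. */

/* Before glibc-2.33~455: the threshold grows with the thread count. */
static size_t threshold_with_threads(size_t shared_per_thread, unsigned threads)
{
    assert(threads > 0);
    return shared_per_thread * threads * 3 / 4;
}

/* After: the thread count is removed from the equation. */
static size_t threshold_without_threads(size_t shared_per_thread)
{
    return shared_per_thread * 3 / 4;
}
```

With hypothetical inputs of a 2 MiB per-thread share and 16 threads, the first function gives a 24 MiB threshold and the second gives 1.5 MiB, so large copies switch to non-temporal stores much earlier.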
[Test Plan]
Attached to this bug is a short C program that exercises memcpy() calls on buffers of variable length. It was obtained from a similar Red Hat bug report, and is publicly available at [0].
This test program was compiled with gcc 10.2.0, using the following flags:
$ gcc -mtune=generic -march=x86-64 -g -O3 test_memcpy.c -o test_memcpy64
Tests were performed with the following criteria:
- use 32 MB buffers ("./test_memcpy64 32")
- benchmark with the hyperfine tool [1], as it calculates relevant statistics automatically
- benchmark with at least 10 runs in the same environment, to minimize variance
- measure on AMD Zen (3700X) and on Intel Xeon (E5-2683), to ensure we don't penalize one x86 vendor in favor of the other
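The attached test program is not reproduced in this message. For readers who just want to see the general shape of such a test, a hypothetical stand-in might look like the sketch below. The function name copy_mb, the fill pattern and the verification step are assumptions made for this example and are not taken from the attachment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in, NOT the program attached to the bug.
   Copies a buffer of `mb` megabytes `iterations` times.
   Returns 0 on success, or -1 if allocation fails or the
   destination does not match the source afterwards. */
static int copy_mb(size_t mb, unsigned iterations)
{
    size_t len = mb * 1024 * 1024;
    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    volatile unsigned char sink = 0;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }

    /* Touch both buffers so the pages are mapped before copying. */
    memset(src, 0xA5, len);
    memset(dst, 0, len);

    for (unsigned i = 0; i < iterations; i++) {
        memcpy(dst, src, len);
        /* Read one byte back so the copy is not discarded as a dead store. */
        sink ^= dst[len - 1];
    }

    int ok = memcmp(dst, src, len) == 0;
    free(src);
    free(dst);
    return ok ? 0 : -1;
}
```

A main() that parses the buffer size from argv[1] and calls copy_mb() with it would give a binary that can be timed in the same way as the real test program.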
Below is a comparison between two Focal containers, leveraging LXD to
make use of different libc versions on the same host:
$ hyperfine -n libc-2.31-0ubuntu9.2 'lxc exec focal ./test_memcpy64 32' -n libc-patched 'lxc exec focal-patched ./test_memcpy64 32'
Benchmark #1: libc-2.31-0ubuntu9.2
  Time (mean ± σ):     2.723 s ±  0.013 s    [User: 4.7 ms, System: 5.1 ms]
  Range (min … max):   2.693 s …  2.735 s    10 runs
Benchmark #2: libc-patched
  Time (mean ± σ):     1.522 s ±  0.004 s    [User: 3.9 ms, System: 5.6 ms]
  Range (min … max):   1.515 s …  1.528 s    10 runs
Summary
  'libc-patched' ran
    1.79 ± 0.01 times faster than 'libc-2.31-0ubuntu9.2'
$ head -n5 /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 113
model name : AMD Ryzen 7 3700X 8-Core Processor
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1880670
[1] https://github.com/sharkdp/hyperfine/
[Where problems could occur]
Since we're changing the cacheinfo calculation for x86 in general, we need to be careful not to introduce further performance regressions on memory-heavy workloads. Even though initial results show an improvement on AMD Ryzen and EPYC hardware, we should also validate other configurations (e.g. Intel CPUs, different buffer sizes) to make sure we don't hurt performance in non-AMD environments.
[Other Info]
This has been fixed by the following upstream commit:
- d3c57027470b (Reversing calculation of __x86_shared_non_temporal_threshold)
$ git describe --contains d3c57027470b
glibc-2.33~455
$ rmadison glibc -s focal,focal-updates,groovy,groovy-proposed,hirsute
glibc | 2.31-0ubuntu9 | focal | source
glibc | 2.31-0ubuntu9.2 | focal-updates | source
glibc | 2.32-0ubuntu3 | groovy | source
glibc | 2.32-0ubuntu3.2 | groovy-proposed | source
glibc | 2.33-0ubuntu5 | hirsute | source
Affected releases include Ubuntu Focal and Groovy. Bionic is not
affected, and releases starting with Hirsute already ship the upstream
patch to fix this regression.