[Bug 1951032] Re: AArch64: Backport memcpy improvements
Dave Jones
1951032 at bugs.launchpad.net
Tue Mar 8 16:19:36 UTC 2022
Some more results:
= Raspberry Pi 3B 1GB =
length | before (MiB/s) | after (MiB/s) | delta
----------|----------------|----------------|----------
32768 | 48.24 | 46.19 | -4.26%
65536 | 85.99 | 79.96 | -7.02%
131072 | 154.00 | 139.68 | -9.30%
262144 | 178.72 | 164.12 | -8.17%
524288 | 163.56 | 156.55 | -4.28%
1048576 | 246.15 | 234.32 | -4.81%
= Raspberry Pi 3A+ 512MB =
length | before (MiB/s) | after (MiB/s) | delta
----------|----------------|----------------|----------
32768 | 57.11 | 54.22 | -5.06%
65536 | 101.16 | 94.53 | -6.56%
131072 | 186.94 | 168.37 | -9.94%
262144 | 200.16 | 181.37 | -9.39%
524288 | 175.91 | 168.93 | -3.97%
1048576 | 261.19 | 250.62 | -4.04%
= Raspberry Pi Zero 2 =
length | before (MiB/s) | after (MiB/s) | delta
----------|----------------|----------------|----------
32768 | 40.58 | 38.75 | -4.51%
65536 | 72.51 | 67.57 | -6.81%
131072 | 132.02 | 121.20 | -8.20%
262144 | 165.26 | 149.13 | -9.76%
524288 | 160.46 | 153.15 | -4.55%
1048576 | 241.92 | 230.87 | -4.57%
Worth noting that the Pi 4 uses the 2711 SoC, while these (the 3B, 3A+,
and Zero 2) all use the older 2837 SoC. In other words, while the new
memcpy seems "okay" on the 2711, it shows a consistent regression of
roughly 4% to 10% on the 2837 at every length tested, worst at the
131072 and 262144 byte lengths.
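For anyone wanting to check the size of the regression, here is a minimal sketch that recomputes the delta column from the before/after figures in the three tables above and reports the range per board. The names RESULTS, delta_percent and regression_range are made up for illustration and are not part of the benchmark; since the inputs are the rounded values printed above, the last digit can differ slightly from the delta column.

```python
# (before, after) throughput in MiB/s, copied from the tables above,
# in order of length from 32768 to 1048576 bytes.
RESULTS = {
    "Raspberry Pi 3B 1GB": [
        (48.24, 46.19), (85.99, 79.96), (154.00, 139.68),
        (178.72, 164.12), (163.56, 156.55), (246.15, 234.32),
    ],
    "Raspberry Pi 3A+ 512MB": [
        (57.11, 54.22), (101.16, 94.53), (186.94, 168.37),
        (200.16, 181.37), (175.91, 168.93), (261.19, 250.62),
    ],
    "Raspberry Pi Zero 2": [
        (40.58, 38.75), (72.51, 67.57), (132.02, 121.20),
        (165.26, 149.13), (160.46, 153.15), (241.92, 230.87),
    ],
}


def delta_percent(before, after):
    """Relative change in throughput; negative means slower after."""
    return (after - before) / before * 100.0


def regression_range(rows):
    """Return (worst, best) delta in percent over all lengths for one board."""
    deltas = [delta_percent(before, after) for before, after in rows]
    return min(deltas), max(deltas)


if __name__ == "__main__":
    for board, rows in RESULTS.items():
        worst, best = regression_range(rows)
        print(f"{board}: {worst:.2f}% to {best:.2f}%")
```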
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/1951032
Title:
AArch64: Backport memcpy improvements
Status in glibc package in Ubuntu:
Fix Released
Status in glibc source package in Focal:
Fix Committed
Bug description:
[impact]
glibc 2.32 contained a number of improvements to the memcpy routines for server-grade AArch64 implementations (in particular, Graviton2 and Graviton3). They should be backported to Focal, as the LTS releases are by far the most used on servers.
[test case]
Download the "bench.tar.gz" attachment from this report. It has a README
that explains what to do, but here it is for reference:
benchmark for testing arm64 memcpy improvements in SRU
This is a benchmark that was derived from the memcpy benchmarks in glibc but altered to benchmark the public 'memcpy' symbol and be linked to the
installed libc.
To use this there are 5 steps:
1. build -- just run "make test"
2. run before upgrade -- "make bench-before"
3. upgrade libc6 package -- depends on what is being tested!
4. run again -- "make bench-after"
5. compare -- "make compare"
It produces output like this:
length | before (MiB/s) | after (MiB/s) | delta
----------|----------------|----------------|----------
32768 | 233.74 | 248.03 | 6.11%
65536 | 443.72 | 468.69 | 5.63%
131072 | 853.71 | 895.08 | 4.84%
262144 | 1640.93 | 1718.91 | 4.75%
524288 | 2501.80 | 2604.83 | 4.12%
1048576 | 3896.77 | 4157.74 | 6.70%
On Graviton2 systems, this should show an improvement of at least
several percent. On other arm64 systems (Raspberry Pis of various
vintages, ThunderX2, X-Gene, etc.) no significant regression should
be seen.
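As a rough aid for judging "no significant regression", here is a sketch that parses a table in the format shown above and lists the rows whose delta falls below a cutoff. The function names and the -2.0% default threshold are assumptions for illustration only; the README does not define what counts as significant, and this script is not part of bench.tar.gz.

```python
# Sample text in the same layout as the example output above.
SAMPLE = """\
length | before (MiB/s) | after (MiB/s) | delta
----------|----------------|----------------|----------
32768 | 233.74 | 248.03 | 6.11%
65536 | 443.72 | 468.69 | 5.63%
131072 | 853.71 | 895.08 | 4.84%
262144 | 1640.93 | 1718.91 | 4.75%
524288 | 2501.80 | 2604.83 | 4.12%
1048576 | 3896.77 | 4157.74 | 6.70%
"""


def parse_compare(text):
    """Parse table rows into (length, before, after, delta_percent) tuples."""
    rows = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Skip the header, the separator and anything else that is not data.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        length, before, after, delta = fields
        rows.append((int(length), float(before), float(after),
                     float(delta.rstrip("%"))))
    return rows


def regressions(rows, threshold_percent=-2.0):
    """Rows whose delta is below the cutoff (hypothetical default: -2.0%)."""
    return [row for row in rows if row[3] < threshold_percent]


if __name__ == "__main__":
    flagged = regressions(parse_compare(SAMPLE))
    for length, before, after, delta in flagged:
        print(f"{length}: {before} -> {after} MiB/s ({delta}%)")
    print("regression found" if flagged else "no regression below threshold")
```

Feeding it saved output from the compare step instead of SAMPLE would give a quick pass/fail per machine, with the threshold adjusted to taste.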
[regression potential]
Rebuilding glibc is always a little risky (toolchain bugs and incompatibilities between the old and new versions can be surprising). But the autopkgtests and some manual general testing can help here.
For this specific change, there is a potential risk that the new
memcpy implementation could be used on a system where it is not in
fact the fastest. We should run the test case not only on the systems
where it is expected to help, but other systems such as the RPi4 and
the launchpad build farm to ensure performance is not regressed there.