[Bug 1826811] Re: Valgrind unhandled instruction 0xD5380000 on Aarch64
Eric Desrochers
eric.desrochers at canonical.com
Thu Dec 12 12:49:17 UTC 2019
** Description changed:
- ## DRAFT ###
[Impact]
Valgrind on Bionic crashes with SIGILL and errors out as follows:
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==11950== valgrind: Unrecognised instruction at address 0x4014c90.
==11950== at 0x4014C90: init_cpu_features (cpu-features.c:72)
==11950== by 0x4014C90: dl_platform_init (dl-machine.h:208)
==11950== by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
==11950== by 0x40018C3: _dl_start_final (rtld.c:414)
==11950== by 0x4001B47: _dl_start (rtld.c:523)
==11950== by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
==11950== Your program just tried to execute an instruction that Valgrind
==11950== did not recognise. There are two possible reasons for this.
==11950== 1. Your program has a bug and erroneously jumped to a non-code
==11950== location. If you are running Memcheck and you just saw a
==11950== warning about a bad jump, it's probably your program's fault.
==11950== 2. The instruction is legitimate but Valgrind doesn't handle it,
==11950== i.e. it's Valgrind's fault. If you think this is the case or
==11950== you are not sure, please let us know and we'll try to fix it.
==11950== Either way, Valgrind will now raise a SIGILL signal which will
==11950== probably kill your program.
- ==11950==
+ ==11950==
==11950== Process terminating with default action of signal 4 (SIGILL)
==11950== Illegal opcode at address 0x4014C90
==11950== at 0x4014C90: init_cpu_features (cpu-features.c:72)
==11950== by 0x4014C90: dl_platform_init (dl-machine.h:208)
==11950== by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
==11950== by 0x40018C3: _dl_start_final (rtld.c:414)
==11950== by 0x4001B47: _dl_start (rtld.c:523)
==11950== by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
+ The crash occurs because Valgrind simulates the CPU instructions of the
+ process it is debugging. It decodes every instruction the process
+ executes and inserts its instrumentation at run time. In this case,
+ however, Valgrind does not recognise the MIDR_EL1 system register used
+ by the "mrs %0, midr_el1" instruction, which reads the CPU
+ identification register into the %0 (id) variable:
+ asm volatile ("mrs %0, midr_el1" : "=r"(id));
+ Because Valgrind cannot decode this register access, it raises SIGILL
+ and the process is killed.
+
+
+ https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt
+ ....
+ d) CPU Identification :
+ MIDR_EL1 is exposed to help identify the processor. On a
+ heterogeneous system, this could be racy (just like getcpu()). The
+ process could be migrated to another CPU by the time it uses the
+ register value, unless the CPU affinity is set. Hence, there is no
+ guarantee that the value reflects the processor that it is
+ currently executing on. The REVIDR is not exposed due to this
+ constraint, as REVIDR makes sense only in conjunction with the
+ MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
+ at:
+
+ /sys/devices/system/cpu/cpu$ID/regs/identification/
+ \- midr
+ \- revidr
[Test Case]
1) Write a 'Hello World' program:
----
#include <stdio.h>
int main(void) {
    printf("Hello World!\n");
    return 0;
}
----
2) Build it:
$ cc -o hello hello.c
3) Then run valgrind on it:
$ valgrind ./hello
[Regression Potential]
+ The regression risk should be low.
+
+ The symptom appears when Valgrind translates code inside glibc
+ (sysdeps/unix/sysv/linux/aarch64/cpu-features.c).
+
+ Even when HWCAP_CPUID is not supported, glibc defaults to assigning 0
+ to the midr variable, so I think it's not an important feature to
+ support.
+
+ Additionally, the fix is already in Ubuntu (disco and later).
+
+ If a regression does happen for some reason, it will be limited to the
+ ARM architecture and shouldn't affect other CPU architectures.
+
[Other information]
- Upstream fix:
+ Upstream fix:
https://sourceware.org/git/?p=valgrind.git;a=commit;h=fbbb696c5d1e93d4ac6cb548c68bb3f443ceef42
* Only affecting Bionic:
# git describe --contains fbbb696c5d1e93d4ac6cb548c68bb3f443ceef42
VALGRIND_3_14_0~96
# rmadison valgrind
- => valgrind | 1:3.13.0-2ubuntu2.1 | bionic-updates
- valgrind | 1:3.14.0-2ubuntu6 | disco
- valgrind | 1:3.15.0-1ubuntu3.1 | eoan-updates
- valgrind | 1:3.15.0-1ubuntu5 | focal
-
+ => valgrind | 1:3.13.0-2ubuntu2.1 | bionic-updates
+ valgrind | 1:3.14.0-2ubuntu6 | disco
+ valgrind | 1:3.15.0-1ubuntu3.1 | eoan-updates
+ valgrind | 1:3.15.0-1ubuntu5 | focal
[Original Description]
I'm performing Valgrind testing on an ElPotato running an Ubuntu Bionic
AArch64 image. My program dies in the same way as in
https://bugs.kde.org/show_bug.cgi?id=381556 :
```
$ valgrind --track-origins=yes --suppressions=cryptopp.supp ./cryptest.exe v
==12969== Memcheck, a memory error detector
==12969== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12969== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==12969== Command: ./cryptest.exe v
==12969==
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==12969== valgrind: Unrecognised instruction at address 0x4014c90.
==12969== at 0x4014C90: init_cpu_features (cpu-features.c:72)
==12969== by 0x4014C90: dl_platform_init (dl-machine.h:208)
==12969== by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
==12969== by 0x40018C3: _dl_start_final (rtld.c:414)
==12969== by 0x4001B47: _dl_start (rtld.c:523)
==12969== by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
...
```
Here's a similar Red Hat issue report:
https://bugzilla.redhat.com/show_bug.cgi?id=1467952 .
Please pick up the patch in the 381556 bug report.
-----
$ lsb_release -rd
Description: Ubuntu 18.04.2 LTS
Release: 18.04
$ apt-cache policy valgrind
valgrind:
Installed: 1:3.13.0-2ubuntu2.1
Candidate: 1:3.13.0-2ubuntu2.1
Version table:
*** 1:3.13.0-2ubuntu2.1 500
500 http://ports.ubuntu.com bionic-updates/main arm64 Packages
100 /var/lib/dpkg/status
1:3.13.0-2ubuntu2 500
500 http://ports.ubuntu.com bionic/main arm64 Packages
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to valgrind in Ubuntu.
https://bugs.launchpad.net/bugs/1826811
Title:
Valgrind unhandled instruction 0xD5380000 on Aarch64
Status in valgrind package in Ubuntu:
Fix Released
Status in valgrind source package in Bionic:
In Progress
Status in valgrind package in Fedora:
Fix Released
Bug description:
[Impact]
Valgrind on Bionic crashes with SIGILL and errors out as follows:
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==11950== valgrind: Unrecognised instruction at address 0x4014c90.
==11950== at 0x4014C90: init_cpu_features (cpu-features.c:72)
==11950== by 0x4014C90: dl_platform_init (dl-machine.h:208)
==11950== by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
==11950== by 0x40018C3: _dl_start_final (rtld.c:414)
==11950== by 0x4001B47: _dl_start (rtld.c:523)
==11950== by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
==11950== Your program just tried to execute an instruction that Valgrind
==11950== did not recognise. There are two possible reasons for this.
==11950== 1. Your program has a bug and erroneously jumped to a non-code
==11950== location. If you are running Memcheck and you just saw a
==11950== warning about a bad jump, it's probably your program's fault.
==11950== 2. The instruction is legitimate but Valgrind doesn't handle it,
==11950== i.e. it's Valgrind's fault. If you think this is the case or
==11950== you are not sure, please let us know and we'll try to fix it.
==11950== Either way, Valgrind will now raise a SIGILL signal which will
==11950== probably kill your program.
==11950==
==11950== Process terminating with default action of signal 4 (SIGILL)
==11950== Illegal opcode at address 0x4014C90
==11950== at 0x4014C90: init_cpu_features (cpu-features.c:72)
==11950== by 0x4014C90: dl_platform_init (dl-machine.h:208)
==11950== by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
==11950== by 0x40018C3: _dl_start_final (rtld.c:414)
==11950== by 0x4001B47: _dl_start (rtld.c:523)
==11950== by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
The crash occurs because Valgrind simulates the CPU instructions of the
process it is debugging. It decodes every instruction the process
executes and inserts its instrumentation at run time. In this case,
however, Valgrind does not recognise the MIDR_EL1 system register used
by the "mrs %0, midr_el1" instruction, which reads the CPU
identification register into the %0 (id) variable:
asm volatile ("mrs %0, midr_el1" : "=r"(id));
Because Valgrind cannot decode this register access, it raises SIGILL
and the process is killed.
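To illustrate why the word 0xD5380000 in the log corresponds to this register read, here is a minimal sketch that splits an A64 instruction word into the system-register fields of an MRS instruction. The struct and function names are hypothetical and chosen only for this example; this is not how Valgrind's own decoder is written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the operand fields of an A64 MRS instruction. */
struct mrs_fields {
    unsigned op0, op1, crn, crm, op2, rt;
};

/* Return true and fill *out if insn is an MRS (system register read). */
bool decode_mrs(uint32_t insn, struct mrs_fields *out)
{
    /* MRS has bits 31..20 equal to 1101 0101 0011 (0xD53). */
    if ((insn & 0xFFF00000u) != 0xD5300000u)
        return false;
    out->op0 = 2u + ((insn >> 19) & 0x1u); /* bit 19 is o0; op0 = 2 + o0 */
    out->op1 = (insn >> 16) & 0x7u;
    out->crn = (insn >> 12) & 0xFu;
    out->crm = (insn >> 8) & 0xFu;
    out->op2 = (insn >> 5) & 0x7u;
    out->rt  = insn & 0x1Fu;               /* destination register Xt */
    return true;
}

/* MIDR_EL1 is the system register with op0=3, op1=0, CRn=0, CRm=0, op2=0. */
bool is_midr_el1_read(uint32_t insn)
{
    struct mrs_fields f;
    return decode_mrs(insn, &f) &&
           f.op0 == 3 && f.op1 == 0 && f.crn == 0 && f.crm == 0 && f.op2 == 0;
}
```

Passing the word from the log, is_midr_el1_read(0xD5380000), returns true, and the rt field decodes to 0, i.e. the destination is x0.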
https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt
....
d) CPU Identification :
MIDR_EL1 is exposed to help identify the processor. On a
heterogeneous system, this could be racy (just like getcpu()). The
process could be migrated to another CPU by the time it uses the
register value, unless the CPU affinity is set. Hence, there is no
guarantee that the value reflects the processor that it is
currently executing on. The REVIDR is not exposed due to this
constraint, as REVIDR makes sense only in conjunction with the
MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
at:
/sys/devices/system/cpu/cpu$ID/regs/identification/
\- midr
\- revidr
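The sysfs path quoted above can be read without executing mrs at all. The following is a minimal sketch, assuming the midr file contains a single hexadecimal number; the function and struct names are hypothetical, and the field layout follows the architectural definition of MIDR_EL1.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper type: the bit fields of a MIDR_EL1 value. */
struct midr_fields {
    unsigned implementer, variant, architecture, partnum, revision;
};

/* Split a MIDR_EL1 value into its bit fields. */
struct midr_fields midr_split(uint64_t midr)
{
    struct midr_fields f;
    f.implementer  = (unsigned)((midr >> 24) & 0xFFu);  /* bits 31..24 */
    f.variant      = (unsigned)((midr >> 20) & 0xFu);   /* bits 23..20 */
    f.architecture = (unsigned)((midr >> 16) & 0xFu);   /* bits 19..16 */
    f.partnum      = (unsigned)((midr >> 4) & 0xFFFu);  /* bits 15..4  */
    f.revision     = (unsigned)(midr & 0xFu);           /* bits 3..0   */
    return f;
}

/* Read the midr file of the given CPU. Returns 0 on success, -1 if the
 * file is missing or unreadable (for example on a non-arm64 machine). */
int midr_read_sysfs(unsigned cpu, uint64_t *midr)
{
    char path[128];
    char buf[64];
    FILE *fp;

    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%u/regs/identification/midr", cpu);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    *midr = strtoull(buf, NULL, 0); /* base 0 accepts a leading "0x" */
    return 0;
}
```

As an illustration only (the value is a sample, not taken from this report), midr_split(0x410fd034) yields implementer 0x41, architecture 0xf, part number 0xd03 and revision 4.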
[Test Case]
1) Write a 'Hello World' program:
----
#include <stdio.h>
int main(void) {
    printf("Hello World!\n");
    return 0;
}
----
2) Build it:
$ cc -o hello hello.c
3) Then run valgrind on it:
$ valgrind ./hello
[Regression Potential]
The regression risk should be low.
The symptom appears when Valgrind translates code inside glibc
(sysdeps/unix/sysv/linux/aarch64/cpu-features.c).
Even when HWCAP_CPUID is not supported, glibc defaults to assigning 0
to the midr variable, so I think it's not an important feature to
support.
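The fallback described above can be sketched roughly as follows. This is not the glibc source, only an illustration of the pattern under stated assumptions: the function name is hypothetical, the HWCAP_CPUID fallback value is taken from the arm64 Linux uapi headers, and on any non-AArch64 build the function simply returns 0.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>              /* getauxval, AT_HWCAP */
#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1UL << 11)    /* assumed value from the arm64 uapi headers */
#endif
#endif

/* Return MIDR_EL1 when the kernel advertises HWCAP_CPUID, otherwise 0. */
uint64_t read_midr_or_zero(void)
{
#if defined(__aarch64__) && defined(__linux__)
    if (getauxval(AT_HWCAP) & HWCAP_CPUID) {
        uint64_t id;
        /* Same instruction as the one quoted in the bug description. */
        __asm__ volatile ("mrs %0, midr_el1" : "=r"(id));
        return id;
    }
#endif
    return 0;                      /* default when the register is unavailable */
}
```

A caller that only needs a best-effort CPU identifier can treat a return value of 0 as "unknown", which is why missing support is not critical.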
Additionally, the fix is already in Ubuntu (disco and later).
If a regression does happen for some reason, it will be limited to the
ARM architecture and shouldn't affect other CPU architectures.
[Other information]
Upstream fix:
https://sourceware.org/git/?p=valgrind.git;a=commit;h=fbbb696c5d1e93d4ac6cb548c68bb3f443ceef42
* Only affecting Bionic:
# git describe --contains fbbb696c5d1e93d4ac6cb548c68bb3f443ceef42
VALGRIND_3_14_0~96
# rmadison valgrind
=> valgrind | 1:3.13.0-2ubuntu2.1 | bionic-updates
valgrind | 1:3.14.0-2ubuntu6 | disco
valgrind | 1:3.15.0-1ubuntu3.1 | eoan-updates
valgrind | 1:3.15.0-1ubuntu5 | focal
[Original Description]
I'm performing Valgrind testing on an ElPotato running an Ubuntu Bionic
AArch64 image. My program dies in the same way as in
https://bugs.kde.org/show_bug.cgi?id=381556 :
```
$ valgrind --track-origins=yes --suppressions=cryptopp.supp ./cryptest.exe v
==12969== Memcheck, a memory error detector
==12969== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12969== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==12969== Command: ./cryptest.exe v
==12969==
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==12969== valgrind: Unrecognised instruction at address 0x4014c90.
==12969== at 0x4014C90: init_cpu_features (cpu-features.c:72)
==12969== by 0x4014C90: dl_platform_init (dl-machine.h:208)
==12969== by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
==12969== by 0x40018C3: _dl_start_final (rtld.c:414)
==12969== by 0x4001B47: _dl_start (rtld.c:523)
==12969== by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
...
```
Here's a similar Red Hat issue report:
https://bugzilla.redhat.com/show_bug.cgi?id=1467952 .
Please pick up the patch in the 381556 bug report.
-----
$ lsb_release -rd
Description: Ubuntu 18.04.2 LTS
Release: 18.04
$ apt-cache policy valgrind
valgrind:
Installed: 1:3.13.0-2ubuntu2.1
Candidate: 1:3.13.0-2ubuntu2.1
Version table:
*** 1:3.13.0-2ubuntu2.1 500
500 http://ports.ubuntu.com bionic-updates/main arm64 Packages
100 /var/lib/dpkg/status
1:3.13.0-2ubuntu2 500
500 http://ports.ubuntu.com bionic/main arm64 Packages