Getting rid of alignment faults in userspace
Dave Martin
dave.martin at linaro.org
Fri Jun 17 12:10:11 UTC 2011
Hi all,
I've recently become aware that a few packages are causing alignment
faults on ARM, and are relying on the alignment fixup emulation code in
the kernel in order to work.
Such faults are very expensive in terms of CPU cycles, and can generally
only result from wrong code (for example, C/C++ code which violates the
relevant language standards, assembler which makes invalid assumptions,
or functions called with misaligned pointers due to other bugs).
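To make the "wrong code" point concrete, here is a minimal C sketch. load_u32 and store_u32 are hypothetical helper names and do not come from any particular project. Dereferencing a cast such as *(const uint32_t *)p is undefined behaviour when p is misaligned, and on ARM it can trap. Copying through memcpy() is well defined for any alignment, and compilers typically lower it to a single load or store where the target allows it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any byte address.
 * The tempting version, return *(const uint32_t *)p;, is undefined
 * behaviour if p is not suitably aligned, and can fault on ARM. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);   /* well defined for any alignment */
    return v;
}

/* Hypothetical helper: write a 32-bit value to any byte address. */
static void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Both helpers use the host's byte order. They only remove the alignment assumption.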
Currently, on a Natty Ubuntu desktop image, I observe no faults except
from firefox and mono-based apps (see below).
As part of the general effort to make open source on ARM better, I think
it would be great if we can disable the alignment fixups (or at least
enable logging) and work with upstreams to get the affected packages
fixed.
For release images we might want to be more forgiving, but for development
we have the option of being more aggressive.
The number of affected packages and bugs appears small enough for the
fixing effort to be feasible, without temporarily breaking whole
distros.
For ARM, we can achieve the goal by augmenting the default kernel
command-line options: either
alignment=3
    Fix up each alignment fault, but also log the faulting address
    and name of the offending process to dmesg.

alignment=5
    Pass each alignment fault to the user process as SIGBUS (fatal
    by default) and log the faulting address and name of the
    offending process to dmesg.
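For reference, the two values above are combinations of bit flags. The sketch below decodes them assuming the bit layout described in the kernel's ARM mem_alignment documentation: bit 0 = log, bit 1 = fix up, bit 2 = send SIGBUS. The enum and function names are hypothetical, and the layout is worth checking against your kernel version.

```c
#include <stdbool.h>

/* Hypothetical names for the mode bits, assuming the layout in the
 * kernel's ARM mem_alignment documentation. */
enum {
    ALIGN_LOG    = 1 << 0, /* log process name and faulting address */
    ALIGN_FIXUP  = 1 << 1, /* emulate the unaligned access in the kernel */
    ALIGN_SIGBUS = 1 << 2  /* deliver SIGBUS to the faulting process */
};

struct align_mode {
    bool log;
    bool fixup;
    bool sigbus;
};

/* Split an alignment mode value (e.g. 3 or 5) into its component flags. */
static struct align_mode decode_alignment_mode(unsigned int mode)
{
    struct align_mode m;

    m.log    = (mode & ALIGN_LOG) != 0;
    m.fixup  = (mode & ALIGN_FIXUP) != 0;
    m.sigbus = (mode & ALIGN_SIGBUS) != 0;
    return m;
}
```

Under that assumption, 3 decodes to log + fix up and 5 decodes to log + SIGBUS, which matches the two descriptions above.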
Fault statistics can also be obtained at runtime by reading
/proc/cpu/alignment.
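Here is a minimal sketch for reading those statistics from a program, for example in a test harness. read_small_file is a hypothetical helper. It assumes the file is small, and it returns -1 where /proc/cpu/alignment does not exist, such as on non-ARM kernels.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read a small text file such as /proc/cpu/alignment
 * into buf and NUL-terminate it. Returns the number of bytes read, or -1
 * if the buffer is unusable or the file cannot be opened. */
static long read_small_file(const char *path, char *buf, size_t size)
{
    FILE *f;
    size_t n;

    if (buf == NULL || size == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, size - 1, f);  /* leave room for the terminator */
    buf[n] = '\0';
    fclose(f);
    return (long)n;
}
```

A caller could pass "/proc/cpu/alignment" and a buffer of a few hundred bytes, then print or parse the result. The sketch does not assume any particular layout for the file's contents.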
For other architectures, there may be other arch-specific ways of
achieving something similar.
I'd be interested in people's views on this.
Cheers
---Dave
More background:
Two known instances of misbehaving userland apps are:
1) firefox-4.x (bug report pending)
A char array declared as a container for C++ objects is cast
directly to an object pointer type and dereferenced, without
ensuring proper alignment.
By sheer luck, the presence of an extra member in the containing
class in firefox-3.x means that the char array has a different
alignment and so the faults don't occur.
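Here is a C sketch of the same pattern. struct payload and the container types are hypothetical and are not firefox's actual classes. A plain char array only guarantees char alignment, so its offset depends on whatever members precede it. Requesting the stored type's alignment with alignas removes the dependence on luck. C++11 has the same alignas specifier.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object type to be kept in raw byte storage. */
struct payload {
    uint64_t id;
    double weight;
};

/* Problematic: storage only has char alignment, so here it sits at
 * offset 1, and casting it to (struct payload *) and dereferencing it
 * is a misaligned access. Adding or removing earlier members moves it,
 * which is how such bugs appear or vanish between releases. */
struct container_bad {
    char tag;
    char storage[sizeof(struct payload)];
};

/* Fix: ask the compiler for the alignment the stored type needs. */
struct container_good {
    char tag;
    alignas(struct payload) char storage[sizeof(struct payload)];
};

_Static_assert(offsetof(struct container_good, storage)
                   % alignof(struct payload) == 0,
               "storage must be suitably aligned for struct payload");
```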
2) gtk-sharp2 (https://bugs.launchpad.net/bugs/798315) (affecting
mono-based GUI apps such as banshee and tomboy)
char pointers are cast to 64-bit integer pointers and
dereferenced, as an attempt at comparing string prefixes faster.
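Here is a sketch of how the same "compare 8 bytes at a time" idea can be written without assuming alignment. prefix_equal is a hypothetical name, and the caller must guarantee that n bytes are readable from both pointers. In practice plain memcmp() is usually the better choice, since C libraries already optimise it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: compare the first n bytes of a and b, 8 bytes at a time,
 * without assuming either pointer is 8-byte aligned. */
static bool prefix_equal(const char *a, const char *b, size_t n)
{
    while (n >= sizeof(uint64_t)) {
        uint64_t x, y;

        memcpy(&x, a, sizeof x);   /* safe for any alignment */
        memcpy(&y, b, sizeof y);
        if (x != y)
            return false;
        a += sizeof x;
        b += sizeof y;
        n -= sizeof x;
    }
    return memcmp(a, b, n) == 0;   /* remaining tail, fewer than 8 bytes */
}
```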
These apps typically generate hundreds or thousands of faults per session:
not millions, but still quite a lot of noise in syslog.
I think these are likely to be representative of typical causes of
alignment faults, i.e. attempted optimisations which break the rules of
the language and which only show up in certain builds, or as side-effects
of routine maintenance.
Code like that is going to be a massive own goal for performance on ARM and
other architectures which fault unaligned accesses, since the resulting faults
are likely to cost thousands of cycles per instance.
More information about the ubuntu-devel
mailing list