Optimized kernel builds: the straight dope [fp extensions]
Ben Collins
bcollins at ubuntu.com
Tue Aug 15 21:34:37 BST 2006
On Tue, 2006-08-15 at 22:08 +0200, Krzysztof Lichota wrote:
> Matt Zimmerman wrote:
> > On Tue, Aug 15, 2006 at 05:26:03PM +0200, Pavel Rojtberg wrote:
> >> Matt Zimmerman wrote:
> >>> I'm interested to hear (objective, reasoned) feedback on this data and my
> >>> conclusions from the members of this list.
> >> It would be interesting to see an i386 vs k7 comparison, since the
> >> architectural difference is bigger; the k7 architecture has some floating
> >> point extensions (3DNow!) which usually have a big impact on performance.
> >> Therefore we should look if the kernel can handle these well enough
> >> dynamically. Perhaps it would make sense to leave one optimized kernel on
> >> i386 for newer CPUs. This could be SSE optimized then to be more generic
> >> (AthlonXP+, Pentium3+)
> >
> > I'll add to what Ben has already said that 3DNOW, SSE, MMX and friends are
> > not general-purpose extensions which make your system faster. They are in
> > fact very specialized, and specialized for operations that the kernel
> > doesn't do. The kernel very nearly doesn't use floating point at all, much
> > less floating point extensions for 3D graphics!
>
> AFAIR gcc uses SIMD instructions for faster memcpy & friends and
> probably in some other places. So it could have some effect on
> performance (unless this is switched at runtime).
> Using prefetch instructions can also speed up some routines (encryption,
> compression, etc.).
> However, I am not sure whether this is enabled in kernel compilation.
These are the exact types of functions that are optimized at run time in
the kernel already. These are the types of functions that were exercised
in the benchmarks that I ran.
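
To make "optimized at run time" concrete, here is a minimal user-space sketch of the general idea: check what the running CPU can do, then pick a routine through a function pointer. This is not kernel code and not how the kernel implements it. The names xor_bytes, xor_words, cpu_has_sse2 and pick_xor are made up for illustration, and xor_words is only a stand-in for a tuned variant (it uses no SSE2 instructions itself). The one real interface assumed is GCC's __builtin_cpu_supports() on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical routine type: XOR n bytes of src into dst. */
typedef void (*xor_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable fallback: one byte at a time. */
void xor_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}

/* Stand-in for a "tuned" variant: eight bytes at a time, then the tail.
 * memcpy keeps the word accesses safe for unaligned buffers. */
void xor_words(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t i = 0;
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t a, b;
        memcpy(&a, dst + i, sizeof a);
        memcpy(&b, src + i, sizeof b);
        a ^= b;
        memcpy(dst + i, &a, sizeof a);
    }
    for (; i < n; i++)
        dst[i] ^= src[i];
}

/* Ask the running CPU, not the compile target, whether SSE2 exists.
 * Assumption: GCC (or a compatible compiler) on x86; elsewhere report 0. */
int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

/* Choose an implementation once, at run time. */
xor_fn pick_xor(void)
{
    return cpu_has_sse2() ? xor_words : xor_bytes;
}
```

A caller would do something like `xor_fn f = pick_xor(); f(dst, src, len);` once at startup and reuse f afterwards. The point is that the same binary behaves differently on different CPUs, which is why the -march setting matters much less for routines handled this way.
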
> IMO getting rid of these optimized kernels should be done after much
> broader testing than 2 randomly chosen desktops. Think of people using
> encrypted filesystems, compressed filesystems, etc.
The only optimized crypto is aes-586, and it is already available even in
our current -386 kernel. The optimized crypto functions do not depend on
the compile target, but on the actual capabilities of the running
system.
On x86, we have aes-586, which is almost completely assembly. Same with
aes-x86_64 on amd64. Neither of those changes, no matter which target
you compile for with gcc.
The rest of the crypto modules are likely already compiling to their limit
of optimization with -O2 and -march=i586 (which is what you'll get with the
proposed -generic kernel).
If anyone wants to run some crypto benchmarks using these modules with
-march=i586, -march=i686 and -march=athlon, then that would be most
useful. So far most people have been guessing and assuming. We need
numbers.
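
For anyone who wants a starting point, below is a rough sketch of a timing harness that can be built once per -march setting and compared. It is only an assumption about how one might measure this in user space: mix_buffer is a made-up rotate-and-multiply loop standing in for a module's inner loop, not a real cipher, and time_mix is a hypothetical helper. Real numbers should come from the actual crypto modules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload: XOR, rotate and multiply over a buffer.
 * It stands in for a crypto inner loop; it is NOT a real cipher. */
uint32_t mix_buffer(const uint8_t *buf, size_t n)
{
    uint32_t h = 0x9e3779b9u;
    for (size_t i = 0; i < n; i++) {
        h ^= buf[i];
        h = (h << 5) | (h >> 27);   /* rotate left by 5 */
        h *= 0x01000193u;
    }
    return h;
}

/* Run the workload `rounds` times and return the CPU time in seconds.
 * Each round feeds its result back into buf[0], so the compiler cannot
 * hoist the call out of the loop; the final value goes out via *sink. */
double time_mix(uint8_t *buf, size_t n, unsigned rounds, uint32_t *sink)
{
    assert(buf != NULL && sink != NULL && n > 0);
    uint32_t acc = 0;
    clock_t start = clock();
    for (unsigned r = 0; r < rounds; r++) {
        acc ^= mix_buffer(buf, n);
        buf[0] ^= (uint8_t)acc;
    }
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Add a small main() that fills a buffer and prints the result of time_mix(), then build the same source three times on a 32-bit x86 toolchain, e.g. `gcc -O2 -march=i586`, `gcc -O2 -march=i686` and `gcc -O2 -march=athlon`, and run all three binaries on the same machine. Use enough rounds that each run takes at least a few seconds, and repeat the runs to see how much noise there is.
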
--
Ubuntu - http://www.ubuntu.com/
Debian - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
SwissDisk - http://www.swissdisk.com/