Optimized kernel builds: the straight dope
Daniel
crassico at gmail.com
Tue Aug 15 03:02:09 BST 2006
Last time I tested this was on the Dapper release, and the -k7 kernel booted
my Athlon more than 5 seconds faster than the -386 one. I'm the kind of user
who reboots a lot, and I found those seconds pretty precious.
Just my half cent.
On 8/14/06, Matt Zimmerman <mdz at ubuntu.com> wrote:
>
> == Background ==
>
> You've all seen our kernel lineup before. In the beginning, on i386 we
> had linux-386, linux-686, linux-k7, linux-686-smp, linux-k7-smp. Later,
> we added specialized server kernels, and folded SMP support into
> linux-686 and linux-k7.
>
> Since Linux has always offered CPU-specific optimizations, it's been
> taken for granted that this offered enough of a performance benefit to
> make all of this maintenance effort worthwhile. A lot has happened to
> the kernel since the early days, though, and for some time, it has been
> capable of loading these optimizations at runtime. Even when you use
> the -386 kernel, you get the benefit of many CPU-specific optimizations
> automatically. This is great news for integrators, like Ubuntu, because
> we want to provide everyone with the best experience out of the box,
> and as you know, there is only room for one kernel on the CD, not for a
> stack of redundant ones. Many users spend time and bandwidth quotas
> downloading these optimized kernels in hopes of squeezing the most
> performance out of their hardware.
>
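> For a concrete sense of what loading optimizations at runtime means:
> the kernel probes the CPU at boot and, broadly speaking, patches in
> optimized code paths where the hardware supports them. You can see
> what it detected in /proc/cpuinfo. A quick Python sketch (purely
> illustrative, not part of any kernel tooling):
>
>     import re
>
>     # The x86 "flags" line in /proc/cpuinfo lists every CPU feature
>     # the kernel detected at boot, whatever image flavor is running.
>     with open("/proc/cpuinfo") as f:
>         match = re.search(r"^flags\s*:\s*(.*)$", f.read(), re.MULTILINE)
>     flags = set(match.group(1).split()) if match else set()
>
>     # A few features a -386 image can still exploit at runtime:
>     for feature in ("cmov", "mmx", "sse", "sse2"):
>         print(feature, "yes" if feature in flags else "no")
>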
> This raises the question: do we still need these old-fashioned builds?
> Experiments have shown that users who are told that their system will
> run faster will say that it "feels" faster whether there is a
> measurable difference or not. For fun, try it with an unsuspecting test
> subject: tell them that you'll "optimize" their system to make it a
> little bit faster, make some do-nothing changes to it, then see if they
> notice a difference. The fact is, our observations of performance are
> highly subjective, which is why we need to rely on hard data.
>
> == Data ==
>
> Enter Ben Collins, our kernel team lead, who has put together a
> performance test to answer that question, covering both the i386 and
> amd64 architectures. The results are attached in the form of an email
> from him. Following that is a README which explains how to interpret
> the numerical figures.
>
> No benchmark says it all. They're all biased toward specific workloads,
> and very few users run a homogeneous workload, especially not desktop
> users. This particular benchmark attempts to measure system
> responsiveness, a key factor in overall performance (real and
> perceived) for desktop workloads, which are largely interactive.
>
> == Conclusion ==
>
> Having read over it, I think the numbers are fairly compelling. The
> difference in performance between -386 and -686 is insignificant; the
> measurements are all within a reasonable error range, and within that
> range, -686 was slower as often as it was faster.
>
> My recommendation is that we phase out these additional kernel builds,
> which I expect will save us a great deal of developer time, buildd
> time, archive storage and bandwidth.
>
> I'm interested to hear (objective, reasoned) feedback on this data and my
> conclusions from the members of this list.
>
> --
> - mdz
>
>
>
> ---------- Forwarded message ----------
> From: Ben Collins <bcollins at ubuntu.com>
> To: mdz at ubuntu.com
> Date: Mon, 14 Aug 2006 19:58:22 -0400
> Subject: Benchmarks between generic and cpu specific kernel images
> The test was performed on an Intel Pentium 4, 2 GHz system with 256 MB
> of RAM. I consider this about average for our users. The test I ran was
> "contest", which basically builds a kernel under several different load
> types. It's meant specifically for comparing different kernels on the
> same machine. Each of the lines below represents the average of 3
> kernel builds under that particular type of stress. The first two are
> baseline results (cacherun is simply no_load without clearing the
> mem/disk cache; for all the other results, the mem/disk cache is
> cleared before starting).
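>
> In outline, each measurement works roughly like the sketch below (a
> Python illustration, not contest itself, which is a C program; the
> kernel tree path and make target are placeholders):
>
>     import subprocess, time
>
>     def drop_caches():
>         # One way to clear the mem/disk caches on recent kernels:
>         # flush dirty pages, then drop the page/dentry/inode caches
>         # (requires root).
>         subprocess.run(["sync"], check=True)
>         with open("/proc/sys/vm/drop_caches", "w") as f:
>             f.write("3\n")
>
>     def timed_build():
>         # Time one build of a kernel tree; the elapsed seconds
>         # become the Time column below.
>         start = time.time()
>         subprocess.run(["make", "-C", "linux-source", "vmlinux"],
>                        check=True, capture_output=True)
>         return time.time() - start
>
>     # Three runs per load type, caches cleared before each run;
>     # cacherun skips the clearing to get a warm-cache baseline.
>     times = []
>     for _ in range(3):
>         drop_caches()
>         times.append(timed_build())
>     print("average: %.0fs" % (sum(times) / len(times)))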
>
> Here are the results:
>
> no_load:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3    73   87.7      0.0    0.0   1.00
> 2.6.17-6-686       3    73   89.0      0.0    0.0   1.00
>
> cacherun:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3    65   98.5      0.0    0.0   0.89
> 2.6.17-6-686       3    66   97.0      0.0    0.0   0.90
>
> process_load:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3   143   44.8    260.0   52.4   1.96
> 2.6.17-6-686       3   144   44.4    230.0   52.8   1.97
>
> ctar_load:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3   100   67.0      8.7   10.0   1.37
> 2.6.17-6-686       3    97   69.1      6.3    9.3   1.33
>
> xtar_load:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3    96   67.7      2.0    4.2   1.32
> 2.6.17-6-686       3    96   68.8      1.7    4.1   1.32
>
> io_load:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3   102   64.7     11.3    5.9   1.40
> 2.6.17-6-686       3   103   66.0     12.1    8.3   1.41
>
> io_other:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3    94   71.3     20.9   12.8   1.29
> 2.6.17-6-686       3    97   70.1     21.3   16.3   1.33
>
> read_load:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3    91   72.5      2.3    2.2   1.25
> 2.6.17-6-686       3    92   72.8      2.3    2.2   1.26
>
> list_load:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3    87   75.9      0.0    5.7   1.19
> 2.6.17-6-686       3    87   77.0      0.0    6.9   1.19
>
> mem_load:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3   327   20.5     74.0    0.6   4.48
> 2.6.17-6-686       3   390   17.7     80.7    0.8   5.34
>
> dbench_load:
> Kernel        [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-6-386       3    83   78.3  52193.3   18.1   1.14
> 2.6.17-6-686       3    92   71.7  47986.7   23.9   1.26
>
>
> Using this as a guide, you can see that the difference is negligible.
> The -686 targeted kernel does not gain or lose any significant
> performance on this test system. mem_load was the only noticeable
> change, where the -686 kernel shows worse performance. This may be
> specific to my system. The individual results are here:
>
> 2.6.17-6-386 58 9 157 764901 883 mem_load 0 2 158 182219 5893 7100
> 2.6.17-6-386 58 10 381 780220 4597 mem_load 0 3 382 191532 8976 7600
> 2.6.17-6-386 59 10 445 780181 6654 mem_load 0 2 446 191709 8501 7500
> 2.6.17-6-686 58 11 244 771704 1849 mem_load 0 3 244 192658 7983 7600
> 2.6.17-6-686 58 12 445 788730 6357 mem_load 1 3 446 198200 9831 8100
> 2.6.17-6-686 59 12 481 793676 6401 mem_load 1 3 482 200208 11464 8500
>
> In the 4th column you can see that the first run for each kernel is a
> clear outlier. The results after the first run seem more likely to be
> correct.
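>
> A quick way to quantify that is to compare the mean and spread of the
> run times with and without the first run (a Python sketch; this
> assumes the 4th column of those raw lines is the elapsed time in
> seconds):
>
>     from statistics import mean, stdev
>
>     # 4th-column values from the raw mem_load lines above
>     # (assumed: elapsed seconds per run)
>     runs_386 = [157, 381, 445]
>     runs_686 = [244, 445, 481]
>
>     for name, runs in (("-386", runs_386), ("-686", runs_686)):
>         # The cold first run drags the mean down and inflates the
>         # spread; dropping it gives a steadier comparison.
>         warm = runs[1:]
>         print(name, "all: ", round(mean(runs)), round(stdev(runs)))
>         print(name, "warm:", round(mean(warm)), round(stdev(warm)))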
>
> I also performed the same test on an SMP Xeon machine (2 x Core2Duo @
> 3 GHz), using the amd64-generic and amd64-xeon kernel images. Here are
> the results for that:
>
> no_load:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3    49  357.1      0.0    0.0   1.00
> 2.6.17-7-amd64-xeon          3    47  370.2      0.0    0.0   1.00
>
> cacherun:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3    48  362.5      0.0    0.0   0.98
> 2.6.17-7-amd64-xeon          3    47  368.1      0.0    0.0   1.00
>
> process_load:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3    60  285.0    131.0   95.1   1.22
> 2.6.17-7-amd64-xeon          3    60  285.0    130.0   96.7   1.28
>
> ctar_load:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3    55  320.0     15.7   25.5   1.12
> 2.6.17-7-amd64-xeon          3    54  325.9     15.3   25.9   1.15
>
> xtar_load:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3    51  343.1      1.7    5.9   1.04
> 2.6.17-7-amd64-xeon          3    53  330.2      2.3    7.5   1.13
>
> io_load:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3   101  171.3      3.8    5.9   2.06
> 2.6.17-7-amd64-xeon          3    97  179.4      3.7    5.6   2.06
>
> io_other:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3   115  150.4      3.8    5.9   2.35
> 2.6.17-7-amd64-xeon          3   103  168.0      3.7    5.5   2.19
>
> read_load:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3    89  194.4      0.4    0.0   1.82
> 2.6.17-7-amd64-xeon          3    93  186.0      0.4    0.0   1.98
>
> list_load:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3    49  357.1      0.0    2.0   1.00
> 2.6.17-7-amd64-xeon          3    53  328.3      0.0    0.0   1.13
>
> mem_load:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3    54  338.9   3137.0   38.9   1.10
> 2.6.17-7-amd64-xeon          3    54  337.0   3155.3   38.9   1.15
>
> dbench_load:
> Kernel                  [runs]  Time   CPU%    Loads  LCPU%  Ratio
> 2.6.17-7-amd64-generic       3    84  211.9      0.0    0.0   1.71
> 2.6.17-7-amd64-xeon          3    86  207.0      0.0    0.0   1.83
>
> The only significant difference was in the io_other performance. The
> io_other test involves some random I/O operations on a large file. I
> think the difference here is real. However, given that our -server
> kernel will likely get back this small loss, that should be enough to
> warrant still converting to generic kernels for amd64 and i386.
>
> I can make this happen using linux-meta to handle the upgrades (the
> next kernel upload is an ABI bump, so it would be perfect timing, along
> with lrm and linux-meta).
>
> Question is, do we want...
>
> linux-image-2.6.17-7-generic
> (or just)
> linux-image-2.6.17-7
>
> ...for both i386 and amd64? I prefer the no-flavor version, but it would
> require some slight changes in the build system.
>
> I want to change the amd64 kernels to be less amd64ish, and have just
> two kernels:
>
> linux-image-2.6.17-7{,-generic}
> and
> linux-image-2.6.17-7-server
>
> That way there's more consistency in the naming with i386, and i386
> would just have the extra -server-bigiron.
>
> --
> Ubuntu - http://www.ubuntu.com/
> Debian - http://www.debian.org/
> Linux 1394 - http://www.linux1394.org/
> SwissDisk - http://www.swissdisk.com/