ld.so read-ahead proposal

Thu Oct 5 04:39:51 UTC 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Note:  This is all theory, and I am not a glibc hacker.  Anyone who
       wants to give it a shot, go for it.  Otherwise, this is for
       informational purposes only.

I was thinking about readahead and discussing some stuff on #-devel when
the thought occurred to me that reading entire libraries in (i.e.
*PRELOAD*) may not be the greatest idea in the world.  A little more
thought and I came up with the below (short version):

 - ld.so links files by reading the headers of ELF files for needed
   libraries and then dynamic linking those libraries.

 - When programs begin executing, the only thing we can determine they
   definitely need is .data and .bss

 - The .text segments will have hot spots and cold spots, and some spots
   that are hot at initialization and cold otherwise.  The only spot we
   can assume is hot is the page containing _main(), and possibly the
   page after it.

 - Spots in .text that are continuously hot in libraries and executables
   will likely be in memory if any other process is using that library.

So I came up with the following considerations:

 - When ld.so first loads anything, it should immediately complete all
   library dependencies (i.e. do not perform relocations yet)

   - Each library is mmap()ed into memory as it becomes known it is
     needed.

   - Each time ld.so finishes looking in an ELF file for the needed
     libraries, it should fork() a child.

   - The child should immediately begin reading the headers of each
     library using readahead(), right up to .data and .bss.

   - The page containing _main() and the next 3 pages should always be
     read in.

 - Once this process is finished, ld.so should start resolving symbols
   and performing relocation.  Library headers are likely faulted into
   memory already, so the relocation process is probably CPU bound.

Here are the sections in the GTK library:

  [Nr] Name            Type      Addr     Off    Size   ES Flg Lk Inf Al
- --------------------------------------------------------------------------
  [ 0]                 NULL      00000000 000000 000000 00      0   0  0
* [ 1] .hash           HASH      000000b4 0000b4 009484 04   A  2   0  4
* [ 2] .dynsym         DYNSYM    00009538 009538 0151c0 10   A  3  12  4
* [ 3] .dynstr         STRTAB    0001e6f8 01e6f8 022f5a 00   A  0   0  1
* [ 4] .gnu.version    VERSYM    00041652 041652 002a38 02   A  2   0  2
* [ 5] .gnu.version_r  VERNEED   0004408c 04408c 0000a0 00   A  3   2  4
* [ 6] .rel.dyn        REL       0004412c 04412c 003660 08   A  2   0  4
* [ 7] .rel.plt        REL       0004778c 04778c 0038f0 08   A  2   9  4
  [ 8] .init           PROGBITS  0004b07c 04b07c 000017 00  AX  0   0  4
* [ 9] .plt            PROGBITS  0004b094 04b094 0071f0 04  AX  0   0  4
  [10] .text           PROGBITS  00052290 052290 23aba4 00  AX  0   0 16
  [11] .fini           PROGBITS  0028ce34 28ce34 00001c 00  AX  0   0  4
  [12] .rodata         PROGBITS  0028ce60 28ce60 0bb3dd 00   A  0   0 32
  [13] .eh_frame       PROGBITS  00348240 348240 000004 00   A  0   0  4
  [14] .ctors          PROGBITS  00349000 349000 000008 00  WA  0   0  4
  [15] .dtors          PROGBITS  00349008 349008 000008 00  WA  0   0  4
  [16] .jcr            PROGBITS  00349010 349010 000004 00  WA  0   0  4
* [17] .data.rel.ro    PROGBITS  00349020 349020 003848 00  WA  0   0 32
  [18] .dynamic        DYNAMIC   0034c868 34c868 000128 08  WA  3   0  4
* [19] .got            PROGBITS  0034c990 34c990 000200 04  WA  0   0  4
* [20] .got.plt        PROGBITS  0034cb90 34cb90 001c84 04  WA  0   0  4
  [21] .data           PROGBITS  0034e820 34e820 000360 00  WA  0   0 32
  [22] .bss            NOBITS    0034eb80 34eb80 001340 00  WA  0   0 32
  [23] .gnu_debuglink  PROGBITS  00000000 34eb80 000020 00      0   0  1
  [24] .shstrtab       STRTAB    00000000 34eba0 0000c4 00      0   0  1

The sections we intently want are starred above.  They are:

1-7, 9, 17, 19-20

The rule should be that if more than N full pages are between these
sections, they should not be read, for some good value of N.  For
example, N=1 would be 4096 bytes (8 sectors on hard disk), which is
probably not worth a seek (however seeks are VERY fast here).

As a good example, above we see that .init is not only short (23 bytes);
but its entire contents are on pages that contain data we want. .text
starts 0x290 (668) bytes into the last page in .plt, so we'll read the
first 3428 bytes of that but no more.  Skipping ahead, we pick up with
.data.rel.ro all the way through the beginning of .data.

The result of this, I'm hoping, is that glibc would wind up actually
performing relocations with the disk-bound part happening in the
background.  Here are timing results with openoffice.org using ldd -r:

real 1.44
user 0.25
sys 0.03

real 0.54
user 0.26
sys 0.03

The second run does not have to load data from disk and completes 0.9
seconds faster.  I believe the difference in user/real time is
copy-on-write overhead (which is not executed by syscalls, so is not sys
time), but this is conjecture.

It requires 0.01 seconds to perform ldd without relocation, so the
benefit here would likely be in that the 0.9 second read and 0.54 second
relocation process would be in parallel, giving a theoretical minimum
execution time of 0.9 seconds.

In essence, whenever glibc catches up to the readahead() child, it
shares the workload of readahead() and the child effectively begins
running faster; so this minimum should be approached.  Another important
point is that any time those parts of those files are already in page
cache, readahead() returns immediately.  This optimization effectively
can't slow the dynamic linker down, in theory.

This becomes especially interesting during loading of GNOME or KDE, when
many libraries may be loaded by many processes.  The initially loaded
processes will benefit most, while those following will benefit less.
The .text sections should start being loaded as the program runs, and
only the important pages will be loaded this way.  The overall result
should, however, be that the desktop loads faster.

As a final thought, the readahead process run at boot time may be
optimizable to use the same headers-and-main() rules described here for
ELF files, with a second pass for .text once the first finishes.  Aside
from that one could simply move more CPU-bound processes earlier in the
boot process; I discussed the possibility of racing to get / mounted and
running readahead in the background (readahead blocks and should be
backgrounded) while drivers are being loaded, but it seems readahead is
already very early so yay.

- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

    Creative brains are a valuable, limited resource. They shouldn't be
    wasted on re-inventing the wheel when there are so many fascinating
    new problems waiting out there.
                                                 -- Eric Steven Raymond

    We will enslave their women, eat their children and rape their
    cattle!
                  -- Bosc, Evil alien overlord from the fifth dimension
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQIVAwUBRSSMlAs1xW0HCTEFAQLJ/g/9GT8hMjYK13tUTxuZYBovv4uEXsfV8dgv
VduUuFZBUTOLKiIpylQSY1/arowbwe4WO1qwSevv9sujaIZf8ip2pZ/jjpmrRH9E
69Y7LbNNz3iiEyunCCXrZv/GFP3pvPN1uACzm7/WbaT0lmIcCQv0573Bz0St2//E
T1zK3JOLIqAONlBcvrLlC30HHLSJASB1/W3MDPz3PNWJToPERimT6BjRvlSZrSlK
Q0JkZqe8IVrJq6HyDKBcBRGDfxkf3NLtPTokSFI7Bfkei1LcyVdQ6gTuh4wcx4nJ
ffW9HGcn7XSwTzl1EsTd0ZAV7mNUJVoJ0HyQVuRC02UIaOor01pg6Q6WdCfSfcF4
rwxwzJdJUTgn25wQcaBSOQUEcQuLl+UKYSg9h1q1/+0sSdqXMGPXSo9rEouIhR/7
IFvsauWqfmY3mkZ4r444D3aGecW8KRhbBDrlEh8C36c8tjIVEVJeqzyQ3sKutZPe
lC2smmGmE2AJ0ykGddbSnX3LXb2s62xUBWz8C1xTiYS/umUmCmg+DlJ86HMuLclg
w2oRO+zew4mIwD6T9bkO1UefMsf8F7OdaIJpOHKhN7mlIOjIL/rlcfU/l1OJZJia
HKlEax5kpVTAX8DIaqJ7XGbZtWCvz7W2iuLWMJuQcA3+YD8Vq6NZg7vPSh4b0FA5
ByuWgDB5bDI=
=vRB6
-----END PGP SIGNATURE-----