ld.so read-ahead proposal
John Richard Moser
nigelenki at comcast.net
Thu Oct 5 04:39:51 UTC 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Note: This is all theory, and I am not a glibc hacker. Anyone who
wants to give it a shot, go for it. Otherwise, this is for
informational purposes only.
I was thinking about readahead and discussing some stuff on #-devel when
the thought occurred to me that reading entire libraries in (i.e.
*PRELOAD*) may not be the greatest idea in the world. A little more
thought and I came up with the below (short version):
- ld.so links files by reading the headers of ELF files for needed
libraries and then dynamic linking those libraries.
- When programs begin executing, the only thing we can determine they
definitely need is .data and .bss
- The .text segments will have hot spots and cold spots, and some spots
that are hot at initialization and cold otherwise. The only spot we
can assume is hot is the page containing _main(), and possibly the
page after it.
- Spots in .text that are continuously hot in libraries and executables
will likely be in memory if any other process is using that library.
So I came up with the following considerations:
- When ld.so first loads anything, it should immediately complete all
library dependencies (i.e. do not perform relocations yet)
- Each library is mmap()ed into memory as it becomes known it is
needed.
- Each time ld.so finishes looking in an ELF file for the needed
libraries, it should fork() a child.
- The child should immediately begin reading the headers of each
library using readahead(), right up to .data and .bss.
- The page containing _main() and the next 3 pages should always be
read in.
- Once this process is finished, ld.so should start resolving symbols
and performing relocation. Library headers are likely faulted into
memory already, so the relocation process is probably CPU bound.
Here are the sections in the GTK library:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
- --------------------------------------------------------------------------
[ 0] NULL 00000000 000000 000000 00 0 0 0
* [ 1] .hash HASH 000000b4 0000b4 009484 04 A 2 0 4
* [ 2] .dynsym DYNSYM 00009538 009538 0151c0 10 A 3 12 4
* [ 3] .dynstr STRTAB 0001e6f8 01e6f8 022f5a 00 A 0 0 1
* [ 4] .gnu.version VERSYM 00041652 041652 002a38 02 A 2 0 2
* [ 5] .gnu.version_r VERNEED 0004408c 04408c 0000a0 00 A 3 2 4
* [ 6] .rel.dyn REL 0004412c 04412c 003660 08 A 2 0 4
* [ 7] .rel.plt REL 0004778c 04778c 0038f0 08 A 2 9 4
[ 8] .init PROGBITS 0004b07c 04b07c 000017 00 AX 0 0 4
* [ 9] .plt PROGBITS 0004b094 04b094 0071f0 04 AX 0 0 4
[10] .text PROGBITS 00052290 052290 23aba4 00 AX 0 0 16
[11] .fini PROGBITS 0028ce34 28ce34 00001c 00 AX 0 0 4
[12] .rodata PROGBITS 0028ce60 28ce60 0bb3dd 00 A 0 0 32
[13] .eh_frame PROGBITS 00348240 348240 000004 00 A 0 0 4
[14] .ctors PROGBITS 00349000 349000 000008 00 WA 0 0 4
[15] .dtors PROGBITS 00349008 349008 000008 00 WA 0 0 4
[16] .jcr PROGBITS 00349010 349010 000004 00 WA 0 0 4
* [17] .data.rel.ro PROGBITS 00349020 349020 003848 00 WA 0 0 32
[18] .dynamic DYNAMIC 0034c868 34c868 000128 08 WA 3 0 4
* [19] .got PROGBITS 0034c990 34c990 000200 04 WA 0 0 4
* [20] .got.plt PROGBITS 0034cb90 34cb90 001c84 04 WA 0 0 4
[21] .data PROGBITS 0034e820 34e820 000360 00 WA 0 0 32
[22] .bss NOBITS 0034eb80 34eb80 001340 00 WA 0 0 32
[23] .gnu_debuglink PROGBITS 00000000 34eb80 000020 00 0 0 1
[24] .shstrtab STRTAB 00000000 34eba0 0000c4 00 0 0 1
The sections we intently want are starred above. They are:
1-7, 9, 17, 19-20
The rule should be that if more than N full pages are between these
sections, they should not be read, for some good value of N. For
example, N=1 would be 4096 bytes (8 sectors on hard disk), which is
probably not worth a seek (however seeks are VERY fast here).
As a good example, above we see that .init is not only short (23 bytes);
but its entire contents are on pages that contain data we want. .text
starts 0x290 (668) bytes into the last page in .plt, so we'll read the
first 3428 bytes of that but no more. Skipping ahead, we pick up with
.data.rel.ro all the way through the beginning of .data.
The result of this, I'm hoping, is that glibc would wind up actually
performing relocations with the disk-bound part happening in the
background. Here are timing results with openoffice.org using ldd -r:
real 1.44
user 0.25
sys 0.03
real 0.54
user 0.26
sys 0.03
The second run does not have to load data from disk and completes 0.9
seconds faster. I believe the difference in user/real time is
copy-on-write overhead (which is not executed by syscalls, so is not sys
time), but this is conjecture.
It requires 0.01 seconds to perform ldd without relocation, so the
benefit here would likely be in that the 0.9 second read and 0.54 second
relocation process would be in parallel, giving a theoretical minimum
execution time of 0.9 seconds.
In essence, whenever glibc catches up to the readahead() child, it
shares the workload of readahead() and the child effectively begins
running faster; so this minimum should be approached. Another important
point is that any time those parts of those files are already in page
cache, readahead() returns immediately. This optimization effectively
can't slow the dynamic linker down, in theory.
This becomes especially interesting during loading of GNOME or KDE, when
many libraries may be loaded by many processes. The initially loaded
processes will benefit most, while those following will benefit less.
The .text sections should start being loaded as the program runs, and
only the important pages will be loaded this way. The overall result
should, however, be that the desktop loads faster.
As a final thought, the readahead process run at boot time may be
optimizable to use the same headers-and-main() rules described here for
ELF files, with a second pass for .text once the first finishes. Aside
from that one could simply move more CPU-bound processes earlier in the
boot process; I discussed the possibility of racing to get / mounted and
running readahead in the background (readahead blocks and should be
backgrounded) while drivers are being loaded, but it seems readahead is
already very early so yay.
- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.
Creative brains are a valuable, limited resource. They shouldn't be
wasted on re-inventing the wheel when there are so many fascinating
new problems waiting out there.
-- Eric Steven Raymond
We will enslave their women, eat their children and rape their
cattle!
-- Bosc, Evil alien overlord from the fifth dimension
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iQIVAwUBRSSMlAs1xW0HCTEFAQLJ/g/9GT8hMjYK13tUTxuZYBovv4uEXsfV8dgv
VduUuFZBUTOLKiIpylQSY1/arowbwe4WO1qwSevv9sujaIZf8ip2pZ/jjpmrRH9E
69Y7LbNNz3iiEyunCCXrZv/GFP3pvPN1uACzm7/WbaT0lmIcCQv0573Bz0St2//E
T1zK3JOLIqAONlBcvrLlC30HHLSJASB1/W3MDPz3PNWJToPERimT6BjRvlSZrSlK
Q0JkZqe8IVrJq6HyDKBcBRGDfxkf3NLtPTokSFI7Bfkei1LcyVdQ6gTuh4wcx4nJ
ffW9HGcn7XSwTzl1EsTd0ZAV7mNUJVoJ0HyQVuRC02UIaOor01pg6Q6WdCfSfcF4
rwxwzJdJUTgn25wQcaBSOQUEcQuLl+UKYSg9h1q1/+0sSdqXMGPXSo9rEouIhR/7
IFvsauWqfmY3mkZ4r444D3aGecW8KRhbBDrlEh8C36c8tjIVEVJeqzyQ3sKutZPe
lC2smmGmE2AJ0ykGddbSnX3LXb2s62xUBWz8C1xTiYS/umUmCmg+DlJ86HMuLclg
w2oRO+zew4mIwD6T9bkO1UefMsf8F7OdaIJpOHKhN7mlIOjIL/rlcfU/l1OJZJia
HKlEax5kpVTAX8DIaqJ7XGbZtWCvz7W2iuLWMJuQcA3+YD8Vq6NZg7vPSh4b0FA5
ByuWgDB5bDI=
=vRB6
-----END PGP SIGNATURE-----
More information about the ubuntu-users
mailing list