Optimizing tree building

Aaron Bentley aaron.bentley at utoronto.ca
Fri Jun 8 04:32:02 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> For the use case 'is the text T present' our goal is a small number of
> IO's to determine presence/absence. A simple bisect into a simple
> serialised list can deliver log(keys) performance. Fixed size records
> may help a little but don't make a significant difference from what I
> can tell.

I would have thought they'd help you predict where to start bisecting.
Given a uniform distribution of keys (e.g. sha1'ed revision ids), a
fixed record length and the size of the file, you ought to be able to
make a very good guess about where the desired record is.

> E.g.
> for the new container format you could do this by the index saying
> 'start at byte N, read and parse until you find what you want'. Then if
> there we 32KB of deltas run-together with the text T at the end, the
> index could point at the beginning of the run, and an optimistic IO read
> would get everything in one hit.

Yeah, that's the sort of think I was thinking of, but I was applying it
to the list of build-dependencies:

Index 1:
text foo -> 90-166

Index 2 (starting at file position 90)
bar ():250
qux(bar): 300
baz(bar): 340
quxxx(): 410

> e.g. create a text construction index for all the data in a revision -
> pulling all the knit offsets from across the repository into one place.
> Point at the beginning of the continuous run for any part of a delta
> chain. Name the index based on the revision name. Then for a file
> changed in any given revision we can possibly avoid reading the knit
> index completely. We could build large composed indexes like this too,
> and only upload one when we do a push, and when uploading one only
> include the revisions we pushed, not the whole repository.

Yes, that does sound like it might be interesting.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGaM2y0F+nu1YWqI0RAsYjAJ93XuCMYUj0g2097EFS9SvCr3yc+ACfWd7/
dWl7gs+xq/W6uCidTUnjZDI=
=AFvr
-----END PGP SIGNATURE-----



More information about the bazaar mailing list