[MERGE] Notes on a customized chk+gc index
John Arbash Meinel
john at arbash-meinel.com
Tue Mar 24 16:35:21 GMT 2009
Ian Clatworthy wrote:
> John Arbash Meinel wrote:
>> I was thinking about how our current indexes scale (approx 38 bytes per
>> record after compression), and what it would take to get a better index
>> for a chk+groupcompress index.
>
>> For Launchpad, we would get a 13MB => 3.3MB cix, which has an 8.0MB btree
>> .tix. Which is cutting about 10MB off the total size, which is almost
>> 10% of the total size of a packed Launchpad repository, and is also 10%
>> of the on-disk size of the python conversion (now down to 96MB).
>
>> This certainly isn't something that should cause us to block on
>> releasing brisbane-core, but it is certainly high up on the "nice to
>> have" list.
>
> Wow. What a great analysis and brain dump! I wish I was smart enough to
> have written it. :-)
I was actually pretty happy with the information I was able to glean
just by playing with the collision probability function. At first
glance the function is rather opaque, but I found some very
interesting results once I dug deeper.
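(For anyone who wants to poke at the same numbers, the usual
birthday-bound approximation is easy to compute. This is just a rough
stand-in, not the exact function from the doc; the name and the
example parameters below are purely illustrative.)

import math

def collision_probability(num_keys, key_bits):
    """Chance that at least two of num_keys random values collide in a
    key_bits-bit space, using the birthday approximation
    P ~= 1 - exp(-n*(n-1) / 2**(bits+1)).
    """
    n = float(num_keys)
    return 1.0 - math.exp(-n * (n - 1.0) / 2.0 ** (key_bits + 1))

# e.g. one million keys truncated to 8 bytes (64 bits)
print(collision_probability(10 ** 6, 64))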
>
> I don't know enough about our current index layer to comment too deeply
> on this, but I think we can land this document and debate an implementation
> and delivery timeline separately. BTW, at the macro level, I'm pretty torn
> about the priority of this. 10% of size is nothing to sneeze at and I
> think we're fast approaching the point where index performance is
> becoming the bottleneck on numerous operations. OTOH, I worry that
> we need to focus on speed first and foremost and we won't know what
> speed gains we'll get from this till it's largely done?
Ultimately, this will only help things that are fast, but not fast
enough. The breakdown I see is:
840ms get_build_details
690ms btree._get_and_cache_nodes
655ms btree._read_nodes (960 calls, but only 673 decompress?
      maybe we have ~300 calls with an empty list)
63ms decompress (673 calls)
252ms _parse_leaf_lines (657 calls)
195ms readv()
44ms _parse_internal_node (22 calls)
So readv() seems to have a *lot* of overhead. Digging closer, the
results are a bit surprising:
296 0.2027 0.0048 bzrlib.transport.local:153(get)
+296 0.0275 0.0275 +<open>
+296 0.1521 0.0075 +bzrlib.transport.local:105(abspath)
+296 0.0178 0.0029 +bzrlib.transport.local:94(_abspath)
^- out of the ~200ms spent in get(), ~150ms is spent just converting
from a URL relpath back into an abspath on disk that we can open.
lsprof may be biased here, but it is worth taking a closer look at.
Anyway, there is still almost 300ms in parsing the lines. zlib isn't a
huge overhead at only 60ms, but it is there.
Getting rid of zlib may mean we can work on smaller "pages" than 4k
compressed (8k uncompressed?), which may mean that we have to parse
fewer nodes, etc. Or another format may be faster to parse, since we
know that we won't expect to find things already parsed (no locality).
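(As a rough sanity check of that ~2:1 guess, you can zlib-compress
some synthetic index-style lines. The record layout below is made up
for illustration; it is not the real .tix/.cix format:)

import binascii
import os
import zlib

# Fabricated index-like lines: a 40-char hex key plus two numeric
# fields per record -- a similar mix of hex text and small integers,
# NOT the real on-disk layout.
lines = []
for i in range(200):
    key = binascii.hexlify(os.urandom(20)).decode('ascii')
    lines.append('%s %d %d\n' % (key, i * 4096, 100 + i))
page = ''.join(lines).encode('ascii')

compressed = zlib.compress(page, 9)
print('uncompressed %d bytes -> compressed %d bytes (ratio %.2f)'
      % (len(page), len(compressed), float(len(page)) / len(compressed)))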
We would end up adding some overhead, in that our keys are
('sha1:abacd',) while the values stored would be the binary form, so you
have to do some amount of manipulation to go between the different forms.
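Roughly this sort of shim (helper names made up for illustration, not
existing bzrlib functions):

import binascii

def key_to_binary(key):
    """Convert an index key like ('sha1:<40 hex chars>',) into the raw
    20-byte digest a more compact index could store."""
    return binascii.unhexlify(key[0][len('sha1:'):])

def binary_to_key(raw):
    """Convert the raw 20-byte digest back into the ('sha1:...',) key
    form the rest of the code expects."""
    return ('sha1:' + binascii.hexlify(raw).decode('ascii'),)

key = ('sha1:' + '0123456789abcdef0123456789abcdef01234567',)
raw = key_to_binary(key)          # 20 bytes instead of a 45-char string
assert binary_to_key(raw) == key

It is cheap per key, but it is one more conversion on every lookup and
insertion.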
Another possibility would be to have "bzr pack" create 2 pack files. One
would contain everything referenced in the last 100 revs, and the other
would contain everything else.
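The partitioning step might look something like the sketch below,
where keys_referenced_by() and the recent-revision list are
hypothetical inputs and all of the real pack-creation plumbing is
glossed over:

def split_keys_by_recency(all_keys, keys_referenced_by, recent_revs):
    """Partition index keys into (hot, cold) sets: 'hot' keys are
    referenced by one of the recent revisions, 'cold' is everything
    else.  keys_referenced_by is a hypothetical callback mapping a
    revision id to the keys it references, not a real bzrlib API.
    """
    hot = set()
    for rev_id in recent_revs:
        hot.update(keys_referenced_by(rev_id))
    hot &= set(all_keys)
    return hot, set(all_keys) - hot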
>
> bb: tweak
>
>
> Personally, I think projects like OOo and Mozilla are *far* more interesting to
> optimise for than the 1M x 1M case. After all, there's a good argument that
> projects like OOo really ought to be restructured into smaller libraries pulled
> together using nested trees.
Sure, but in the end, it would all likely be stored in the same
repository. Certainly Robert's experiment with what it would take to
store all of the source to Ubuntu in a single repository is a valid data
point as well.
If you can get something that can handle that only slightly slower than
having them all as independent repositories, then you have some nice
wins wrt managing multiple repos, etc.
>
>> +To get the smallest index possible, we store only a 2-byte 'record indicator'
>> +inside the index, and then assume that it can be decode once we've read the
>
> s/decode/decoded/
>
>> +If we change the analysis
>> +
>
> You either need to finish this sentence or remove it.
>
> Ian C.
>
Thanks for the review.
John
=:->