Some interesting benchmarks, showing index overhead

Wed Sep 3 00:12:21 BST 2008

So, I've been working on this bug:
https://bugs.edge.launchpad.net/bzr/+bug/262565

In the end, I think it is going to come down to needing to use "--weave" and
getting my "LCA" for paths patch merged into bzr for 1.7 (abentley *poke* :).

Anyway, in the process, I've found some interesting issues with our index
handling. I'd be really curious how btrees handle this case.

Basically, the issue comes down to this: there is only about 2MB of new data
to be downloaded (for inventory and for file texts), but we end up
downloading something like 40MB of index data to figure that out.

I think what happened is that a reasonable cross-section of file-ids is in
that 2MB of data. So the "bisect through .tix" code ends up having to search
*a lot* to find everything interesting.

With the latest bzr.dev, I see it bisect through the index and, after
downloading 23MB, switch to a simple GET request for the whole 36MB file
(58MB total). The last readv is 5.7MB before it switches to GET.
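
To make the "bisect, then give up and GET the whole file" behaviour concrete,
here is a rough sketch. It is not the actual bzr code: the function name
plan_fetches and the 50% switch threshold are made up for illustration, and
the real heuristic may well be different.

```python
def plan_fetches(file_size, readv_sizes, switch_fraction=0.5):
    """Simulate partial reads that fall back to one full download.

    Hypothetical model, not bzr's implementation: keep issuing partial
    reads (readv) until their running total exceeds `switch_fraction`
    of the file, then fetch the whole file with a single GET.

    Returns (bytes_via_readv, switched_to_full_get, total_bytes).
    """
    fetched = 0
    for size in readv_sizes:
        fetched += size
        if fetched > switch_fraction * file_size:
            # Bisecting has already cost a large share of the file, so
            # stop and download everything. The bytes already read are
            # wasted, which is why the total can exceed the file size.
            return fetched, True, fetched + file_size
    return fetched, False, fetched


if __name__ == "__main__":
    # Made-up read sizes, only to show the shape of the result.
    print(plan_fetches(100, [10, 20, 30, 40]))
    print(plan_fetches(100, [10, 20]))
```

The point of the sketch is just that once the switch happens, the total is
the readv bytes plus the full file, so a late switch costs more than never
bisecting at all.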

Before my patch landed, it did readvs until it had downloaded a total of
42.6MB, or about 118% of the file.
So the current code does download more total data.

Anyway, I just wanted to mention that in our current formats, the index
overhead can be rather large. When I got the 5.1 branch, the total data
downloaded was only 1.9MB, but it took all ~40MB of index searching to find
it. That is about 40/2 = 20x overhead looking in indexes.
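
For anyone who wants to redo the arithmetic, here is a quick sketch using the
rough figures from this mail. The helper names are mine, purely for
illustration.

```python
def index_overhead(index_mb, payload_mb):
    """Ratio of index bytes downloaded to useful data bytes downloaded."""
    return index_mb / payload_mb


def percent_of_file(downloaded_mb, file_mb):
    """How much was downloaded, as a percentage of the file's size."""
    return 100.0 * downloaded_mb / file_mb


if __name__ == "__main__":
    # ~40MB of index traffic to locate ~2MB (1.9MB measured) of data.
    print(index_overhead(40, 2))
    print(round(index_overhead(40, 1.9), 1))
    # 42.6MB of readv traffic against a 36MB index file.
    print(round(percent_of_file(42.6, 36)))
```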

I'm curious how much BTree will help here. I'll try to save some state so
we'll be able to run some benchmarks against it.

John
=:->