[MERGE] Updates to the "auto-buffer" logic of GraphIndex

Fri Aug 29 20:00:06 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Attached are two tweaks to how the GraphIndex code decides to switch over to
"_buffer_all" mode. (The tests are a bit noisier than they need to be, because
I factored out some common code into helper functions.)

1) If a single readv request returns the whole index, buffer_all and return.
This turns out to happen a lot in real-world repositories with mixed size
packs, and remote access. By default remote access reads 64kB pages, and when
a repository grows "organically" it tends to end up with a lot of indexes
under the 64kB page size.

For example, bzr.dev:
http://bazaar-vcs.org/bzr/.bzr/repository/indices/
has 104 index files, and 81 of those are <64kB in size (only 68 of those
should actually be referenced, it seems we have a bit of 'junk' there.)

And bzr.dev in launchpad:
http://bazaar.launchpad.net/~bzr/bzr/trunk/.bzr/repository/indices/
has 72 index files, and 56 of those are < 64kB in size.

Or mysql-5.1:
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-5.1/.bzr/repository/indices/
has 52 index files, and 48 of them are < 64kB in size.

The existing code would read the whole file, parse it into "bisect_nodes"
pointers. And then later on might decide that it actually needed the whole
file, (say because num_requested_keys > total_keys / 20), which would cause it
to then read the file again.

My patch here just says "if offset == 0 and len(data) == self._size:
buffer_all(data)".

2) If we read more than 50% of the index by random 'readv()' calls, I trigger
a buffer_all. This is mostly because of some pathological behavior that I was
seeing. Namely, while iterating the revision graph to find the full set of
nodes to request. Our existing bisect code makes a request for (start, 800)
which the Transport layer translates into a (start, 64kB) request. The data is
returned, and then we chop off the beginning and trailing bits, which don't
amount to keys that we can actually parse.

The problem is that if we then request (start-20, 800) bytes, it will turn
that into another (start-20, 64kB) of data, which greatly overlaps what we
already have read.

Also, while searching the graph, we would eventually get to a point where we
were requesting more that 1/20th of the keys in a single request (bzr gets
pretty wide after about 40 get_parent_map() calls because we have a lot of
merging.) Which would mean it would then trigger a buffer_all anyway. When I
was looking at it, I saw us read about 1.6x the total size of the file before
we ended up triggering "buffer_all" by 'accident'.

50% is just a random threshold. Robert has mentioned in the past the
possibility that 10% of the index may be sufficient. (That is the number I
remember from my postgresql tuning days, if >10% of the data is going to be
read, use a sequential scan instead of an index.)

We could be conservative and set it to 100%, so if we've readv() enough to be
equivalent to reading the whole file, go ahead and just read the whole thing.
Note also that this only triggers on the *next* iter_entries call. Arguably we
could use "if this readv would read more than 100% of the file, buffer_all",
but I think that would only save 1 round trip (out of 40).

Both of these changes don't help fully-packed repositories much, but that
doesn't really matter as that is not the "common" case. My own bzr repo has 23
pack files.

In testing the latest bzr.dev branching over bzr+ssh://localhost with 200ms
delay (400ms ping), I get:

time   calls	bzr	branch
 94.0s 102	bzr.dev packed
133.7s 278	bzr.dev unpacked

 88.5s 75	new	packed
115.7s 202	new	unpacked

"calls" is the number of "hpss call" statements in the log file.

So we save about 27 calls on a packed repository (6% total time), and 76 calls
on an unpacked one (15% total time). These numbers are latency constrained,
but not bandwidth constrained (as I'm using the loopback).

So I did run another test doing "bzr branch lp:bzr" just to see one that is
both latency *and* bandwidth constrained. My ping to "bazaar.lp.net" is:
rtt min/avg/max/mdev = 112.003/ 270.721 /384.155/84.658 ms

So I average 270ms, but can go up to 400, and down to 100.

time	calls	command
11m04s	231	bzr.dev
10m43s	172	new

So 59 fewer calls, saving about 21s total, or 3% of total time. Not a huge
difference. I should point out that I'm still mostly bandwidth constrained
with the new code. 231 calls * 400ms = 92s or 1m32s. So at best I could get
down to 9m30s.

10m43s is actually quite good for my bandwidth. Final disk size of pack and
indexes is 87MB @ 160kB/s = 556.8s = 9m16s. So with the patch we are at 113%
of bandwidth capped, and without it we are at 116%.

I had thought about trying to get these into 1.6.1 (to help with the loss of
the HPSS.stream_revisions RPC), but there isn't a reviewer, and I didn't want
to delay rc1.

Also, I would comment that probably the next best thing to do to decrease
effective latency, is to have HPSS.readv() actually stream the chunks back.
Right now it waits to buffer the entire thing into a single memory buffer, and
then returns that to the RemoteTransport.readv handler, which then yields it
back as you go. The actual HPSS request wouldn't even have to change, just the
client.

Right now we use response_handler.read_body_bytes() rather than a function
that would let us process the bytes as they come back. It would end up more
memory efficient (you don't ever have to buffer 30MB of file/inventory data),
and it would probably be slightly faster. (while waiting on bytes coming to
the socket, you can do some work locally to push that data into your pack file.)

This could probably shave as much as 30s off of the branch time. At least,
that is the "user" time spent during download, so interleaving 30s of user
time with download time would probably be a decent win. If we could truly do
that, we would be down to 107.5% bandwidth-capped time.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIuEc2JdeBCYSNAAMRAtmDAKCLRiGm8OppK8VGuJTMWgU4VeOIHgCfbXNw
q6GXEb4GZ2V2uWX1c1Pb1sU=
=P1JS
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graph_index_autobuffer.patch
Type: text/x-diff
Size: 34815 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080829/df5e9bef/attachment-0001.bin