HPSS graph-difference algorithm
Robert Collins
robertc at robertcollins.net
Tue Apr 3 11:24:41 BST 2007
One common task we have is to determine the revisions that the server
has for a graph, that the client does not.
I've prototyped an approach that I believe will do very well in high
latency situations: From my house to London, for the bzr repository, I
think it can deliver a compact description of the needed revisions in 4
round trips; sending under 64Kb of data (the revision.kndx is 1.2Mb for
me). It takes my code about 5 seconds to do the processing of the data
to achieve this.
The traffic breakdown looks like:
request sent bytes recieved bytes
0 57 16211
1 94 27745
2 11516 1191
3 221 50
Using lftp to get revisions.kndx on the hpss branch takes me:
1403989 bytes transferred in 34 seconds (40.3K/s)
And thats after connecting :(.
I'm skipping most of the details of the algorithm in this email - its
detailed in
http://codebrowse.launchpad.net/~lifeless/bzr/graphdelta/annotate/robertc%40robertcollins.net-20070403074302-fo3x80f961hh7e7f?file_id=server.txt-20070401020531-rqa7d77i22idnj78-2
(wish that URL was shorter).
Code for it is at:
http://bazaar.launchpad.net/~lifeless/bzr/graphdelta
and (for the stuff I'm twiddling still):
http://bazaar.launchpad.net/~lifeless/+junk/bzr-netsim
I'm planning on writing a smart server verb next, to take a GraphDelta
and return the revisions mentioned in it in their gzipped form; I'll put
a InterRepository method around this to let 'bzr missing' use it; and
finally add the use of the distributed negotiation logic to this.
The graph difference should be usable on any graph, I plan to make the
API work reasonably well for using it on per file graphs and the like in
the future.
I'm interested in peer review of this; so far Martin and Andrew have had
me bend their ear with detailed blow-by-blow accounts; and their good
questions and suggestions have influenced it a lot. I dont think having
multiple folk coding on this particular feature will help deliver it
much faster; folk that want to help should help Andrew with the merge of
the bulk of the hpss branch; which this work will depend on.
My goal for 0.16 with this is to make pull and push with with this to
the extent that determining what revision ids to pull is much faster;
for 0.17 we'll want a dedicated bundle format for this sort of scenario.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070403/c469f81f/attachment-0001.pgp
More information about the bazaar
mailing list