HPSS graph-difference algorithm

Robert Collins robertc at robertcollins.net
Fri Apr 6 00:54:31 BST 2007


On Thu, 2007-04-05 at 16:45 -0500, John Arbash Meinel wrote:
> Robert Collins wrote:


> Well, it certainly is an improvement. I'm curious if we can do better
> than 5s, though. Are you saying it is startup + processing time for 5s,
> or is that including the round-trips, or what?

Its 5 seconds of startup and processing time; + 2 seconds for all the
round trips.

..
> I read that in your commit messages, and the messaging protocol section
> wasn't very clear.
> 
> Specifically, I don't understand things like:
> Server Graph: {A:[], B:[A], C:[A], D:[B, C]}
> Client Graph: {A:[], B:[A], E:[B]}

The servers graph has revisions A with no parents, B with a parent A, C
with a parent A, D with parents B and C, its HEAD is thus D and log on
it will show
D
 C
B
A

The clients graph is described likewise, and it has the branch up to B
and has committed a new revision E locally.

> Client message 1:
> get_missing
> response:
> unknown=(HEADS=[], STOPS=[])
> sample=(0, [A, D])
> client message 2:
> get_missing((HEADS=[], STOPS=[]), (0, [True, False]))
> response:
> unknown=(HEADS=[B, C], STOPS=[A])
> sample=(0, [B, C])
> client message 3:
> get_missing((HEADS=[B,C], STOPS=[A]), (0, [True, False]))
> response:
> 'complete', missing=(HEADS=[], STOPS=[B, A])
> 
> Mostly I don't understand the (0, [True, False])

> What is the '0'? I'm guessing it is saying:
> you returned HEADS=[B, C], STOPS=[A], and requested a sample of (0, [B, C]).
> Of which I have B but not C.

0 is the first element of the sample tuple; its a selection memento,
which when passed back to the server will allow the server to recreate
the same sampled revisions. Its intended to be opaque to the server, and
to allow things like - the server can choose to use a HEAD biased
selection on the first round trip and a liner biased one on the second,
or whatever.

> So I'm returning the HEADS and STOPS that you sent (so you can compute
> the 'sample' again, and know what I do and don't have.

Right, you return the heads and stops and the sample memento, then the
graph can recalculate things. I realise now that the server description
needs an overhaul, I'll do that Tuesday (after Easter that is).

> 
> ...
> 
> > I'm interested in peer review of this; so far Martin and Andrew have had
> > me bend their ear with detailed blow-by-blow accounts; and their good
> > questions and suggestions have influenced it a lot. I dont think having
> > multiple folk coding on this particular feature will help deliver it
> > much faster; folk that want to help should help Andrew with the merge of
> > the bulk of the hpss branch; which this work will depend on.
> > 
> > My goal for 0.16 with this is to make pull and push with with this to
> > the extent that determining what revision ids to pull is much faster;
> > for 0.17 we'll want a dedicated bundle format for this sort of scenario.
> > 
> > -Rob
> > 
> 
> I'm certainly trying to follow along. I've been a bit confused about
> what HEADS, STOPS, and sample are.

A better term for heads and stops are 'HEADS of the unknown subgraph'
and 'cut points of the unknown-or-missing subgraph'.

> I've tried going through your test cases to understand GraphDelta, but I
> think one side had
> 
> f - e - c - b - a
>       \   /
>         d
> 
> (f being the first commit, and a being the newest)
> 
> And the other graph had
> 
> f - e - c
> 
> And you gave heads == c, and stops == f
> 
> I would have thought that 'e' would be an important node. (Moreso than 'f').

That would be bogus. I setup two graphs as per your summary above:
>>> new
Graph ancestors={'c': ['e'], 'e': ['f'], 'f': []}, descendants={'c': {},
'e': {'c': 1}, 'f': {'e': 1}}, ghosts=set([])
>>> old
Graph ancestors={'a': ['b'], 'c': ['e'], 'b': ['c', 'd'], 'e': ['f'],
'd': ['e'], 'f': []}, descendants={'a': {}, 'c': {'b': 1}, 'b': {'a':
1}, 'e': {'c': 1, 'd': 1}, 'd': {'b': 1}, 'f': {'e': 1}}, ghosts=set([])
>>> new.changes_from(old)
GraphDelta heads=set(['c']), cut_from=set([])

This says 'when traversing old, ignore its heads and start from 'c', and
traverse to the end'.

> I tried to sort things out with graphviz, and graphs like:
> digraph merge_end {
>   a -> b
>   b -> {c d}
>   c -> e
>   d -> e
>   e -> f
> 
>   { edge [color="#ff0000"]
>           c -> e
>           e -> f
>   }
> 
>   c [style=filled, fillcolor="#00ff00"]
>   f [style=filled, fillcolor="#0000ff"]
> }
> 
> 
> (One of my big complaints was that I wasn't able to get graphviz to
> invert the direction. And I'm used to having children at the bottom, not
> the top. But rankdir=TB did the same thing as rankdir=BT (even though
> they seem to claim different things.
> 
> I'm not sure if you can explain it in a non-interactive session, though.
> So maybe we should just plan for a phone call or at least irc chat.

Sure.

> John
> =:->
> 
> 
> PS> I think having code to make graph discovery fast is a great idea. I
> know we talked about it a bit in Sydney, and I think it is really good
> for this sort of thing.
> 
> The other thing I would mention, though. Is that most of the time you
> will have an entry point into the graph. (Because you are opening a
> Branch, and you want to get the revisions for that Branch that you don't
> have).

Right, which is the wanted_heads list sent as the first parameter in the
protocol (see the netsim.missing module).

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070406/2878c681/attachment.pgp 


More information about the bazaar mailing list