[MERGE] Batch get_parent_map HPSS calls made from InterRepo._walk_to_common_revisions

Andrew Bennetts andrew.bennetts at canonical.com
Mon Oct 27 06:09:04 GMT 2008

John Arbash Meinel wrote:
> > Ideally with HPSS if you sent a request for a sequence like that, say
> > [tip, tip-1, tip-2, tip-4, tip-8, ...], then you'd only need a reply
> > that says what the first “hit” was.  i.e. if the 3rd graph key specified
> > was found, then the server can stop looking for the other 47 and just
> > return immediately.  So perhaps a new RPC would be good (eventually),
> > ideally something that encodes what the branching points are (e.g. if
> > there a multiple parents at tip-3 then there will be two tip-4 revisions
> > to look for).  With that RPC, the fan forward would be a follow up call
> > to that RPC that just queried every individual revision between the last
> > non-hit and the first hit.
> > 
> > The flip side to walking back 2^50 revisions is that the client will end
> > up reading basically the entire source revision graph!  Well, if there's
> > a lot of branching in the graph and the total number of revisions in a
> > request is still capped to 50 then perhaps not, but that still seems a
> > bit extreme.
> Sure. But we don't have to do 2^50. Also, we can keep the exponent we
> are on right now. So you start by sending 5 revisions: "2^0, 2^1, 2^2,
> 2^3". If that fails, you go to "2^4, 2^5, 2^6, 2^7", etc.
> (1, 2, 4, 8; 16, 32, 64, 128; etc)
> You still end up back at 128 nodes after only 2 round trips. Also,
> arguably we could just cap it at 2^10 = 1024, and just request every
> 1,000th node from there on back.

Right.  I think that's worth investigating.  I'm happy to improve this

> >> ^- The "if revision_ids.intersection(ghosts)" looks more like something
> >> that should be outside the while loop, rather than inside. But maybe I'm
> >> wrong.
> > My patch will look for all of N, A, B and C on the first iteration, and
> > fail because C is not present in the source nor in the target.  I think
> > this is a genuine problem in my code, although the test suite doesn't
> > catch it.
> > 
> > -Andrew.
> I think you are misunderstanding 'revision_ids', though I need to look
> at your patch closely. It is meant as the *revisions supplied by the
> caller*, and if any of them are missing, it will error. It is not meant
> as "all revisions in ancestry must be present".
> I'm pretty sure your patch doesn't change this, because you *never
> mutate* the supplied value.
> So I think moving it outside the loop is fine, and I don't think the new
> code is wrong in any way.

Oh, I see.  I was forgetting that the ghosts to check was an
intersection with the original revision_ids.

It occurs to me than rather putting it outside the loop, which means
doing N + 1 remote calls, we can just ask for next_revs + the ghosts (if
any) each time around the loop, thus doing just N remote calls.  This
will also fail sooner if it is going to fail.  So, I've done that.

Thanks for the reviews, this is finally on its way to PQM.


More information about the bazaar mailing list