[MERGE] Batch get_parent_map HPSS calls made from InterRepo._walk_to_common_revisions

Mon Oct 27 06:09:04 GMT 2008

John Arbash Meinel wrote:
[...]
> > Ideally with HPSS if you sent a request for a sequence like that, say
> > [tip, tip-1, tip-2, tip-4, tip-8, ...], then you'd only need a reply
> > that says what the first “hit” was.  i.e. if the 3rd graph key specified
> > was found, then the server can stop looking for the other 47 and just
> > return immediately.  So perhaps a new RPC would be good (eventually),
> > ideally something that encodes what the branching points are (e.g. if
> > there a multiple parents at tip-3 then there will be two tip-4 revisions
> > to look for).  With that RPC, the fan forward would be a follow up call
> > to that RPC that just queried every individual revision between the last
> > non-hit and the first hit.
> > 
> > The flip side to walking back 2^50 revisions is that the client will end
> > up reading basically the entire source revision graph!  Well, if there's
> > a lot of branching in the graph and the total number of revisions in a
> > request is still capped to 50 then perhaps not, but that still seems a
> > bit extreme.
> 
> Sure. But we don't have to do 2^50. Also, we can keep the exponent we
> are on right now. So you start by sending 5 revisions: "2^0, 2^1, 2^2,
> 2^3". If that fails, you go to "2^4, 2^5, 2^6, 2^7", etc.
> (1, 2, 4, 8; 16, 32, 64, 128; etc)
> 
> You still end up back at 128 nodes after only 2 round trips. Also,
> arguably we could just cap it at 2^10 = 1024, and just request every
> 1,000th node from there on back.

Right.  I think that's worth investigating.  I'm happy to improve this
incrementally.

[...]
> >> ^- The "if revision_ids.intersection(ghosts)" looks more like something
> >> that should be outside the while loop, rather than inside. But maybe I'm
> >> wrong.
[...]
> > My patch will look for all of N, A, B and C on the first iteration, and
> > fail because C is not present in the source nor in the target.  I think
> > this is a genuine problem in my code, although the test suite doesn't
> > catch it.
> > 
> > -Andrew.
> 
> I think you are misunderstanding 'revision_ids', though I need to look
> at your patch closely. It is meant as the *revisions supplied by the
> caller*, and if any of them are missing, it will error. It is not meant
> as "all revisions in ancestry must be present".
> 
> I'm pretty sure your patch doesn't change this, because you *never
> mutate* the supplied value.
> 
> So I think moving it outside the loop is fine, and I don't think the new
> code is wrong in any way.

Oh, I see.  I was forgetting that the ghosts to check was an
intersection with the original revision_ids.

It occurs to me than rather putting it outside the loop, which means
doing N + 1 remote calls, we can just ask for next_revs + the ghosts (if
any) each time around the loop, thus doing just N remote calls.  This
will also fail sooner if it is going to fail.  So, I've done that.

Thanks for the reviews, this is finally on its way to PQM.

-Andrew.