[MERGE] introduce is_ancestor, avoid using get_ancestry

John Arbash Meinel john at arbash-meinel.com
Fri Jul 27 16:21:53 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> Hi all,
> 
> We need to stop using Repository.get_ancestry, because it scales with
> the size of history.  This patch is a step along the way.
> 
> It introduces Graph.is_ancestor, and replaces calls to get_ancestry with
> it, as appropriate.
> 
> Aaron

Well, one of the other very strong reasons not to use 'get_ancestry()' the way
we did, is that it returns a *list*, and not a set() or dict().

So doing:

ancestry = repo.get_ancestry()
if x in ancestry:
 ...

Requires doing a linear search.

Though I agree that avoiding reading all of history is good.

Anyway, it would be nice to have a tiny bit of benchmarking on this. I would
also argue that we might consider deprecating get_ancestry. The only valid user
I know of is "bzr ancestry", which could potentially do it in a different way.


I'm trying to understand your 'is_ancestor' code. At a minimum, it needs a
docstring. And certainly it could use a bit of commenting to help understand
what you are trying to do.

Some questions:

1) How does the overall search work? The 'important' portion seems to be having
a breadth-first-search walker on 'candidate_descendant', and then walking
through its ancestry until you might find the 'candidate_ancestor'.
That seems pretty straightforward.

You also have a walker on the 'candidate_ancestor', and then some sort of
'common_walker'.

I'm guessing your idea is that you can stop the descendant walker from
searching through anything that you've found as an ancestor of
'candidate_ancestor'. So that if you have:

A
|\
B C

And you are looking for "B" and "C", you can tell the "B" walker to stop
looking at anything before "A".

The things that concern me are things like:

  for node in common:
    c_ancestors = walker.find_seen_ancestors(node)
    walker.stop_searching_any(c_ancestors)

Which seems like it has to do another search for every common ancestor that is
found. (Which might be moderately cheap, I'm not really sure).

2) Why do you reset the 'common' set every pass through the loop?

I'm a little concerned about the different walkers 'missing' the common entries
(one finds it just after the other one, etc). But that is probably all caught
by the 'find_seen_ancestors' check.


Anyway, the docstring is necessary, and some comments on how it is doing its
thing would help.

But the code looks okay, and the tests seem good.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGqg2RJdeBCYSNAAMRAoLyAKDUVgGxH8I/2qCAusbYFtysPO3OPACfe0N5
OHxl967QTQouOo4v3wtQSr8=
=Xh/S
-----END PGP SIGNATURE-----



More information about the bazaar mailing list