[MERGE] graph.heads() performance bugfix

Mon Oct 22 08:18:39 BST 2007

Robert Collins wrote:
> This fixes heads() to access less of history in the case of shortcuts. 

Good catch finding this. I hadn't looked at graph.py until today so I'm
no expert in this area to say the least. I'm 95% confident this is OK
though.

bb:tweak

> +   * Graph ``heads()`` queries have been bugfixed to no longer access all
> +     history unnecessarily. (Robert Collins)
> +

I'd prefer a verb like optimized to bugfixed.

> +        common_walker = self._make_breadth_first_searcher([])

It's not really clear to me what passing [] to breadth_first_searcher
does. This isn't the fault of this patch - just a general comment on the
need for really clear docstrings and comments in this module.

>          while len(active_searchers) > 0:
> +            ancestors = set()
> +            # advance searches
> +            try:
> +                common_walker.next()
> +            except StopIteration:
> +                pass

A comment in the except clause about what this case represents would be
good.

>              for candidate in active_searchers.keys():
>                  try:
>                      searcher = active_searchers[candidate]
> @@ -249,27 +262,44 @@
>                      # a descendant of another candidate.
>                      continue
>                  try:
> -                    ancestors = searcher.next()
> +                    ancestors.update(searcher.next())
>                  except StopIteration:
>                      del active_searchers[candidate]
>                      continue

Again, not a problem with this patch but a comment in the except clause
would have helped me understand this logic. Something like:

  # origin reached - this searcher exhausted

> +                already_common = ancestor in common_walker.seen
> +                if not already_common:
> +                    # or it may have been just reached by all the searchers:
>                      for searcher in searchers.itervalues():
>                          if ancestor not in searcher.seen:
> +                            common = False
>                              break
>                      else:
> -                        # if this revision was seen by all searchers, then it
> -                        # is a descendant of all candidates, so we can stop
> -                        # searching it, and any seen ancestors
> -                        for searcher in searchers.itervalues():
> -                            seen_ancestors =\
> -                                searcher.find_seen_ancestors(ancestor)
> -                            searcher.stop_searching_any(seen_ancestors)
> +                        common = True
> +                if already_common:
> +                    # some searcher has encountered our known common nodes:
> +                    # just stop it
> +                    ancestor_set = set([ancestor])
> +                    for searcher in searchers.itervalues():
> +                        searcher.stop_searching_any(ancestor_set)
> +                if common:

Can/should we make this elif instead of if? It doesn't look like common
gets set to either True or False unless already_common is false.

> +            common_walker.start_searching(new_common)

Again, not the fault of this patch but no docstring for start_searching
made the algorithm harder to follow.

> +        We answer this using heads() as heads() has the logic to perform the
> +        smallest number of parent looksup to determine the ancestral
> +        relationship between N revisions.

> +        return set([candidate_descendant]) == self.heads(
> +            [candidate_ancestor, candidate_descendant])

Sweet. I love it when a heap of code is replaced by 2 lines. It's
clearly correct as well. :-) My only complain is that the comment above
ought to be a comment, not part of the docstring. After all, it explains
the implementation, not the interface.

(That grumble applies more broadly inside graph.py but fixing this one
now is good enough for me.)

> --- bzrlib/tests/test_graph.py	2007-09-03 22:17:20 +0000
> +++ bzrlib/tests/test_graph.py	2007-10-22 00:44:27 +0000
> @@ -453,3 +453,59 @@
>                           graph.heads(['rev2a', 'rev3b']))
>          self.assertEqual(set(['rev2c', 'rev3a']),
>                           graph.heads(['rev2c', 'rev3a']))
> +
> +

As mentioned on IRC, one line not 2 here please.

> +                if key == 'deeper':
> +                    import pdb;pdb.set_trace()
> +                    self.fail('key deeper was accessed')

Hmm. When the tests pass, this 'import pdb;...' will never happen but we
don't want it there in production code I assume.

In summary, I *think* it's ok but I'm not sure I have the depth of
knowledge in this area to be sure I haven't missed something. I know you
do though. :-)

If you don't mind tweaking some other comments in graph.py while you're
merging:

1. Graph.__init__(). param is parents_provider, not parents_func.

2. _BreadthFirstSearcher docstring: s/the breadth-first/breadth-first/.

Ian C.