[MERGE] graph.heads() performance bugfix

Mon Oct 22 20:47:35 BST 2007

On Mon, 2007-10-22 at 17:18 +1000, Ian Clatworthy wrote:
> Robert Collins wrote:
> > This fixes heads() to access less of history in the case of shortcuts. 
> 
> Good catch finding this. I hadn't looked at graph.py until today so I'm
> no expert in this area to say the least. I'm 95% confident this is OK
> though.
> 
> bb:tweak
> 
> > +   * Graph ``heads()`` queries have been bugfixed to no longer access all
> > +     history unnecessarily. (Robert Collins)
> > +
> 
> I'd prefer a verb like optimized to bugfixed.

I can say 'corrected', but it was a bug and IMO we should say so.

> Again, not a problem with this patch but a comment in the except clause
> would have helped me understand this logic. Something like:

I have a separate rework of heads() to issue one index query per loop,
across all walkers, which includes such commentary, so I will elide it
for now.

> > +                already_common = ancestor in common_walker.seen
> > +                if not already_common:
> > +                    # or it may have been just reached by all the searchers:
> >                      for searcher in searchers.itervalues():
> >                          if ancestor not in searcher.seen:
> > +                            common = False
> >                              break
> >                      else:
> > -                        # if this revision was seen by all searchers, then it
> > -                        # is a descendant of all candidates, so we can stop
> > -                        # searching it, and any seen ancestors
> > -                        for searcher in searchers.itervalues():
> > -                            seen_ancestors =\
> > -                                searcher.find_seen_ancestors(ancestor)
> > -                            searcher.stop_searching_any(seen_ancestors)
> > +                        common = True
> > +                if already_common:
> > +                    # some searcher has encountered our known common nodes:
> > +                    # just stop it
> > +                    ancestor_set = set([ancestor])
> > +                    for searcher in searchers.itervalues():
> > +                        searcher.stop_searching_any(ancestor_set)
> > +                if common:
> 
> Can/should we make this elif instead of if? It doesn't look like common
> gets set to either True or False unless already_common is false.

Perhaps it should be
if ancestor in common_walker.seen:
  ... already common code ...
else:
  ... check if its newly common ...

I'll change and send a new [MERGE] in.

> > +        We answer this using heads() as heads() has the logic to perform the
> > +        smallest number of parent looksup to determine the ancestral
> > +        relationship between N revisions.
> 
> > +        return set([candidate_descendant]) == self.heads(
> > +            [candidate_ancestor, candidate_descendant])
> 
> Sweet. I love it when a heap of code is replaced by 2 lines. It's
> clearly correct as well. :-) My only complain is that the comment above
> ought to be a comment, not part of the docstring. After all, it explains
> the implementation, not the interface.
> 
> (That grumble applies more broadly inside graph.py but fixing this one
> now is good enough for me.)

I think with these routines that this is actually part of the interface.
Its rather like the STL including the space/time complexity tradeoffs
that a given container or algorithm makes as part of the contract. There
are many many ways to skin a graph problem, and we desire guarantees
about performance and data-access in the routines we use. Accessing only
X data, or doing so in Y order is in fact performance critical and not a
substitutable component of the implementation.

(Thats no, I won't make that a comment :)).

> > --- bzrlib/tests/test_graph.py	2007-09-03 22:17:20 +0000
> > +++ bzrlib/tests/test_graph.py	2007-10-22 00:44:27 +0000
> > @@ -453,3 +453,59 @@
> >                           graph.heads(['rev2a', 'rev3b']))
> >          self.assertEqual(set(['rev2c', 'rev3a']),
> >                           graph.heads(['rev2c', 'rev3a']))
> > +
> > +
> 
> As mentioned on IRC, one line not 2 here please.

Done.

> > +                if key == 'deeper':
> > +                    import pdb;pdb.set_trace()
> > +                    self.fail('key deeper was accessed')
> 
> Hmm. When the tests pass, this 'import pdb;...' will never happen but we
> don't want it there in production code I assume.

Oversight, done.

> In summary, I *think* it's ok but I'm not sure I have the depth of
> knowledge in this area to be sure I haven't missed something. I know you
> do though. :-)
> 
> If you don't mind tweaking some other comments in graph.py while you're
> merging:
> 
> 1. Graph.__init__(). param is parents_provider, not parents_func.
> 
> 2. _BreadthFirstSearcher docstring: s/the breadth-first/breadth-first/.

Done, resubmission follows.

-Thanks,
Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071023/32dddea7/attachment.pgp