[MERGE] graph.heads() performance bugfix
Robert Collins
robertc at robertcollins.net
Mon Oct 22 20:47:35 BST 2007
On Mon, 2007-10-22 at 17:18 +1000, Ian Clatworthy wrote:
> Robert Collins wrote:
> > This fixes heads() to access less of history in the case of shortcuts.
>
> Good catch finding this. I hadn't looked at graph.py until today so I'm
> no expert in this area to say the least. I'm 95% confident this is OK
> though.
>
> bb:tweak
>
> > + * Graph ``heads()`` queries have been bugfixed to no longer access all
> > + history unnecessarily. (Robert Collins)
> > +
>
> I'd prefer a verb like optimized to bugfixed.
I can say 'corrected', but it was a bug and IMO we should say so.
> Again, not a problem with this patch but a comment in the except clause
> would have helped me understand this logic. Something like:
I have a separate rework of heads() to issue one index query per loop,
across all walkers, which includes such commentary, so I will elide it
for now.
> > + already_common = ancestor in common_walker.seen
> > + if not already_common:
> > + # or it may have been just reached by all the searchers:
> > for searcher in searchers.itervalues():
> > if ancestor not in searcher.seen:
> > + common = False
> > break
> > else:
> > - # if this revision was seen by all searchers, then it
> > - # is a descendant of all candidates, so we can stop
> > - # searching it, and any seen ancestors
> > - for searcher in searchers.itervalues():
> > - seen_ancestors =\
> > - searcher.find_seen_ancestors(ancestor)
> > - searcher.stop_searching_any(seen_ancestors)
> > + common = True
> > + if already_common:
> > + # some searcher has encountered our known common nodes:
> > + # just stop it
> > + ancestor_set = set([ancestor])
> > + for searcher in searchers.itervalues():
> > + searcher.stop_searching_any(ancestor_set)
> > + if common:
>
> Can/should we make this elif instead of if? It doesn't look like common
> gets set to either True or False unless already_common is false.
Perhaps it should be
if ancestor in common_walker.seen:
... already common code ...
else:
... check if its newly common ...
I'll change and send a new [MERGE] in.
> > + We answer this using heads() as heads() has the logic to perform the
> > + smallest number of parent looksup to determine the ancestral
> > + relationship between N revisions.
>
> > + return set([candidate_descendant]) == self.heads(
> > + [candidate_ancestor, candidate_descendant])
>
> Sweet. I love it when a heap of code is replaced by 2 lines. It's
> clearly correct as well. :-) My only complain is that the comment above
> ought to be a comment, not part of the docstring. After all, it explains
> the implementation, not the interface.
>
> (That grumble applies more broadly inside graph.py but fixing this one
> now is good enough for me.)
I think with these routines that this is actually part of the interface.
Its rather like the STL including the space/time complexity tradeoffs
that a given container or algorithm makes as part of the contract. There
are many many ways to skin a graph problem, and we desire guarantees
about performance and data-access in the routines we use. Accessing only
X data, or doing so in Y order is in fact performance critical and not a
substitutable component of the implementation.
(Thats no, I won't make that a comment :)).
> > --- bzrlib/tests/test_graph.py 2007-09-03 22:17:20 +0000
> > +++ bzrlib/tests/test_graph.py 2007-10-22 00:44:27 +0000
> > @@ -453,3 +453,59 @@
> > graph.heads(['rev2a', 'rev3b']))
> > self.assertEqual(set(['rev2c', 'rev3a']),
> > graph.heads(['rev2c', 'rev3a']))
> > +
> > +
>
> As mentioned on IRC, one line not 2 here please.
Done.
> > + if key == 'deeper':
> > + import pdb;pdb.set_trace()
> > + self.fail('key deeper was accessed')
>
> Hmm. When the tests pass, this 'import pdb;...' will never happen but we
> don't want it there in production code I assume.
Oversight, done.
> In summary, I *think* it's ok but I'm not sure I have the depth of
> knowledge in this area to be sure I haven't missed something. I know you
> do though. :-)
>
> If you don't mind tweaking some other comments in graph.py while you're
> merging:
>
> 1. Graph.__init__(). param is parents_provider, not parents_func.
>
> 2. _BreadthFirstSearcher docstring: s/the breadth-first/breadth-first/.
Done, resubmission follows.
-Thanks,
Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071023/32dddea7/attachment.pgp
More information about the bazaar
mailing list