[MERGE] new next_with_ghosts call on graph searchers

Mon Jan 14 05:53:45 GMT 2008

bb:tweak

Nothing wrong that I can see, but parts of this could be clearer I think.

Robert Collins wrote:
[...]
> === modified file 'bzrlib/graph.py'
> --- bzrlib/graph.py	2008-01-11 00:44:17 +0000
> +++ bzrlib/graph.py	2008-01-14 00:54:56 +0000
> @@ -506,39 +506,92 @@
>      """
>  
>      def __init__(self, revisions, parents_provider):
> -        self._start = set(revisions)
> -        self._search_revisions = None
> -        self.seen = set(revisions)
> +        self._iterations = 0
> +        self._next_query = set(revisions)
> +        self.seen = set()
>          self._parents_provider = parents_provider
> +        self._returning = 'checked'

I don't understand the way the “_returning” variable is used.  The values can be
'checked' or 'query', but it's not obvious to me how those words relate to the
“next” vs.  “next_with_ghosts” methods.  It doesn't help my confusion that
“checked” is an adjective, but “query” is a noun.

[...]
>      def next(self):
>          """Return the next ancestors of this revision.
>  
>          Ancestors are returned in the order they are seen in a breadth-first
> -        traversal.  No ancestor will be returned more than once.
> +        traversal.  No ancestor will be returned more than once. Ancestors are
> +        returned before their parentage is queried, so ghosts and missing
> +        revisions (including the start revisions) are included in the result.
> +        This can save a round trip in LCA style calculation by allowing
> +        convergence to be detected without reading the data for the revision
> +        the convergence occurs on.
> +
> +        :return: A set of revision_ids.
>          """
> -        if self._search_revisions is None:
> -            self._search_revisions = self._start
> +        if self._returning != 'query':
> +            # switch to returning the query, not the results.
> +            self._returning = 'query'
> +            self._iterations += 1
> +            self.seen.update(self._next_query)

I'm still confused about the purpose of _returning here.

I think somewhere in this class you need a comment describing the state
variables used in this implementation.  My guess is that for the first iteration
of next you want to do something special?  If so, this seems to be a pretty
unclear way to achieve that.

> +    def next_with_ghosts(self):
> +        """Return the next found ancestors, with ghosts split out.
> +        
> +        Ancestors are returned in the order they are seen in a breadth-first
> +        traversal.  No ancestor will be returned more than once. Ancestors are
> +        returned only after asking for their parents, which can
> +
> +        :return: A tuple with (present ancestors, ghost ancestors) sets.
> +        """

This docstring has an incomplete sentence.

> +        if self._returning != 'checked':
> +            # switch to returning the results, not the current query.
> +            self._returning = 'checked'
> +            self._advance()
> +        if len(self._next_query) == 0:
> +            raise StopIteration()
> +        self._advance()
> +        return self._current_present, self._current_ghosts
> +
> +    def _advance(self):
> +        """Advance the search.
> +
> +        Updates self.seen, self._next_query, self._current_present,
> +        self._current_ghosts.
> +        """

And also self._iterations and self._current_parents.  Is there a reason why some
variables were omitted from this description?

[...]
> +            for rev, parents in self._current_parents.iteritems():
> +                for parent_id in parents:
> +                    try:
> +                        stop_rev_references[parent_id] -= 1
> +                    except KeyError:
> +                        pass

Why not use itervalues here?

> === modified file 'bzrlib/tests/test_graph.py'
[...]
> +    def test_breadth_first_search_start_ghosts(self):
> +        parent_graph = {}
> +        parents_provider = InstrumentedParentsProvider(
> +            _mod_graph.DictParentsProvider(parent_graph))
> +        graph = _mod_graph.Graph(parents_provider)

It's a bit marginal, but I wouldn't mind a “make_graph” helper method on this
TestCase to replace these three lines of duplication with a simple:

    graph = self.make_graph({})

Perhaps call it “make_instrumented_graph_from_dict” if “make_graph” is too vague
for your tastes.

If you did this, there'd only be a single statement before we got to the meat of
the test method.

> +    def test_breadth_first_search_change_next_to_next_with_ghosts(self):
> +        # To make the API robust, we allow changing from next() to
> +        # next_with_ghosts() and vice verca.

“changing from” is a bit unclear.  Maybe say “we allow calling both next() and
next_with_ghosts() on the same searcher.”

[...]
> +        # with_ghosts reports the ghosts
> +        search = graph._make_breadth_first_searcher(['head'])
> +        self.assertEqual((set(['head']), set()), search.next_with_ghosts())
> +        self.assertEqual(set(['present']), search.next())
> +        self.assertEqual((set(['child']), set(['ghost'])),
> +            search.next_with_ghosts())
> +        self.assertRaises(StopIteration, search.next)

“with_ghosts reports the ghosts” isn't really an accurate comment for this part
of this test.

> +        # next includes them
> +        search = graph._make_breadth_first_searcher(['head'])
> +        self.assertEqual(set(['head']), search.next())
> +        self.assertEqual((set(['present']), set()), search.next_with_ghosts())
> +        self.assertEqual(set(['child', 'ghost']),
> +            search.next())
> +        self.assertRaises(StopIteration, search.next_with_ghosts)

Ditto for “next includes them” here.

> +    def test_breadth_first_change_search(self):
> +        # To make the API robust, we allow changing from next() to
> +        # next_with_ghosts() and vice verca.

We just had this comment.  I think you meant to say “the revisions being
searched can be changed mid-search with stop_searching_any and start_searching”.

> +        parent_graph = {
> +            'head':['present'],
> +            'present':['stopped'],
> +            'other':['other_2'],
> +            'other_2':[],
> +            }
> +        parents_provider = InstrumentedParentsProvider(
> +            _mod_graph.DictParentsProvider(parent_graph))
> +        graph = _mod_graph.Graph(parents_provider)
> +        search = graph._make_breadth_first_searcher(['head'])
> +        self.assertEqual((set(['head']), set()), search.next_with_ghosts())
> +        self.assertEqual((set(['present']), set()), search.next_with_ghosts())
> +        self.assertEqual(set(['present']),
> +            search.stop_searching_any(['present']))
> +        self.assertEqual((set(['other']), set(['other_ghost'])),
> +            search.start_searching(['other', 'other_ghost']))
> +        self.assertEqual((set(['other_2']), set()), search.next_with_ghosts())
> +        self.assertRaises(StopIteration, search.next_with_ghosts)
> +        # next includes them
> +        search = graph._make_breadth_first_searcher(['head'])
> +        self.assertEqual(set(['head']), search.next())
> +        self.assertEqual(set(['present']), search.next())
> +        self.assertEqual(set(['present']),
> +            search.stop_searching_any(['present']))
> +        search.start_searching(['other', 'other_ghost'])
> +        self.assertEqual(set(['other_2']), search.next())
> +        self.assertRaises(StopIteration, search.next)

Why not two separate test methods here, one for next_with_ghosts and one for
next?  This method is long and dense enough that it's hard to read, and can
easily be broken in two to test the two methods independently.

(I'm a bit tempted to say most of the other tests you add should also be split
in two, but I'm not as concerned by those as they aren't as long.)

-Andrew.