[MERGE] new next_with_ghosts call on graph searchers

Mon Jan 14 23:04:05 GMT 2008

On Mon, 2008-01-14 at 16:53 +1100, Andrew Bennetts wrote:
> bb:tweak
> 
> Nothing wrong that I can see, but parts of this could be clearer I think.
> 
> Robert Collins wrote:
> [...]
> > === modified file 'bzrlib/graph.py'
> > --- bzrlib/graph.py	2008-01-11 00:44:17 +0000
> > +++ bzrlib/graph.py	2008-01-14 00:54:56 +0000
> > @@ -506,39 +506,92 @@
> >      """
> >  
> >      def __init__(self, revisions, parents_provider):
> > -        self._start = set(revisions)
> > -        self._search_revisions = None
> > -        self.seen = set(revisions)
> > +        self._iterations = 0
> > +        self._next_query = set(revisions)
> > +        self.seen = set()
> >          self._parents_provider = parents_provider
> > +        self._returning = 'checked'
> 
> I don't understand the way the “_returning” variable is used.  The values can be
> 'checked' or 'query', but it's not obvious to me how those words relate to the
> “next” vs.  “next_with_ghosts” methods.  It doesn't help my confusion that
> “checked” is an adjective, but “query” is a noun.

Its a poor man's state object; I didn't feel like doing a whole state
factoring for this. Perhaps if I used 'next' and 'next_with_ghosts' ?

> 
> [...]
> >      def next(self):
> >          """Return the next ancestors of this revision.
> >  
> >          Ancestors are returned in the order they are seen in a breadth-first
> > -        traversal.  No ancestor will be returned more than once.
> > +        traversal.  No ancestor will be returned more than once. Ancestors are
> > +        returned before their parentage is queried, so ghosts and missing
> > +        revisions (including the start revisions) are included in the result.
> > +        This can save a round trip in LCA style calculation by allowing
> > +        convergence to be detected without reading the data for the revision
> > +        the convergence occurs on.
> > +
> > +        :return: A set of revision_ids.
> >          """
> > -        if self._search_revisions is None:
> > -            self._search_revisions = self._start
> > +        if self._returning != 'query':
> > +            # switch to returning the query, not the results.
> > +            self._returning = 'query'
> > +            self._iterations += 1
> > +            self.seen.update(self._next_query)
> 
> I'm still confused about the purpose of _returning here.
> 
> I think somewhere in this class you need a comment describing the state
> variables used in this implementation.  My guess is that for the first iteration
> of next you want to do something special?  If so, this seems to be a pretty
> unclear way to achieve that.

'next' returns revision ids before they are queried, next_with_ghosts
returns them after querying them.

> > +        if self._returning != 'checked':
> > +            # switch to returning the results, not the current query.
> > +            self._returning = 'checked'
> > +            self._advance()
> > +        if len(self._next_query) == 0:
> > +            raise StopIteration()
> > +        self._advance()
> > +        return self._current_present, self._current_ghosts
> > +
> > +    def _advance(self):
> > +        """Advance the search.
> > +
> > +        Updates self.seen, self._next_query, self._current_present,
> > +        self._current_ghosts.
> > +        """
> 
> And also self._iterations and self._current_parents.  Is there a reason why some
> variables were omitted from this description?

The latter was oversight, the former wasn't really relevant - its just a
tweak for debuggability. Added both.

> [...]
> > +            for rev, parents in self._current_parents.iteritems():
> > +                for parent_id in parents:
> > +                    try:
> > +                        stop_rev_references[parent_id] -= 1
> > +                    except KeyError:
> > +                        pass
> 
> Why not use itervalues here?

Changed. Is it cheaper?

> > === modified file 'bzrlib/tests/test_graph.py'
> [...]
> > +    def test_breadth_first_search_start_ghosts(self):
> > +        parent_graph = {}
> > +        parents_provider = InstrumentedParentsProvider(
> > +            _mod_graph.DictParentsProvider(parent_graph))
> > +        graph = _mod_graph.Graph(parents_provider)
> 
> It's a bit marginal, but I wouldn't mind a “make_graph” helper method on this
> TestCase to replace these three lines of duplication with a simple:
> 
>     graph = self.make_graph({})
> 
> Perhaps call it “make_instrumented_graph_from_dict” if “make_graph” is too vague
> for your tastes.
> 
> If you did this, there'd only be a single statement before we got to the meat of
> the test method.

Hmm, I think I'll pass on this for now.

> > +        parent_graph = {
> > +            'head':['present'],
> > +            'present':['stopped'],
> > +            'other':['other_2'],
> > +            'other_2':[],
> > +            }
> > +        parents_provider = InstrumentedParentsProvider(
> > +            _mod_graph.DictParentsProvider(parent_graph))
> > +        graph = _mod_graph.Graph(parents_provider)
> > +        search = graph._make_breadth_first_searcher(['head'])
> > +        self.assertEqual((set(['head']), set()), search.next_with_ghosts())
> > +        self.assertEqual((set(['present']), set()), search.next_with_ghosts())
> > +        self.assertEqual(set(['present']),
> > +            search.stop_searching_any(['present']))
> > +        self.assertEqual((set(['other']), set(['other_ghost'])),
> > +            search.start_searching(['other', 'other_ghost']))
> > +        self.assertEqual((set(['other_2']), set()), search.next_with_ghosts())
> > +        self.assertRaises(StopIteration, search.next_with_ghosts)
> > +        # next includes them
> > +        search = graph._make_breadth_first_searcher(['head'])
> > +        self.assertEqual(set(['head']), search.next())
> > +        self.assertEqual(set(['present']), search.next())
> > +        self.assertEqual(set(['present']),
> > +            search.stop_searching_any(['present']))
> > +        search.start_searching(['other', 'other_ghost'])
> > +        self.assertEqual(set(['other_2']), search.next())
> > +        self.assertRaises(StopIteration, search.next)
> 
> Why not two separate test methods here, one for next_with_ghosts and one for
> next?  This method is long and dense enough that it's hard to read, and can
> easily be broken in two to test the two methods independently.
> 
> (I'm a bit tempted to say most of the other tests you add should also be split
> in two, but I'm not as concerned by those as they aren't as long.)

I want to keep the tests for the same pattern for the two states in the
same methods, to make it clear that they are expected to behave
'identically'.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080115/e256dca7/attachment.pgp