[MERGE] new next_with_ghosts call on graph searchers
Robert Collins
robertc at robertcollins.net
Mon Jan 14 23:04:05 GMT 2008
On Mon, 2008-01-14 at 16:53 +1100, Andrew Bennetts wrote:
> bb:tweak
>
> Nothing wrong that I can see, but parts of this could be clearer I think.
>
> Robert Collins wrote:
> [...]
> > === modified file 'bzrlib/graph.py'
> > --- bzrlib/graph.py 2008-01-11 00:44:17 +0000
> > +++ bzrlib/graph.py 2008-01-14 00:54:56 +0000
> > @@ -506,39 +506,92 @@
> > """
> >
> > def __init__(self, revisions, parents_provider):
> > - self._start = set(revisions)
> > - self._search_revisions = None
> > - self.seen = set(revisions)
> > + self._iterations = 0
> > + self._next_query = set(revisions)
> > + self.seen = set()
> > self._parents_provider = parents_provider
> > + self._returning = 'checked'
>
> I don't understand the way the “_returning” variable is used. The values can be
> 'checked' or 'query', but it's not obvious to me how those words relate to the
> “next” vs. “next_with_ghosts” methods. It doesn't help my confusion that
> “checked” is an adjective, but “query” is a noun.
Its a poor man's state object; I didn't feel like doing a whole state
factoring for this. Perhaps if I used 'next' and 'next_with_ghosts' ?
>
> [...]
> > def next(self):
> > """Return the next ancestors of this revision.
> >
> > Ancestors are returned in the order they are seen in a breadth-first
> > - traversal. No ancestor will be returned more than once.
> > + traversal. No ancestor will be returned more than once. Ancestors are
> > + returned before their parentage is queried, so ghosts and missing
> > + revisions (including the start revisions) are included in the result.
> > + This can save a round trip in LCA style calculation by allowing
> > + convergence to be detected without reading the data for the revision
> > + the convergence occurs on.
> > +
> > + :return: A set of revision_ids.
> > """
> > - if self._search_revisions is None:
> > - self._search_revisions = self._start
> > + if self._returning != 'query':
> > + # switch to returning the query, not the results.
> > + self._returning = 'query'
> > + self._iterations += 1
> > + self.seen.update(self._next_query)
>
> I'm still confused about the purpose of _returning here.
>
> I think somewhere in this class you need a comment describing the state
> variables used in this implementation. My guess is that for the first iteration
> of next you want to do something special? If so, this seems to be a pretty
> unclear way to achieve that.
'next' returns revision ids before they are queried, next_with_ghosts
returns them after querying them.
> > + if self._returning != 'checked':
> > + # switch to returning the results, not the current query.
> > + self._returning = 'checked'
> > + self._advance()
> > + if len(self._next_query) == 0:
> > + raise StopIteration()
> > + self._advance()
> > + return self._current_present, self._current_ghosts
> > +
> > + def _advance(self):
> > + """Advance the search.
> > +
> > + Updates self.seen, self._next_query, self._current_present,
> > + self._current_ghosts.
> > + """
>
> And also self._iterations and self._current_parents. Is there a reason why some
> variables were omitted from this description?
The latter was oversight, the former wasn't really relevant - its just a
tweak for debuggability. Added both.
> [...]
> > + for rev, parents in self._current_parents.iteritems():
> > + for parent_id in parents:
> > + try:
> > + stop_rev_references[parent_id] -= 1
> > + except KeyError:
> > + pass
>
> Why not use itervalues here?
Changed. Is it cheaper?
> > === modified file 'bzrlib/tests/test_graph.py'
> [...]
> > + def test_breadth_first_search_start_ghosts(self):
> > + parent_graph = {}
> > + parents_provider = InstrumentedParentsProvider(
> > + _mod_graph.DictParentsProvider(parent_graph))
> > + graph = _mod_graph.Graph(parents_provider)
>
> It's a bit marginal, but I wouldn't mind a “make_graph” helper method on this
> TestCase to replace these three lines of duplication with a simple:
>
> graph = self.make_graph({})
>
> Perhaps call it “make_instrumented_graph_from_dict” if “make_graph” is too vague
> for your tastes.
>
> If you did this, there'd only be a single statement before we got to the meat of
> the test method.
Hmm, I think I'll pass on this for now.
> > + parent_graph = {
> > + 'head':['present'],
> > + 'present':['stopped'],
> > + 'other':['other_2'],
> > + 'other_2':[],
> > + }
> > + parents_provider = InstrumentedParentsProvider(
> > + _mod_graph.DictParentsProvider(parent_graph))
> > + graph = _mod_graph.Graph(parents_provider)
> > + search = graph._make_breadth_first_searcher(['head'])
> > + self.assertEqual((set(['head']), set()), search.next_with_ghosts())
> > + self.assertEqual((set(['present']), set()), search.next_with_ghosts())
> > + self.assertEqual(set(['present']),
> > + search.stop_searching_any(['present']))
> > + self.assertEqual((set(['other']), set(['other_ghost'])),
> > + search.start_searching(['other', 'other_ghost']))
> > + self.assertEqual((set(['other_2']), set()), search.next_with_ghosts())
> > + self.assertRaises(StopIteration, search.next_with_ghosts)
> > + # next includes them
> > + search = graph._make_breadth_first_searcher(['head'])
> > + self.assertEqual(set(['head']), search.next())
> > + self.assertEqual(set(['present']), search.next())
> > + self.assertEqual(set(['present']),
> > + search.stop_searching_any(['present']))
> > + search.start_searching(['other', 'other_ghost'])
> > + self.assertEqual(set(['other_2']), search.next())
> > + self.assertRaises(StopIteration, search.next)
>
> Why not two separate test methods here, one for next_with_ghosts and one for
> next? This method is long and dense enough that it's hard to read, and can
> easily be broken in two to test the two methods independently.
>
> (I'm a bit tempted to say most of the other tests you add should also be split
> in two, but I'm not as concerned by those as they aren't as long.)
I want to keep the tests for the same pattern for the two states in the
same methods, to make it clear that they are expected to behave
'identically'.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080115/e256dca7/attachment.pgp
More information about the bazaar
mailing list