[MERGE, 165304] Allow insert_data_stream to insert differently annotated stream.
John Arbash Meinel
john at arbash-meinel.com
Fri Nov 30 04:48:34 GMT 2007
John Arbash Meinel has voted tweak.
Status is now: Conditionally approved
Comment:
+ def get_ancestry(self, versions, topo_sorted):
+ """Get an ancestry list for versions."""
+ if topo_sorted:
+ # Not needed for basic joins
+ raise NotImplementedError(self.get_ancestry)
+ # get a graph of all the mentioned versions:
+ # Little ugly - basically copied from KnitIndex, but don't want
to
+ # accidentally incorporate too much of that index's code.
+ graph = {}
+ pending = set(versions)
+ cache = self._by_version
+ while pending:
+ version = pending.pop()
+ # trim ghosts
+ try:
+ parents = [p for p in cache[version][2] if p in cache]
+ except KeyError:
+ raise RevisionNotPresent(version, self)
+ # if not completed and not a ghost
+ pending.update([p for p in parents if p not in graph])
+ graph[version] = parents
+ return graph.keys()
^- you are building up a dictionary just to get a set. I think it would
be better do just do:
ancestry = set()
while pending:
version = pending.pop()
...
pending.update([p for p in parents if not in ancestry])
ancestry.add(version)
return list(ancestry)
The only reason to build up a dict is if you wanted to do
topo_sorted=True.
...
+ # Strictly speaking this should checkin in the backing
knit, but
+ # until we have a test to discriminate, this will do.
"check in the backing knit"
However, _StreamIndex doesn't have access to the backing knit...
...
You are testing "add_lines_with_ghosts"
+ def test_newline_empty_lines(self):
+ # ensure that ["\n"] round trips ok.
+ knit = self.make_test_knit()
+ knit.add_lines('a', [], ["\n"])
+ knit.add_lines_with_ghosts('b', [], ["\n"])
+ self.assertEqual(["\n"], knit.get_lines('a'))
+ self.assertEqual(["\n"], knit.get_lines('b'))
+ self.assertEqual(['fulltext'], knit._index.get_options('a'))
+ self.assertEqual(['fulltext'], knit._index.get_options('b'))
+ knit.add_lines('c', ['a'], ["\n"])
+ knit.add_lines_with_ghosts('d', ['b'], ["\n"])
+ self.assertEqual(["\n"], knit.get_lines('c'))
+ self.assertEqual(["\n"], knit.get_lines('d'))
+ self.assertEqual(['line-delta'], knit._index.get_options('c'))
+ self.assertEqual(['line-delta'], knit._index.get_options('d'))
but none of those parents are actually ghosts...
...
+ def test_get_ancestry(self):
+ knit = self.make_knit_with_4_versions_2_dags()
+ self.assertIndexAncestry(knit, ['a'], ['a'], ['a'])
+ self.assertIndexAncestry(knit, ['b'], ['b'], ['b'])
+ self.assertIndexAncestry(knit, ['c'], ['c'], ['c'])
+ self.assertIndexAncestry(knit, ['c'], ['a', 'b', 'c'],
+ {'a':[], 'b':[], 'c':[]}.keys())
+ self.assertIndexAncestry(knit, ['c', 'd'], ['a', 'b', 'c',
'd'],
+ {'a':[], 'b':[], 'c':[], 'd':[]}.keys())
^- It doesn't really seem right to use a dict to get a list of keys. I
know we are somewhat concerned with the sort order, but why not just
compare the set of the return value.
...
you have:
+ def assertGetPosition(self, knit, versions, version, result):
+ index = self.get_index(knit, knit.get_data_stream(versions))
+ self.assertEqual(result, index.get_position(version))
+
+ def assertGetPosition(self, knit, versions, version, result):
+ index = self.get_index(knit, knit.get_data_stream(versions))
+ if result[0] is None:
+ result = (index, result[1], result[2])
+ self.assertEqual(result, index.get_position(version))
+
That is the function defined 2 times, right?
...
+ def test_get_parents_with_ghosts(self):
+ knit = self.make_knit_with_4_versions_2_dags()
+ self.assertGetParentsWithGhosts(knit, ['a'], 'a', [])
+ self.assertGetParentsWithGhosts(knit, ['c'], 'c', ['b', 'a'])
+
^- this doesn't check anything with a node that actually has ghosts
('d', for example)
+ try:
+ start, end = self._by_version[version_id][1]
+ return self, start, end
+ except KeyError:
+ # Signal to the access object to handle this from the
backing knit.
+ return ('thunk:%s' % version_id, None, None)
^- this assumes that we will never have a user named "thunk:" While I
think that is likely, it will break.
You really need a 'here:%s' % version_id
versus a 'nothere:%s' % version_id
Otherwise you are assuming your version_ids will never have "thunk:" to
start with.
On the flip side, you could do:
'%s:thunk:' % version_id
because we *have* asserted that revision-ids ending in ':' are reserved.
(we might even check for this in a couple places.) 'null:', and
'current:' are the obvious ones.
A small correction is that you are using the fact that the object is a
string (not self.stream_index) as the distinguishing factor. So why use
'thunk:' in the first place? (I'm not a huge fan of obj is obj2 as an if
statement, but it is what you are using.)
+ if (index[:6] != 'thunk:' or
+ self.orig_factory.__class__ != KnitPlainFactory):
^- index.startswith('thunk:')
is usually the preferred way to do that (you don't have to bother
counting the length of 'thunk:' or forget to update it if you change
it).
+ :param memos_for_retrieval: An iterable containing the (index,
pos,
+ length) memo for retrieving the bytes. The .knit method
ignores
+ the index as there is always only a single file.
^- This should be updated, as you don't ignore the index. You could
document the 'thunk:' versus obj is self._stream_index here.
...
This seems overly restrictive:
+ def test_asking_for_thunk_stream_is_not_plain_errors(self):
+ knit = self.make_test_knit(name='annotated', annotate=True)
+ knit.add_lines("A", [], ["Foo\n"])
+ index, access = self.get_index_access(knit,
+ knit.get_data_stream([]))
+ self.assertRaises(errors.KnitCorrupt,
+ list, access.get_raw_records([("thunk:A", None, None)]))
You could just strip the annotations off, couldn't you?
That said, if you don't need the functionality now, a simple comment
that it could be done would suffice.
It seems like you should actually be testing that trying to create a
data stream on an annotated source fails at the beginning. Not at the
thunk layer. Oh... shouldn't the data_stream have the same annotation
data as the knit you are thunking into? (either both are true, or both
are false)
I don't really see why you need to thunk for .join() Since it shouldn't
be copying nodes it already has.
For details, see:
http://bundlebuggy.aaronbentley.com/request/%3C1196392148.32300.468.camel%40lifeless-64%3E
More information about the bazaar
mailing list