[MERGE, 165304] Allow insert_data_stream to insert differently annotated stream.

Fri Nov 30 04:48:34 GMT 2007

John Arbash Meinel has voted tweak.
Status is now: Conditionally approved
Comment:
+    def get_ancestry(self, versions, topo_sorted):
+        """Get an ancestry list for versions."""
+        if topo_sorted:
+            # Not needed for basic joins
+            raise NotImplementedError(self.get_ancestry)
+        # get a graph of all the mentioned versions:
+        # Little ugly - basically copied from KnitIndex, but don't want 
to
+        # accidentally incorporate too much of that index's code.
+        graph = {}
+        pending = set(versions)
+        cache = self._by_version
+        while pending:
+            version = pending.pop()
+            # trim ghosts
+            try:
+                parents = [p for p in cache[version][2] if p in cache]
+            except KeyError:
+                raise RevisionNotPresent(version, self)
+            # if not completed and not a ghost
+            pending.update([p for p in parents if p not in graph])
+            graph[version] = parents
+        return graph.keys()

^- you are building up a dictionary just to get a set. I think it would 
be better do just do:
ancestry = set()
while pending:
   version = pending.pop()
   ...
   pending.update([p for p in parents if not in ancestry])
   ancestry.add(version)
return list(ancestry)

The only reason to build up a dict is if you wanted to do 
topo_sorted=True.
...

+            # Strictly speaking this should checkin in the backing 
knit, but
+            # until we have a test to discriminate, this will do.

"check in the backing knit"

However, _StreamIndex doesn't have access to the backing knit...

...

You are testing "add_lines_with_ghosts"
+    def test_newline_empty_lines(self):
+        # ensure that ["\n"] round trips ok.
+        knit = self.make_test_knit()
+        knit.add_lines('a', [], ["\n"])
+        knit.add_lines_with_ghosts('b', [], ["\n"])
+        self.assertEqual(["\n"], knit.get_lines('a'))
+        self.assertEqual(["\n"], knit.get_lines('b'))
+        self.assertEqual(['fulltext'], knit._index.get_options('a'))
+        self.assertEqual(['fulltext'], knit._index.get_options('b'))
+        knit.add_lines('c', ['a'], ["\n"])
+        knit.add_lines_with_ghosts('d', ['b'], ["\n"])
+        self.assertEqual(["\n"], knit.get_lines('c'))
+        self.assertEqual(["\n"], knit.get_lines('d'))
+        self.assertEqual(['line-delta'], knit._index.get_options('c'))
+        self.assertEqual(['line-delta'], knit._index.get_options('d'))

but none of those parents are actually ghosts...

...
+    def test_get_ancestry(self):
+        knit = self.make_knit_with_4_versions_2_dags()
+        self.assertIndexAncestry(knit, ['a'], ['a'], ['a'])
+        self.assertIndexAncestry(knit, ['b'], ['b'], ['b'])
+        self.assertIndexAncestry(knit, ['c'], ['c'], ['c'])
+        self.assertIndexAncestry(knit, ['c'], ['a', 'b', 'c'],
+            {'a':[], 'b':[], 'c':[]}.keys())
+        self.assertIndexAncestry(knit, ['c', 'd'], ['a', 'b', 'c', 
'd'],
+            {'a':[], 'b':[], 'c':[], 'd':[]}.keys())

^- It doesn't really seem right to use a dict to get a list of keys. I 
know we are somewhat concerned with the sort order, but why not just 
compare the set of the return value.

...

you have:
+    def assertGetPosition(self, knit, versions, version, result):
+        index = self.get_index(knit, knit.get_data_stream(versions))
+        self.assertEqual(result, index.get_position(version))
+
+    def assertGetPosition(self, knit, versions, version, result):
+        index = self.get_index(knit, knit.get_data_stream(versions))
+        if result[0] is None:
+            result = (index, result[1], result[2])
+        self.assertEqual(result, index.get_position(version))
+

That is the function defined 2 times, right?

...
+    def test_get_parents_with_ghosts(self):
+        knit = self.make_knit_with_4_versions_2_dags()
+        self.assertGetParentsWithGhosts(knit, ['a'], 'a', [])
+        self.assertGetParentsWithGhosts(knit, ['c'], 'c', ['b', 'a'])
+

^- this doesn't check anything with a node that actually has ghosts 
('d', for example)

+        try:
+            start, end = self._by_version[version_id][1]
+            return self, start, end
+        except KeyError:
+            # Signal to the access object to handle this from the 
backing knit.
+            return ('thunk:%s' % version_id, None, None)

^- this assumes that we will never have a user named "thunk:" While I 
think that is likely, it will break.
You really need a 'here:%s' % version_id
versus a 'nothere:%s' % version_id

Otherwise you are assuming your version_ids will never have "thunk:" to 
start with.

On the flip side, you could do:
'%s:thunk:' % version_id

because we *have* asserted that revision-ids ending in ':' are reserved. 
(we might even check for this in a couple places.) 'null:', and 
'current:' are the obvious ones.

A small correction is that you are using the fact that the object is a 
string (not self.stream_index) as the distinguishing factor. So why use 
'thunk:' in the first place? (I'm not a huge fan of obj is obj2 as an if 
statement, but it is what you are using.)

+            if (index[:6] != 'thunk:' or
+                self.orig_factory.__class__ != KnitPlainFactory):
^- index.startswith('thunk:')
is usually the preferred way to do that (you don't have to bother 
counting the length of 'thunk:' or forget to update it if you change 
it).

+        :param memos_for_retrieval: An iterable containing the (index, 
pos,
+            length) memo for retrieving the bytes. The .knit method 
ignores
+            the index as there is always only a single file.

^- This should be updated, as you don't ignore the index. You could 
document the 'thunk:' versus obj is self._stream_index here.

...

This seems overly restrictive:
+    def test_asking_for_thunk_stream_is_not_plain_errors(self):
+        knit = self.make_test_knit(name='annotated', annotate=True)
+        knit.add_lines("A", [], ["Foo\n"])
+        index, access = self.get_index_access(knit,
+            knit.get_data_stream([]))
+        self.assertRaises(errors.KnitCorrupt,
+            list, access.get_raw_records([("thunk:A", None, None)]))

You could just strip the annotations off, couldn't you?

That said, if you don't need the functionality now, a simple comment 
that it could be done would suffice.

It seems like you should actually be testing that trying to create a 
data stream on an annotated source fails at the beginning. Not at the 
thunk layer. Oh... shouldn't the data_stream have the same annotation 
data as the  knit you are thunking into? (either both are true, or both 
are false)

I don't really see why you need to thunk for .join() Since it shouldn't 
be copying nodes it already has.

For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3C1196392148.32300.468.camel%40lifeless-64%3E