Rev 3059: Allow insert_data_stream to insert differently annotated stream. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Fri Nov 30 08:33:09 GMT 2007
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 3059
revision-id:pqm at pqm.ubuntu.com-20071130083301-5zq7705t6xa7yikn
parent: pqm at pqm.ubuntu.com-20071130080302-lcnafsyhqzjq6fjb
parent: andrew.bennetts at canonical.com-20071130075646-6ego2oagvdkk4xtk
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Fri 2007-11-30 08:33:01 +0000
message:
Allow insert_data_stream to insert differently annotated stream.
(#165304, Robert Collins, Andrew Bennetts)
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/errors.py errors.py-20050309040759-20512168c4e14fbd
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/tests/test_errors.py test_errors.py-20060210110251-41aba2deddf936a8
bzrlib/tests/test_knit.py test_knit.py-20051212171302-95d4c00dd5f11f2b
------------------------------------------------------------
revno: 3052.2.6
revision-id:andrew.bennetts at canonical.com-20071130075646-6ego2oagvdkk4xtk
parent: andrew.bennetts at canonical.com-20071130074556-ux7lnmgmx1ouiyi3
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: knit.datastreamjoin
timestamp: Fri 2007-11-30 18:56:46 +1100
message:
Fix typo in comment.
modified:
bzrlib/tests/test_knit.py test_knit.py-20051212171302-95d4c00dd5f11f2b
------------------------------------------------------------
revno: 3052.2.5
revision-id:andrew.bennetts at canonical.com-20071130074556-ux7lnmgmx1ouiyi3
parent: andrew.bennetts at canonical.com-20071130071925-ptg1jgzgfcx52lpi
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: knit.datastreamjoin
timestamp: Fri 2007-11-30 18:45:56 +1100
message:
Address the rest of the review comments from John and myself.
modified:
bzrlib/errors.py errors.py-20050309040759-20512168c4e14fbd
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/tests/test_knit.py test_knit.py-20051212171302-95d4c00dd5f11f2b
------------------------------------------------------------
revno: 3052.2.4
revision-id:andrew.bennetts at canonical.com-20071130071925-ptg1jgzgfcx52lpi
parent: robertc at robertcollins.net-20071130030701-r0wm01t0a8qx29gk
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: knit.datastreamjoin
timestamp: Fri 2007-11-30 18:19:25 +1100
message:
Some tweaks suggested by John's review.
modified:
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/tests/test_knit.py test_knit.py-20051212171302-95d4c00dd5f11f2b
------------------------------------------------------------
revno: 3052.2.3
revision-id:robertc at robertcollins.net-20071130030701-r0wm01t0a8qx29gk
parent: robertc at robertcollins.net-20071130005436-0qmx32hyti0jz0y6
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knit.datastreamjoin
timestamp: Fri 2007-11-30 14:07:01 +1100
message:
Handle insert_data_stream of an unannotated stream into an annotated knit.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/tests/test_knit.py test_knit.py-20051212171302-95d4c00dd5f11f2b
------------------------------------------------------------
revno: 3052.2.2
revision-id:robertc at robertcollins.net-20071130005436-0qmx32hyti0jz0y6
parent: robertc at robertcollins.net-20071129222449-r3r2er12d2p70wmy
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knit.datastreamjoin
timestamp: Fri 2007-11-30 11:54:36 +1100
message:
* Operations pulling data from a smart server where the underlying
repositories are respectively annotated and unannotated will now work.
(Robert Collins, #165106).
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/tests/test_knit.py test_knit.py-20051212171302-95d4c00dd5f11f2b
------------------------------------------------------------
revno: 3052.2.1
revision-id:robertc at robertcollins.net-20071129222449-r3r2er12d2p70wmy
parent: pqm at pqm.ubuntu.com-20071129184101-u9506rihe4zbzyyz
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knit.datastreamjoin
timestamp: Fri 2007-11-30 09:24:49 +1100
message:
Add a new KnitDataStreamUnknown error class for showing formats we can't understand.
modified:
bzrlib/errors.py errors.py-20050309040759-20512168c4e14fbd
bzrlib/tests/test_errors.py test_errors.py-20060210110251-41aba2deddf936a8
=== modified file 'NEWS'
--- a/NEWS 2007-11-30 08:03:02 +0000
+++ b/NEWS 2007-11-30 08:33:01 +0000
@@ -182,6 +182,10 @@
* Obsolete packs are now cleaned up by pack and autopack operations.
(Robert Collins, #153789)
+ * Operations pulling data from a smart server where the underlying
+ repositories are not both annotated/both unannotated will now work.
+ (Robert Collins, #165304).
+
* Reconcile now shows progress bars. (Robert Collins, #159351)
* ``RemoteBranch`` was not initializing ``self._revision_id_to_revno_map``
=== modified file 'bzrlib/errors.py'
--- a/bzrlib/errors.py 2007-11-29 18:41:01 +0000
+++ b/bzrlib/errors.py 2007-11-30 07:45:56 +0000
@@ -1332,6 +1332,8 @@
class KnitDataStreamIncompatible(KnitError):
+ # Not raised anymore, as we can convert data streams. In future we may
+ # need it again for more exotic cases, so we're keeping it around for now.
_fmt = "Cannot insert knit data stream of format \"%(stream_format)s\" into knit of format \"%(target_format)s\"."
@@ -1340,6 +1342,15 @@
self.target_format = target_format
+class KnitDataStreamUnknown(KnitError):
+ # Indicates a data stream we don't know how to handle.
+
+ _fmt = "Cannot parse knit data stream of format \"%(stream_format)s\"."
+
+ def __init__(self, stream_format):
+ self.stream_format = stream_format
+
+
class KnitHeaderError(KnitError):
_fmt = 'Knit header error: %(badline)r unexpected for file "%(filename)s".'
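The new error class follows bzrlib's convention of a `_fmt` class attribute that is interpolated with the instance's attributes when the error is rendered. A minimal standalone sketch of that pattern (the `KnitError` base below is a simplified stand-in for bzrlib's real error hierarchy, not an import of it):

```python
class KnitError(Exception):
    # Simplified stand-in for bzrlib's KnitError/BzrError: render the
    # class-level _fmt template using this instance's attributes.
    _fmt = "Knit error"

    def __str__(self):
        return self._fmt % self.__dict__


class KnitDataStreamUnknown(KnitError):
    # Raised for a data stream whose format signature we cannot parse,
    # mirroring the class added in the diff above.
    _fmt = 'Cannot parse knit data stream of format "%(stream_format)s".'

    def __init__(self, stream_format):
        self.stream_format = stream_format


error = KnitDataStreamUnknown('fake-format-signature')
print(str(error))
```

This matches the message asserted in the new `test_knit_data_stream_unknown` test further down.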
=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py 2007-11-29 00:22:51 +0000
+++ b/bzrlib/knit.py 2007-11-30 07:45:56 +0000
@@ -96,7 +96,6 @@
KnitError,
InvalidRevisionId,
KnitCorrupt,
- KnitDataStreamIncompatible,
KnitHeaderError,
RevisionNotPresent,
RevisionAlreadyPresent,
@@ -738,8 +737,10 @@
"""
if format != self.get_format_signature():
trace.mutter('incompatible format signature inserting to %r', self)
- raise KnitDataStreamIncompatible(
- format, self.get_format_signature())
+ source = self._knit_from_datastream(
+ (format, data_list, reader_callable))
+ self.join(source)
+ return
for version_id, options, length, parents in data_list:
if self.has_version(version_id):
@@ -787,6 +788,28 @@
[(version_id, options, parents, length)],
reader_callable(length))
+ def _knit_from_datastream(self, (format, data_list, reader_callable)):
+ """Create a knit object from a data stream.
+
+ This method exists to allow conversion of data streams that do not
+ match the signature of this knit. Generally it will be slower and use
+ more memory to use this method to insert data, but it will work.
+
+ :seealso: get_data_stream for details on datastreams.
+ :return: A knit versioned file which can be used to join the datastream
+ into self.
+ """
+ if format == "knit-plain":
+ factory = KnitPlainFactory()
+ elif format == "knit-annotated":
+ factory = KnitAnnotateFactory()
+ else:
+ raise errors.KnitDataStreamUnknown(format)
+ index = _StreamIndex(data_list)
+ access = _StreamAccess(reader_callable, index, self, factory)
+ return KnitVersionedFile(self.filename, self.transport,
+ factory=factory, index=index, access_method=access)
+
def versions(self):
"""See VersionedFile.versions."""
if 'evil' in debug.debug_flags:
@@ -1512,9 +1535,9 @@
return 'line-delta'
def get_options(self, version_id):
- """Return a string represention options.
+ """Return a list representing options.
- e.g. foo,bar
+ e.g. ['foo', 'bar']
"""
return self._cache[version_id][1]
@@ -1748,9 +1771,9 @@
raise RevisionNotPresent(version_id, self)
def get_options(self, version_id):
- """Return a string represention options.
+ """Return a list representing options.
- e.g. foo,bar
+ e.g. ['foo', 'bar']
"""
node = self._get_node(version_id)
if not self._deltas:
@@ -2022,6 +2045,183 @@
self.write_index = index
+class _StreamAccess(object):
+ """A Knit Access object that provides data from a datastream.
+
+ It also provides a fallback to present annotated data from a *backing*
+ access object as unannotated data.
+
+ This is triggered by an index_memo pointing to a different index than
+ the one this was constructed with, and is used to allow extracting full
+ unannotated texts for insertion into annotated knits.
+ """
+
+ def __init__(self, reader_callable, stream_index, backing_knit,
+ orig_factory):
+ """Create a _StreamAccess object.
+
+ :param reader_callable: The reader_callable from the datastream.
+ This is called to buffer all the data immediately, for
+ random access.
+ :param stream_index: The index of the data stream this provides access
+ to, which will be present in native index_memo's.
+ :param backing_knit: The knit object that will provide access to
+ annotated texts which are not available in the stream, so as to
+ create unannotated texts.
+ :param orig_factory: The original content factory used to generate the
+ stream. This is used for checking whether the thunk code for
+ supporting _copy_texts will generate the correct form of data.
+ """
+ self.data = reader_callable(None)
+ self.stream_index = stream_index
+ self.backing_knit = backing_knit
+ self.orig_factory = orig_factory
+
+ def get_raw_records(self, memos_for_retrieval):
+ """Get the raw bytes for some records.
+
+ :param memos_for_retrieval: An iterable containing the (thunk_flag,
+ index, start, end) memo for retrieving the bytes.
+ :return: An iterator over the bytes of the records.
+ """
+ # use a generator for memory friendliness
+ for thunk_flag, version_id, start, end in memos_for_retrieval:
+ if version_id is self.stream_index:
+ yield self.data[start:end]
+ continue
+ # we have been asked to thunk. This thunking only occurs when
+ # we are obtaining plain texts from an annotated backing knit
+ # so that _copy_texts will work.
+ # We could improve performance here by scanning for where we need
+ # to do this and using get_line_list, then interleaving the output
+ # as desired. However, for now, this is sufficient.
+ if self.orig_factory.__class__ != KnitPlainFactory:
+ raise errors.KnitCorrupt(
+ self, 'Bad thunk request %r' % version_id)
+ lines = self.backing_knit.get_lines(version_id)
+ line_bytes = ''.join(lines)
+ digest = sha_string(line_bytes)
+ if lines:
+ if lines[-1][-1] != '\n':
+ lines[-1] = lines[-1] + '\n'
+ line_bytes += '\n'
+ orig_options = list(self.backing_knit._index.get_options(version_id))
+ if 'fulltext' not in orig_options:
+ if 'line-delta' not in orig_options:
+ raise errors.KnitCorrupt(self,
+ 'Unknown compression method %r' % orig_options)
+ orig_options.remove('line-delta')
+ orig_options.append('fulltext')
+ # We want plain data, because we expect to thunk only to allow text
+ # extraction.
+ size, bytes = self.backing_knit._data._record_to_data(version_id,
+ digest, lines, line_bytes)
+ yield bytes
+
+
+class _StreamIndex(object):
+ """A Knit Index object that uses the data map from a datastream."""
+
+ def __init__(self, data_list):
+ """Create a _StreamIndex object.
+
+ :param data_list: The data_list from the datastream.
+ """
+ self.data_list = data_list
+ self._by_version = {}
+ pos = 0
+ for key, options, length, parents in data_list:
+ self._by_version[key] = options, (pos, pos + length), parents
+ pos += length
+
+ def get_ancestry(self, versions, topo_sorted):
+ """Get an ancestry list for versions."""
+ if topo_sorted:
+ # Not needed for basic joins
+ raise NotImplementedError(self.get_ancestry)
+ # get a graph of all the mentioned versions:
+ # Little ugly - basically copied from KnitIndex, but don't want to
+ # accidentally incorporate too much of that index's code.
+ ancestry = set()
+ pending = set(versions)
+ cache = self._by_version
+ while pending:
+ version = pending.pop()
+ # trim ghosts
+ try:
+ parents = [p for p in cache[version][2] if p in cache]
+ except KeyError:
+ raise RevisionNotPresent(version, self)
+ # if not completed and not a ghost
+ pending.update([p for p in parents if p not in ancestry])
+ ancestry.add(version)
+ return list(ancestry)
+
+ def get_method(self, version_id):
+ """Return compression method of specified version."""
+ try:
+ options = self._by_version[version_id][0]
+ except KeyError:
+ # Strictly speaking this should check in the backing knit, but
+ # until we have a test to discriminate, this will do.
+ return 'fulltext'
+ if 'fulltext' in options:
+ return 'fulltext'
+ elif 'line-delta' in options:
+ return 'line-delta'
+ else:
+ raise errors.KnitIndexUnknownMethod(self, options)
+
+ def get_options(self, version_id):
+ """Return a list representing options.
+
+ e.g. ['foo', 'bar']
+ """
+ return self._by_version[version_id][0]
+
+ def get_parents_with_ghosts(self, version_id):
+ """Return parents of specified version with ghosts."""
+ return self._by_version[version_id][2]
+
+ def get_position(self, version_id):
+ """Return details needed to access the version.
+
+ _StreamAccess has the data as a big array, so we return slice
+ coordinates into that (as index_memo's are opaque outside the
+ index and matching access class).
+
+ :return: a tuple (thunk_flag, index, start, end). If thunk_flag is
+ False, index will be self, otherwise it will be a version id.
+ """
+ try:
+ start, end = self._by_version[version_id][1]
+ return False, self, start, end
+ except KeyError:
+ # Signal to the access object to handle this from the backing knit.
+ return (True, version_id, None, None)
+
+ def get_versions(self):
+ """Get all the versions in the stream."""
+ return self._by_version.keys()
+
+ def iter_parents(self, version_ids):
+ """Iterate through the parents for many version ids.
+
+ :param version_ids: An iterable yielding version_ids.
+ :return: An iterator that yields (version_id, parents). Requested
+ version_ids not present in the versioned file are simply skipped.
+ The order is undefined, allowing for different optimisations in
+ the underlying implementation.
+ """
+ result = []
+ for version in version_ids:
+ try:
+ result.append((version, self._by_version[version][2]))
+ except KeyError:
+ pass
+ return result
+
+
class _KnitData(object):
"""Manage extraction of data from a KnitAccess, caching and decompressing.
@@ -2282,7 +2482,7 @@
for index, version in enumerate(to_process):
pb.update('Converting versioned data', index, total)
sha1, num_bytes, parent_text = self.target.add_lines(version,
- self.source.get_parents(version),
+ self.source.get_parents_with_ghosts(version),
self.source.get_lines(version),
parent_texts=parent_cache)
parent_cache[version] = parent_text
@@ -2316,7 +2516,8 @@
if None in version_ids:
version_ids.remove(None)
- self.source_ancestry = set(self.source.get_ancestry(version_ids))
+ self.source_ancestry = set(self.source.get_ancestry(version_ids,
+ topo_sorted=False))
this_versions = set(self.target._index.get_versions())
# XXX: For efficiency we should not look at the whole index,
# we only need to consider the referenced revisions - they
=== modified file 'bzrlib/tests/test_errors.py'
--- a/bzrlib/tests/test_errors.py 2007-11-23 08:31:24 +0000
+++ b/bzrlib/tests/test_errors.py 2007-11-29 22:24:49 +0000
@@ -85,6 +85,12 @@
'"stream format" into knit of format '
'"target format".', str(error))
+ def test_knit_data_stream_unknown(self):
+ error = errors.KnitDataStreamUnknown(
+ 'stream format')
+ self.assertEqual('Cannot parse knit data stream of format '
+ '"stream format".', str(error))
+
def test_knit_header_error(self):
error = errors.KnitHeaderError('line foo\n', 'path/to/file')
self.assertEqual("Knit header error: 'line foo\\n' unexpected"
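As the knit.py diff above shows, `_StreamIndex.get_ancestry` is a plain worklist traversal that trims ghost parents (parents not present in the stream). A standalone sketch of the same walk, operating on a hypothetical `cache` dict shaped like `_by_version`:

```python
def get_ancestry(cache, versions):
    # cache maps version -> (options, (start, end), parents). Parents
    # absent from the cache are ghosts and are silently trimmed, as in
    # _StreamIndex.get_ancestry with topo_sorted=False.
    ancestry = set()
    pending = set(versions)
    while pending:
        version = pending.pop()
        parents = [p for p in cache[version][2] if p in cache]
        pending.update(p for p in parents if p not in ancestry)
        ancestry.add(version)
    return ancestry


cache = {'a': ([], (0, 10), []),
         'b': ([], (10, 20), []),
         'c': ([], (20, 30), ['b', 'a']),
         'd': ([], (30, 40), ['e', 'f'])}   # 'e' and 'f' are ghosts
print(sorted(get_ancestry(cache, ['c'])))
print(sorted(get_ancestry(cache, ['d'])))
```

This is the behaviour exercised by `test_get_ancestry` in the test diff below, where 'd' has ghost parents 'e' and 'f'.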
=== modified file 'bzrlib/tests/test_knit.py'
--- a/bzrlib/tests/test_knit.py 2007-11-26 22:33:25 +0000
+++ b/bzrlib/tests/test_knit.py 2007-11-30 07:56:46 +0000
@@ -47,6 +47,8 @@
_KnitIndex,
_PackAccess,
PlainKnitContent,
+ _StreamAccess,
+ _StreamIndex,
WeaveToKnit,
KnitSequenceMatcher,
)
@@ -59,6 +61,7 @@
)
from bzrlib.transport import get_transport
from bzrlib.transport.memory import MemoryTransport
+from bzrlib.tuned_gzip import GzipFile
from bzrlib.util import bencode
from bzrlib.weave import Weave
@@ -1132,6 +1135,38 @@
self.assertTrue(k.has_version('text-1'))
self.assertEqualDiff(''.join(k.get_lines('text-1')), TEXT_1)
+ def test_newline_empty_lines(self):
+ # ensure that ["\n"] round trips ok.
+ knit = self.make_test_knit()
+ knit.add_lines('a', [], ["\n"])
+ knit.add_lines_with_ghosts('b', [], ["\n"])
+ self.assertEqual(["\n"], knit.get_lines('a'))
+ self.assertEqual(["\n"], knit.get_lines('b'))
+ self.assertEqual(['fulltext'], knit._index.get_options('a'))
+ self.assertEqual(['fulltext'], knit._index.get_options('b'))
+ knit.add_lines('c', ['a'], ["\n"])
+ knit.add_lines_with_ghosts('d', ['b'], ["\n"])
+ self.assertEqual(["\n"], knit.get_lines('c'))
+ self.assertEqual(["\n"], knit.get_lines('d'))
+ self.assertEqual(['line-delta'], knit._index.get_options('c'))
+ self.assertEqual(['line-delta'], knit._index.get_options('d'))
+
+ def test_empty_lines(self):
+ # bizarrely, [] is not listed as having no-eol.
+ knit = self.make_test_knit()
+ knit.add_lines('a', [], [])
+ knit.add_lines_with_ghosts('b', [], [])
+ self.assertEqual([], knit.get_lines('a'))
+ self.assertEqual([], knit.get_lines('b'))
+ self.assertEqual(['fulltext'], knit._index.get_options('a'))
+ self.assertEqual(['fulltext'], knit._index.get_options('b'))
+ knit.add_lines('c', ['a'], [])
+ knit.add_lines_with_ghosts('d', ['b'], [])
+ self.assertEqual([], knit.get_lines('c'))
+ self.assertEqual([], knit.get_lines('d'))
+ self.assertEqual(['line-delta'], knit._index.get_options('c'))
+ self.assertEqual(['line-delta'], knit._index.get_options('d'))
+
def test_knit_reload(self):
# test that the content in a reloaded knit is correct
k = self.make_test_knit()
@@ -1748,6 +1783,19 @@
knit1.transport.get_bytes(knit1._index._filename),
knit2.transport.get_bytes(knit2._index._filename))
+ def assertKnitValuesEqual(self, left, right):
+ """Assert that the texts, annotations and graph of left and right are
+ the same.
+ """
+ self.assertEqual(set(left.versions()), set(right.versions()))
+ for version in left.versions():
+ self.assertEqual(left.get_parents_with_ghosts(version),
+ right.get_parents_with_ghosts(version))
+ self.assertEqual(left.get_lines(version),
+ right.get_lines(version))
+ self.assertEqual(left.annotate(version),
+ right.annotate(version))
+
def test_insert_data_stream_empty(self):
"""Inserting a data stream with no records should not put any data into
the knit.
@@ -1768,12 +1816,36 @@
source = self.make_test_knit(name='source')
source.add_lines('text-a', [], split_lines(TEXT_1))
data_stream = source.get_data_stream(['text-a'])
-
target = self.make_test_knit(name='target')
target.insert_data_stream(data_stream)
-
self.assertKnitFilesEqual(source, target)
+ def test_insert_data_stream_annotated_unannotated(self):
+ """Inserting an annotated datastream to an unannotated knit works."""
+ # case one - full texts.
+ source = self.make_test_knit(name='source', annotate=True)
+ target = self.make_test_knit(name='target', annotate=False)
+ source.add_lines('text-a', [], split_lines(TEXT_1))
+ target.insert_data_stream(source.get_data_stream(['text-a']))
+ self.assertKnitValuesEqual(source, target)
+ # case two - deltas.
+ source.add_lines('text-b', ['text-a'], split_lines(TEXT_2))
+ target.insert_data_stream(source.get_data_stream(['text-b']))
+ self.assertKnitValuesEqual(source, target)
+
+ def test_insert_data_stream_unannotated_annotated(self):
+ """Inserting an unannotated datastream to an annotated knit works."""
+ # case one - full texts.
+ source = self.make_test_knit(name='source', annotate=False)
+ target = self.make_test_knit(name='target', annotate=True)
+ source.add_lines('text-a', [], split_lines(TEXT_1))
+ target.insert_data_stream(source.get_data_stream(['text-a']))
+ self.assertKnitValuesEqual(source, target)
+ # case two - deltas.
+ source.add_lines('text-b', ['text-a'], split_lines(TEXT_2))
+ target.insert_data_stream(source.get_data_stream(['text-b']))
+ self.assertKnitValuesEqual(source, target)
+
def test_insert_data_stream_records_already_present(self):
"""Insert a data stream where some records are already present in the
target, and some not. Only the new records are inserted.
@@ -1855,21 +1927,60 @@
self.assertRaises(
errors.KnitCorrupt, target.insert_data_stream, data_stream)
- def test_insert_data_stream_incompatible_format(self):
+ def test_insert_data_stream_unknown_format(self):
"""A data stream in a different format to the target knit cannot be
inserted.
- It will raise KnitDataStreamIncompatible.
+ It will raise KnitDataStreamUnknown because the fallback code will fail
+ to make a knit. In future we may need KnitDataStreamIncompatible again,
+ for more exotic cases.
"""
data_stream = ('fake-format-signature', [], lambda _: '')
target = self.make_test_knit(name='target')
self.assertRaises(
- errors.KnitDataStreamIncompatible,
+ errors.KnitDataStreamUnknown,
target.insert_data_stream, data_stream)
# * test that a stream of "already present version, then new version"
# inserts correctly.
+
+ def assertMadeStreamKnit(self, source_knit, versions, target_knit):
+ """Assert that a knit made from a stream is as expected."""
+ a_stream = source_knit.get_data_stream(versions)
+ expected_data = a_stream[2](None)
+ a_stream = source_knit.get_data_stream(versions)
+ a_knit = target_knit._knit_from_datastream(a_stream)
+ self.assertEqual(source_knit.factory.__class__,
+ a_knit.factory.__class__)
+ self.assertIsInstance(a_knit._data._access, _StreamAccess)
+ self.assertIsInstance(a_knit._index, _StreamIndex)
+ self.assertEqual(a_knit._index.data_list, a_stream[1])
+ self.assertEqual(a_knit._data._access.data, expected_data)
+ self.assertEqual(a_knit.filename, target_knit.filename)
+ self.assertEqual(a_knit.transport, target_knit.transport)
+ self.assertEqual(a_knit._index, a_knit._data._access.stream_index)
+ self.assertEqual(target_knit, a_knit._data._access.backing_knit)
+ self.assertIsInstance(a_knit._data._access.orig_factory,
+ source_knit.factory.__class__)
+
+ def test__knit_from_data_stream_empty(self):
+ """Create a knit object from a datastream."""
+ annotated = self.make_test_knit(name='source', annotate=True)
+ plain = self.make_test_knit(name='target', annotate=False)
+ # case 1: annotated source
+ self.assertMadeStreamKnit(annotated, [], annotated)
+ self.assertMadeStreamKnit(annotated, [], plain)
+ # case 2: plain source
+ self.assertMadeStreamKnit(plain, [], annotated)
+ self.assertMadeStreamKnit(plain, [], plain)
+
+ def test__knit_from_data_stream_unknown_format(self):
+ annotated = self.make_test_knit(name='source', annotate=True)
+ self.assertRaises(errors.KnitDataStreamUnknown,
+ annotated._knit_from_datastream, ("unknown", None, None))
+
+
TEXT_1 = """\
Banana cup cakes:
@@ -2719,3 +2830,204 @@
# will fail and we'll adjust it to handle that case correctly, rather
# than allowing an over-read that is bogus.
self.assertEqual(expected_length, len(stream[2](-1)))
+
+
+class Test_StreamIndex(KnitTests):
+
+ def get_index(self, knit, stream):
+ """Get a _StreamIndex from knit and stream."""
+ return knit._knit_from_datastream(stream)._index
+
+ def assertIndexVersions(self, knit, versions):
+ """Check that the _StreamIndex versions are those of the stream."""
+ index = self.get_index(knit, knit.get_data_stream(versions))
+ self.assertEqual(set(index.get_versions()), set(versions))
+ # check we didn't get duplicates
+ self.assertEqual(len(index.get_versions()), len(versions))
+
+ def assertIndexAncestry(self, knit, ancestry_versions, versions, result):
+ """Check the result of a get_ancestry call on knit."""
+ index = self.get_index(knit, knit.get_data_stream(versions))
+ self.assertEqual(
+ set(result),
+ set(index.get_ancestry(ancestry_versions, False)))
+
+ def assertIterParents(self, knit, versions, parent_versions, result):
+ """Check the result of an iter_parents call on knit."""
+ index = self.get_index(knit, knit.get_data_stream(versions))
+ self.assertEqual(result, index.iter_parents(parent_versions))
+
+ def assertGetMethod(self, knit, versions, version, result):
+ index = self.get_index(knit, knit.get_data_stream(versions))
+ self.assertEqual(result, index.get_method(version))
+
+ def assertGetOptions(self, knit, version, options):
+ index = self.get_index(knit, knit.get_data_stream(version))
+ self.assertEqual(options, index.get_options(version))
+
+ def assertGetPosition(self, knit, versions, version, result):
+ index = self.get_index(knit, knit.get_data_stream(versions))
+ if result[1] is None:
+ result = (result[0], index, result[2], result[3])
+ self.assertEqual(result, index.get_position(version))
+
+ def assertGetParentsWithGhosts(self, knit, versions, version, parents):
+ index = self.get_index(knit, knit.get_data_stream(versions))
+ self.assertEqual(parents, index.get_parents_with_ghosts(version))
+
+ def make_knit_with_4_versions_2_dags(self):
+ knit = self.make_test_knit()
+ knit.add_lines('a', [], ["foo"])
+ knit.add_lines('b', [], [])
+ knit.add_lines('c', ['b', 'a'], [])
+ knit.add_lines_with_ghosts('d', ['e', 'f'], [])
+ return knit
+
+ def test_versions(self):
+ """The versions of a StreamIndex are those of the datastream."""
+ knit = self.make_knit_with_4_versions_2_dags()
+ # ask for most permutations, which catches bugs like falling back to the
+ # target knit, or showing ghosts, etc.
+ self.assertIndexVersions(knit, [])
+ self.assertIndexVersions(knit, ['a'])
+ self.assertIndexVersions(knit, ['b'])
+ self.assertIndexVersions(knit, ['c'])
+ self.assertIndexVersions(knit, ['d'])
+ self.assertIndexVersions(knit, ['a', 'b'])
+ self.assertIndexVersions(knit, ['b', 'c'])
+ self.assertIndexVersions(knit, ['a', 'c'])
+ self.assertIndexVersions(knit, ['a', 'b', 'c'])
+ self.assertIndexVersions(knit, ['a', 'b', 'c', 'd'])
+
+ def test_construct(self):
+ """Constructing a StreamIndex generates index data."""
+ data_list = [('text-a', ['fulltext'], 127, []),
+ ('text-b', ['option'], 128, ['text-c'])]
+ index = _StreamIndex(data_list)
+ self.assertEqual({'text-a':(['fulltext'], (0, 127), []),
+ 'text-b':(['option'], (127, 127 + 128), ['text-c'])},
+ index._by_version)
+
+ def test_get_ancestry(self):
+ knit = self.make_knit_with_4_versions_2_dags()
+ self.assertIndexAncestry(knit, ['a'], ['a'], ['a'])
+ self.assertIndexAncestry(knit, ['b'], ['b'], ['b'])
+ self.assertIndexAncestry(knit, ['c'], ['c'], ['c'])
+ self.assertIndexAncestry(knit, ['c'], ['a', 'b', 'c'],
+ set(['a', 'b', 'c']))
+ self.assertIndexAncestry(knit, ['c', 'd'], ['a', 'b', 'c', 'd'],
+ set(['a', 'b', 'c', 'd']))
+
+ def test_get_method(self):
+ knit = self.make_knit_with_4_versions_2_dags()
+ self.assertGetMethod(knit, ['a'], 'a', 'fulltext')
+ self.assertGetMethod(knit, ['c'], 'c', 'line-delta')
+ # get_method on a basis that is not in the datastream (but in the
+ # backing knit) returns 'fulltext', because that's what we'll create as
+ # we thunk across.
+ self.assertGetMethod(knit, ['c'], 'b', 'fulltext')
+
+ def test_iter_parents(self):
+ knit = self.make_knit_with_4_versions_2_dags()
+ self.assertIterParents(knit, ['a'], ['a'], [('a', [])])
+ self.assertIterParents(knit, ['a', 'b'], ['a', 'b'],
+ [('a', []), ('b', [])])
+ self.assertIterParents(knit, ['a', 'b', 'c'], ['a', 'b', 'c'],
+ [('a', []), ('b', []), ('c', ['b', 'a'])])
+ self.assertIterParents(knit, ['a', 'b', 'c', 'd'],
+ ['a', 'b', 'c', 'd'],
+ [('a', []), ('b', []), ('c', ['b', 'a']), ('d', ['e', 'f'])])
+ self.assertIterParents(knit, ['c'], ['a', 'b', 'c'],
+ [('c', ['b', 'a'])])
+
+ def test_get_options(self):
+ knit = self.make_knit_with_4_versions_2_dags()
+ self.assertGetOptions(knit, 'a', ['no-eol', 'fulltext'])
+ self.assertGetOptions(knit, 'c', ['line-delta'])
+
+ def test_get_parents_with_ghosts(self):
+ knit = self.make_knit_with_4_versions_2_dags()
+ self.assertGetParentsWithGhosts(knit, ['a'], 'a', [])
+ self.assertGetParentsWithGhosts(knit, ['c'], 'c', ['b', 'a'])
+ self.assertGetParentsWithGhosts(knit, ['d'], 'd', ['e', 'f'])
+
+ def test_get_position(self):
+ knit = self.make_knit_with_4_versions_2_dags()
+ # get_position returns (thunk_flag, index(can be None), start, end) for
+ # _StreamAccess to use.
+ self.assertGetPosition(knit, ['a'], 'a', (False, None, 0, 78))
+ self.assertGetPosition(knit, ['a', 'c'], 'c', (False, None, 78, 156))
+ # get_position on a text that is not in the datastream (but in the
+ # backing knit) returns (True, 'versionid', None, None) - and then the
+ # access object can construct the relevant data as needed.
+ self.assertGetPosition(knit, ['a', 'c'], 'b', (True, 'b', None, None))
+
+
+class Test_StreamAccess(KnitTests):
+
+ def get_index_access(self, knit, stream):
+ """Get a _StreamAccess from knit and stream."""
+ knit = knit._knit_from_datastream(stream)
+ return knit._index, knit._data._access
+
+ def assertGetRawRecords(self, knit, versions):
+ index, access = self.get_index_access(knit,
+ knit.get_data_stream(versions))
+ # check that every version asked for can be obtained from the resulting
+ # access object.
+ # batch
+ memos = []
+ for version in versions:
+ memos.append(knit._index.get_position(version))
+ original = {}
+ for version, data in zip(
+ versions, knit._data._access.get_raw_records(memos)):
+ original[version] = data
+ memos = []
+ for version in versions:
+ memos.append(index.get_position(version))
+ streamed = {}
+ for version, data in zip(versions, access.get_raw_records(memos)):
+ streamed[version] = data
+ self.assertEqual(original, streamed)
+ # individually
+ for version in versions:
+ data = list(access.get_raw_records(
+ [index.get_position(version)]))[0]
+ self.assertEqual(original[version], data)
+
+ def make_knit_with_two_versions(self):
+ knit = self.make_test_knit()
+ knit.add_lines('a', [], ["foo"])
+ knit.add_lines('b', [], ["bar"])
+ return knit
+
+ def test_get_raw_records(self):
+ knit = self.make_knit_with_two_versions()
+ self.assertGetRawRecords(knit, ['a', 'b'])
+ self.assertGetRawRecords(knit, ['a'])
+ self.assertGetRawRecords(knit, ['b'])
+
+ def test_get_raw_record_from_backing_knit(self):
+ # the thunk layer should create an artificial A on-demand when needed.
+ source_knit = self.make_test_knit(name='plain', annotate=False)
+ target_knit = self.make_test_knit(name='annotated', annotate=True)
+ source_knit.add_lines("A", [], ["Foo\n"])
+ # Give the target A, so we can try to thunk across to it.
+ target_knit.join(source_knit)
+ index, access = self.get_index_access(target_knit,
+ source_knit.get_data_stream([]))
+ raw_data = list(access.get_raw_records([(True, "A", None, None)]))[0]
+ df = GzipFile(mode='rb', fileobj=StringIO(raw_data))
+ self.assertEqual(
+ 'version A 1 5d36b88bb697a2d778f024048bafabd443d74503\n'
+ 'Foo\nend A\n',
+ df.read())
+
+ def test_asking_for_thunk_stream_is_not_plain_errors(self):
+ knit = self.make_test_knit(name='annotated', annotate=True)
+ knit.add_lines("A", [], ["Foo\n"])
+ index, access = self.get_index_access(knit,
+ knit.get_data_stream([]))
+ self.assertRaises(errors.KnitCorrupt,
+ list, access.get_raw_records([(True, "A", None, None)]))
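Tying the tests back to the implementation: `_StreamAccess.get_raw_records` either slices the buffered stream data directly, or, when the thunk flag is set, rebuilds record bytes from the backing knit. A simplified sketch of that dispatch (the backing store is reduced to a hypothetical dict of ready-made record bytes; the real code reconstructs and gzips a fulltext via `_record_to_data`):

```python
def get_raw_records(data, stream_index, backing_records, memos):
    # Simplified sketch of _StreamAccess.get_raw_records. Memos whose
    # index is the stream index are plain slices into the buffered
    # data; a True thunk_flag carries a version id and means the bytes
    # must come from the backing knit instead.
    for thunk_flag, index, start, end in memos:
        if not thunk_flag:
            yield data[start:end]
        else:
            # 'index' is the version id here; look up pre-built record
            # bytes in the hypothetical backing store.
            yield backing_records[index]


data = b'recordArecordB'
backing = {'C': b'fulltext-for-C'}
memos = [(False, 'stream', 0, 7), (True, 'C', None, None)]
print(list(get_raw_records(data, 'stream', backing, memos)))
```

The `(True, "A", None, None)` memos in `test_get_raw_record_from_backing_knit` above exercise exactly this thunk path.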
More information about the bazaar-commits mailing list