Rev 2772: (Andrew Bennetts) Add get_data_stream, insert_data_stream and get_format_signature to KnitVersionedFile. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Fri Aug 31 03:05:27 BST 2007
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 2772
revision-id: pqm at pqm.ubuntu.com-20070831020510-emrlta5dk6ta95zp
parent: pqm at pqm.ubuntu.com-20070831010401-nyz15lnphtkf3toz
parent: andrew.bennetts at canonical.com-20070831001927-wc63abm0nedokjw3
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Fri 2007-08-31 03:05:10 +0100
message:
(Andrew Bennetts) Add get_data_stream, insert_data_stream and get_format_signature to KnitVersionedFile.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/errors.py errors.py-20050309040759-20512168c4e14fbd
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/repofmt/knitrepo.py knitrepo.py-20070206081537-pyy4a00xdas0j4pf-1
bzrlib/tests/test_errors.py test_errors.py-20060210110251-41aba2deddf936a8
bzrlib/tests/test_knit.py test_knit.py-20051212171302-95d4c00dd5f11f2b
bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
bzrlib/versionedfile.py versionedfile.py-20060222045106-5039c71ee3b65490
------------------------------------------------------------
revno: 2670.3.8
merged: andrew.bennetts at canonical.com-20070831001927-wc63abm0nedokjw3
parent: andrew.bennetts at canonical.com-20070831001918-rzsxdgqcys6fb8hl
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: vf-data-stream
timestamp: Fri 2007-08-31 10:19:27 +1000
message:
Add NEWS entry.
------------------------------------------------------------
revno: 2670.3.7
merged: andrew.bennetts at canonical.com-20070831001918-rzsxdgqcys6fb8hl
parent: andrew.bennetts at canonical.com-20070831001009-2p2l7idu1x01mvre
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: vf-data-stream
timestamp: Fri 2007-08-31 10:19:18 +1000
message:
Tweak docstring as requested in review.
------------------------------------------------------------
revno: 2670.3.6
merged: andrew.bennetts at canonical.com-20070831001009-2p2l7idu1x01mvre
parent: andrew.bennetts at canonical.com-20070830082729-8bue7wh0bqut2xs2
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: vf-data-stream
timestamp: Fri 2007-08-31 10:10:09 +1000
message:
Remove redundant import.
------------------------------------------------------------
revno: 2670.3.5
merged: andrew.bennetts at canonical.com-20070830082729-8bue7wh0bqut2xs2
parent: andrew.bennetts at canonical.com-20070830081154-16hebp2xwr15x2hc
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: vf-data-stream
timestamp: Thu 2007-08-30 18:27:29 +1000
message:
Remove get_stream_as_bytes from KnitVersionedFile's API, make it a function in knitrepo.py instead.
------------------------------------------------------------
revno: 2670.3.4
merged: andrew.bennetts at canonical.com-20070830081154-16hebp2xwr15x2hc
parent: andrew.bennetts at canonical.com-20070830080949-yxjvx8fu04g6p9s7
parent: pqm at pqm.ubuntu.com-20070829094547-qm9ntd7pd95r7w8c
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: vf-data-stream
timestamp: Thu 2007-08-30 18:11:54 +1000
message:
Merge from bzr.dev.
------------------------------------------------------------
revno: 2670.3.3
merged: andrew.bennetts at canonical.com-20070830080949-yxjvx8fu04g6p9s7
parent: andrew.bennetts at canonical.com-20070814060239-o7ouy0ohh9xsz7ri
parent: pqm at pqm.ubuntu.com-20070815055603-t0fwzxv6if6sr7c6
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: vf-data-stream
timestamp: Thu 2007-08-30 18:09:49 +1000
message:
Merge from bzr.dev.
------------------------------------------------------------
revno: 2670.3.2
merged: andrew.bennetts at canonical.com-20070814060239-o7ouy0ohh9xsz7ri
parent: andrew.bennetts at canonical.com-20070803033525-3pp04fzubrgzlnac
parent: pqm at pqm.ubuntu.com-20070813221757-bianevqddds8ift5
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: vf-data-stream
timestamp: Tue 2007-08-14 16:02:39 +1000
message:
Merge from bzr.dev.
------------------------------------------------------------
revno: 2670.3.1
merged: andrew.bennetts at canonical.com-20070803033525-3pp04fzubrgzlnac
parent: pqm at pqm.ubuntu.com-20070802221338-9333q05a8caaciwo
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: vf-data-stream
timestamp: Fri 2007-08-03 13:35:25 +1000
message:
Add get_data_stream/insert_data_stream to KnitVersionedFile.
=== modified file 'NEWS'
--- a/NEWS 2007-08-31 00:04:57 +0000
+++ b/NEWS 2007-08-31 02:05:10 +0000
@@ -144,6 +144,12 @@
incremental addition of data to a file without requiring that all the
data be buffered in memory. (Robert Collins)
+ * New methods on ``bzrlib.knit.KnitVersionedFile``:
+ ``get_data_stream(versions)``, ``insert_data_stream(stream)`` and
+ ``get_format_signature()``. These provide some infrastructure for
+ efficiently streaming the knit data for a set of versions over the smart
+ protocol.
+
TESTING:
* Use UTF-8 encoded StringIO for log tests to avoid failures on
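As a rough illustration of how the new methods fit together, the sketch below
copies a couple of versions from one knit into another. It is only a sketch:
it assumes two already-open KnitVersionedFile objects, and the names
source_knit and target_knit are placeholders, not anything from the patch.

    # Minimal sketch, assuming source_knit and target_knit are open
    # KnitVersionedFile objects of compatible formats (names are
    # illustrative only).
    stream = source_knit.get_data_stream(['text-a', 'text-b'])
    format_signature, data_list, reader_callable = stream
    # insert_data_stream takes the whole tuple and raises
    # KnitDataStreamIncompatible if the format signatures differ.
    target_knit.insert_data_stream(stream)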
=== modified file 'bzrlib/errors.py'
--- a/bzrlib/errors.py 2007-08-29 08:18:22 +0000
+++ b/bzrlib/errors.py 2007-08-30 08:11:54 +0000
@@ -1301,6 +1301,25 @@
internal_error = True
+class KnitCorrupt(KnitError):
+
+ _fmt = "Knit %(filename)s corrupt: %(how)s"
+
+ def __init__(self, filename, how):
+ KnitError.__init__(self)
+ self.filename = filename
+ self.how = how
+
+
+class KnitDataStreamIncompatible(KnitError):
+
+ _fmt = "Cannot insert knit data stream of format \"%(stream_format)s\" into knit of format \"%(target_format)s\"."
+
+ def __init__(self, stream_format, target_format):
+ self.stream_format = stream_format
+ self.target_format = target_format
+
+
class KnitHeaderError(KnitError):
_fmt = 'Knit header error: %(badline)r unexpected for file "%(filename)s".'
@@ -1311,16 +1330,6 @@
self.filename = filename
-class KnitCorrupt(KnitError):
-
- _fmt = "Knit %(filename)s corrupt: %(how)s"
-
- def __init__(self, filename, how):
- KnitError.__init__(self)
- self.filename = filename
- self.how = how
-
-
class KnitIndexUnknownMethod(KnitError):
"""Raised when we don't understand the storage method.
=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py 2007-08-30 23:35:27 +0000
+++ b/bzrlib/knit.py 2007-08-31 02:05:10 +0000
@@ -95,6 +95,7 @@
KnitError,
InvalidRevisionId,
KnitCorrupt,
+ KnitDataStreamIncompatible,
KnitHeaderError,
RevisionNotPresent,
RevisionAlreadyPresent,
@@ -562,6 +563,67 @@
(None, current_values[2], current_values[3]),
new_parents)
+ def get_data_stream(self, required_versions):
+ """Get a data stream for the specified versions.
+
+ Versions may be returned in any order, not necessarily the order
+ specified.
+
+ :param required_versions: The exact set of versions to be extracted.
+ Unlike some other knit methods, this is not used to generate a
+ transitive closure, rather it is used precisely as given.
+
+ :returns: format_signature, list of (version, options, length, parents),
+ reader_callable.
+ """
+ required_versions = set([osutils.safe_revision_id(v) for v in
+ required_versions])
+ # we don't care about inclusions, the caller cares.
+ # but we need to setup a list of records to visit.
+ for version_id in required_versions:
+ if not self.has_version(version_id):
+ raise RevisionNotPresent(version_id, self.filename)
+ # Pick the desired versions out of the index in oldest-to-newest order
+ version_list = []
+ for version_id in self.versions():
+ if version_id in required_versions:
+ version_list.append(version_id)
+
+ # create the list of version information for the result
+ copy_queue_records = []
+ copy_set = set()
+ result_version_list = []
+ for version_id in version_list:
+ options = self._index.get_options(version_id)
+ parents = self._index.get_parents_with_ghosts(version_id)
+ index_memo = self._index.get_position(version_id)
+ copy_queue_records.append((version_id, index_memo))
+ none, data_pos, data_size = index_memo
+ copy_set.add(version_id)
+ # version, options, length, parents
+ result_version_list.append((version_id, options, data_size,
+ parents))
+
+ # Read the compressed record data.
+ # XXX:
+ # From here down to the return should really be logic in the returned
+ # callable -- in a class that adapts read_records_iter_raw to read
+ # requests.
+ raw_datum = []
+ for (version_id, raw_data), \
+ (version_id2, options, _, parents) in \
+ izip(self._data.read_records_iter_raw(copy_queue_records),
+ result_version_list):
+ assert version_id == version_id2, 'logic error, inconsistent results'
+ raw_datum.append(raw_data)
+ pseudo_file = StringIO(''.join(raw_datum))
+ def read(length):
+ if length is None:
+ return pseudo_file.read()
+ else:
+ return pseudo_file.read(length)
+ return (self.get_format_signature(), result_version_list, read)
+
def _extract_blocks(self, version_id, source, target):
if self._index.get_method(version_id) != 'line-delta':
return None
@@ -596,6 +658,14 @@
else:
delta = self.factory.parse_line_delta(data, version_id)
return parent, sha1, noeol, delta
+
+ def get_format_signature(self):
+ """See VersionedFile.get_format_signature()."""
+ if self.factory.annotated:
+ annotated_part = "annotated"
+ else:
+ annotated_part = "plain"
+ return "knit-%s" % (annotated_part,)
def get_graph_with_ghosts(self):
"""See VersionedFile.get_graph_with_ghosts()."""
@@ -632,6 +702,49 @@
return True
return False
+ def insert_data_stream(self, (format, data_list, reader_callable)):
+ """Insert knit records from a data stream into this knit.
+
+ If a version in the stream is already present in this knit, it will not
+ be inserted a second time. It will be checked for consistency with the
+ stored version however, and may cause a KnitCorrupt error to be raised
+ if the data in the stream disagrees with the already stored data.
+
+ :seealso: get_data_stream
+ """
+ if format != self.get_format_signature():
+ trace.mutter('incompatible format signature inserting to %r', self)
+ raise KnitDataStreamIncompatible(
+ format, self.get_format_signature())
+
+ for version_id, options, length, parents in data_list:
+ if self.has_version(version_id):
+ # First check: the list of parents.
+ my_parents = self.get_parents_with_ghosts(version_id)
+ if my_parents != parents:
+ # XXX: KnitCorrupt is not quite the right exception here.
+ raise KnitCorrupt(
+ self.filename,
+ 'parents list %r from data stream does not match '
+ 'already recorded parents %r for %s'
+ % (parents, my_parents, version_id))
+
+ # Also check the SHA-1 of the fulltext this content will
+ # produce.
+ raw_data = reader_callable(length)
+ my_fulltext_sha1 = self.get_sha1(version_id)
+ df, rec = self._data._parse_record_header(version_id, raw_data)
+ stream_fulltext_sha1 = rec[3]
+ if my_fulltext_sha1 != stream_fulltext_sha1:
+ # Actually, we don't know if it's this knit that's corrupt,
+ # or the data stream we're trying to insert.
+ raise KnitCorrupt(
+ self.filename, 'sha-1 does not match %s' % version_id)
+ else:
+ self._add_raw_records(
+ [(version_id, options, parents, length)],
+ reader_callable(length))
+
def versions(self):
"""See VersionedFile.versions."""
if 'evil' in debug.debug_flags:
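The reader_callable returned by get_data_stream reads from a buffer of the
concatenated raw (still-compressed) record data: calling it with a record's
length from data_list returns exactly that record's bytes, and calling it
with None drains whatever remains. A sketch of a consumer, assuming knit is
an open KnitVersionedFile, mirroring how the tests below read records back:

    # Sketch only; `knit` is assumed to be an open KnitVersionedFile.
    format_signature, data_list, reader_callable = knit.get_data_stream(
        ['text-b', 'text-d'])
    for version_id, options, length, parents in data_list:
        raw_bytes = reader_callable(length)  # raw, still-compressed record
        # ... ship raw_bytes over the smart protocol, write it out, etc.
    assert reader_callable(None) == ''       # nothing left in the stream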
=== modified file 'bzrlib/repofmt/knitrepo.py'
--- a/bzrlib/repofmt/knitrepo.py 2007-08-07 22:59:45 +0000
+++ b/bzrlib/repofmt/knitrepo.py 2007-08-30 08:27:29 +0000
@@ -37,6 +37,7 @@
import bzrlib.revision as _mod_revision
from bzrlib.store.versioned import VersionedFileStore
from bzrlib.trace import mutter, note, warning
+from bzrlib.util import bencode
class _KnitParentsProvider(object):
@@ -480,3 +481,27 @@
_revision_store=_revision_store,
control_store=control_store,
text_store=text_store)
+
+
+def _get_stream_as_bytes(knit, required_versions):
+ """Generate a serialised data stream.
+
+ The format is a bencoding of a list. The first element of the list is a
+ string of the format signature, then each subsequent element is a list
+ corresponding to a record. Those lists contain:
+
+ * a version id
+ * a list of options
+ * a list of parents
+ * the bytes
+
+ :returns: a bencoded list.
+ """
+ knit_stream = knit.get_data_stream(required_versions)
+ format_signature, data_list, callable = knit_stream
+ data = []
+ data.append(format_signature)
+ for version, options, length, parents in data_list:
+ data.append([version, options, parents, callable(length)])
+ return bencode.bencode(data)
+
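The serialised form produced by _get_stream_as_bytes can be turned back into
a (format, data_list, reader_callable) tuple suitable for insert_data_stream
by reversing the steps above. The patch does not include such a decoder; the
following is only a sketch of what one could look like, reconstructing each
record's length from its byte string:

    # Hypothetical decoder, not part of this patch.
    from cStringIO import StringIO
    from bzrlib.util import bencode

    def _parse_stream_bytes(stream_bytes):
        decoded = bencode.bdecode(stream_bytes)
        format_signature = decoded[0]
        data_list = []
        raw_chunks = []
        for version, options, parents, record_bytes in decoded[1:]:
            data_list.append((version, options, len(record_bytes), parents))
            raw_chunks.append(record_bytes)
        pseudo_file = StringIO(''.join(raw_chunks))
        def read(length):
            if length is None:
                return pseudo_file.read()
            return pseudo_file.read(length)
        return format_signature, data_list, read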
=== modified file 'bzrlib/tests/test_errors.py'
--- a/bzrlib/tests/test_errors.py 2007-08-23 18:02:53 +0000
+++ b/bzrlib/tests/test_errors.py 2007-08-30 08:11:54 +0000
@@ -81,6 +81,13 @@
"cannot be broken.",
str(error))
+ def test_knit_data_stream_incompatible(self):
+ error = errors.KnitDataStreamIncompatible(
+ 'stream format', 'target format')
+ self.assertEqual('Cannot insert knit data stream of format '
+ '"stream format" into knit of format '
+ '"target format".', str(error))
+
def test_knit_header_error(self):
error = errors.KnitHeaderError('line foo\n', 'path/to/file')
self.assertEqual("Knit header error: 'line foo\\n' unexpected"
=== modified file 'bzrlib/tests/test_knit.py'
--- a/bzrlib/tests/test_knit.py 2007-08-02 23:43:57 +0000
+++ b/bzrlib/tests/test_knit.py 2007-08-30 08:27:29 +0000
@@ -57,6 +57,7 @@
)
from bzrlib.transport import TransportLogger, get_transport
from bzrlib.transport.memory import MemoryTransport
+from bzrlib.util import bencode
from bzrlib.weave import Weave
@@ -986,15 +987,25 @@
class KnitTests(TestCaseWithTransport):
"""Class containing knit test helper routines."""
- def make_test_knit(self, annotate=False, delay_create=False, index=None):
+ def make_test_knit(self, annotate=False, delay_create=False, index=None,
+ name='test'):
if not annotate:
factory = KnitPlainFactory()
else:
factory = None
- return KnitVersionedFile('test', get_transport('.'), access_mode='w',
+ return KnitVersionedFile(name, get_transport('.'), access_mode='w',
factory=factory, create=True,
delay_create=delay_create, index=index)
+ def assertRecordContentEqual(self, knit, version_id, candidate_content):
+ """Assert that some raw record content matches the raw record content
+ for a particular version_id in the given knit.
+ """
+ index_memo = knit._index.get_position(version_id)
+ record = (version_id, index_memo)
+ [(_, expected_content)] = list(knit._data.read_records_iter_raw([record]))
+ self.assertEqual(expected_content, candidate_content)
+
class BasicKnitTests(KnitTests):
@@ -1458,6 +1469,284 @@
for plan_line, expected_line in zip(plan, AB_MERGE):
self.assertEqual(plan_line, expected_line)
+ def test_get_stream_empty(self):
+ """Get a data stream for an empty knit file."""
+ k1 = self.make_test_knit()
+ format, data_list, reader_callable = k1.get_data_stream([])
+ self.assertEqual('knit-plain', format)
+ self.assertEqual([], data_list)
+ content = reader_callable(None)
+ self.assertEqual('', content)
+ self.assertIsInstance(content, str)
+
+ def test_get_stream_one_version(self):
+ """Get a data stream for a single record out of a knit containing just
+ one record.
+ """
+ k1 = self.make_test_knit()
+ test_data = [
+ ('text-a', [], TEXT_1),
+ ]
+ expected_data_list = [
+ # version, options, length, parents
+ ('text-a', ['fulltext'], 122, []),
+ ]
+ for version_id, parents, lines in test_data:
+ k1.add_lines(version_id, parents, split_lines(lines))
+
+ format, data_list, reader_callable = k1.get_data_stream(['text-a'])
+ self.assertEqual('knit-plain', format)
+ self.assertEqual(expected_data_list, data_list)
+ # There's only one record in the knit, so the content should be the
+ # entire knit data file's contents.
+ self.assertEqual(k1.transport.get_bytes(k1._data._access._filename),
+ reader_callable(None))
+
+ def test_get_stream_get_one_version_of_many(self):
+ """Get a data stream for just one version out of a knit containing many
+ versions.
+ """
+ k1 = self.make_test_knit()
+ # Insert the same data as test_knit_join, as they seem to cover a range
+ # of cases (no parents, one parent, multiple parents).
+ test_data = [
+ ('text-a', [], TEXT_1),
+ ('text-b', ['text-a'], TEXT_1),
+ ('text-c', [], TEXT_1),
+ ('text-d', ['text-c'], TEXT_1),
+ ('text-m', ['text-b', 'text-d'], TEXT_1),
+ ]
+ expected_data_list = [
+ # version, options, length, parents
+ ('text-m', ['line-delta'], 84, ['text-b', 'text-d']),
+ ]
+ for version_id, parents, lines in test_data:
+ k1.add_lines(version_id, parents, split_lines(lines))
+
+ format, data_list, reader_callable = k1.get_data_stream(['text-m'])
+ self.assertEqual('knit-plain', format)
+ self.assertEqual(expected_data_list, data_list)
+ self.assertRecordContentEqual(k1, 'text-m', reader_callable(None))
+
+ def test_get_stream_ghost_parent(self):
+ """Get a data stream for a version with a ghost parent."""
+ k1 = self.make_test_knit()
+ # Test data
+ k1.add_lines('text-a', [], split_lines(TEXT_1))
+ k1.add_lines_with_ghosts('text-b', ['text-a', 'text-ghost'],
+ split_lines(TEXT_1))
+ # Expected data
+ expected_data_list = [
+ # version, options, length, parents
+ ('text-b', ['line-delta'], 84, ['text-a', 'text-ghost']),
+ ]
+
+ format, data_list, reader_callable = k1.get_data_stream(['text-b'])
+ self.assertEqual('knit-plain', format)
+ self.assertEqual(expected_data_list, data_list)
+ self.assertRecordContentEqual(k1, 'text-b', reader_callable(None))
+
+ def test_get_stream_get_multiple_records(self):
+ """Get a stream for multiple records of a knit."""
+ k1 = self.make_test_knit()
+ # Insert the same data as test_knit_join, as they seem to cover a range
+ # of cases (no parents, one parent, multiple parents).
+ test_data = [
+ ('text-a', [], TEXT_1),
+ ('text-b', ['text-a'], TEXT_1),
+ ('text-c', [], TEXT_1),
+ ('text-d', ['text-c'], TEXT_1),
+ ('text-m', ['text-b', 'text-d'], TEXT_1),
+ ]
+ expected_data_list = [
+ # version, options, length, parents
+ ('text-b', ['line-delta'], 84, ['text-a']),
+ ('text-d', ['line-delta'], 84, ['text-c']),
+ ]
+ for version_id, parents, lines in test_data:
+ k1.add_lines(version_id, parents, split_lines(lines))
+
+ # Note that even though we request the revision IDs in a particular
+ # order, the data stream may return them in any order it likes. In this
+ # case, they'll be in the order they were inserted into the knit.
+ format, data_list, reader_callable = k1.get_data_stream(
+ ['text-d', 'text-b'])
+ self.assertEqual('knit-plain', format)
+ self.assertEqual(expected_data_list, data_list)
+ self.assertRecordContentEqual(k1, 'text-b', reader_callable(84))
+ self.assertRecordContentEqual(k1, 'text-d', reader_callable(84))
+ self.assertEqual('', reader_callable(None),
+ "There should be no more bytes left to read.")
+
+ def test_get_stream_all(self):
+ """Get a data stream for all the records in a knit.
+
+ This exercises fulltext records, line-delta records, records with
+ various numbers of parents, and reading multiple records out of the
+ callable. These cases ought to all be exercised individually by the
+ other test_get_stream_* tests; this test is basically just paranoia.
+ """
+ k1 = self.make_test_knit()
+ # Insert the same data as test_knit_join, as they seem to cover a range
+ # of cases (no parents, one parent, multiple parents).
+ test_data = [
+ ('text-a', [], TEXT_1),
+ ('text-b', ['text-a'], TEXT_1),
+ ('text-c', [], TEXT_1),
+ ('text-d', ['text-c'], TEXT_1),
+ ('text-m', ['text-b', 'text-d'], TEXT_1),
+ ]
+ expected_data_list = [
+ # version, options, length, parents
+ ('text-a', ['fulltext'], 122, []),
+ ('text-b', ['line-delta'], 84, ['text-a']),
+ ('text-c', ['fulltext'], 121, []),
+ ('text-d', ['line-delta'], 84, ['text-c']),
+ ('text-m', ['line-delta'], 84, ['text-b', 'text-d']),
+ ]
+ for version_id, parents, lines in test_data:
+ k1.add_lines(version_id, parents, split_lines(lines))
+
+ format, data_list, reader_callable = k1.get_data_stream(
+ ['text-a', 'text-b', 'text-c', 'text-d', 'text-m'])
+ self.assertEqual('knit-plain', format)
+ self.assertEqual(expected_data_list, data_list)
+ for version_id, options, length, parents in expected_data_list:
+ bytes = reader_callable(length)
+ self.assertRecordContentEqual(k1, version_id, bytes)
+
+ def assertKnitFilesEqual(self, knit1, knit2):
+ """Assert that the contents of the index and data files of two knits are
+ equal.
+ """
+ self.assertEqual(
+ knit1.transport.get_bytes(knit1._data._access._filename),
+ knit2.transport.get_bytes(knit2._data._access._filename))
+ self.assertEqual(
+ knit1.transport.get_bytes(knit1._index._filename),
+ knit2.transport.get_bytes(knit2._index._filename))
+
+ def test_insert_data_stream_empty(self):
+ """Inserting a data stream with no records should not put any data into
+ the knit.
+ """
+ k1 = self.make_test_knit()
+ k1.insert_data_stream(
+ (k1.get_format_signature(), [], lambda ignored: ''))
+ self.assertEqual('', k1.transport.get_bytes(k1._data._access._filename),
+ "The .knit should be completely empty.")
+ self.assertEqual(k1._index.HEADER,
+ k1.transport.get_bytes(k1._index._filename),
+ "The .kndx should have nothing apart from the header.")
+
+ def test_insert_data_stream_one_record(self):
+ """Inserting a data stream with one record from a knit with one record
+ results in byte-identical files.
+ """
+ source = self.make_test_knit(name='source')
+ source.add_lines('text-a', [], split_lines(TEXT_1))
+ data_stream = source.get_data_stream(['text-a'])
+
+ target = self.make_test_knit(name='target')
+ target.insert_data_stream(data_stream)
+
+ self.assertKnitFilesEqual(source, target)
+
+ def test_insert_data_stream_records_already_present(self):
+ """Insert a data stream where some records are alreday present in the
+ target, and some not. Only the new records are inserted.
+ """
+ source = self.make_test_knit(name='source')
+ target = self.make_test_knit(name='target')
+ # Insert 'text-a' into both source and target
+ source.add_lines('text-a', [], split_lines(TEXT_1))
+ target.insert_data_stream(source.get_data_stream(['text-a']))
+ # Insert 'text-b' into just the source.
+ source.add_lines('text-b', ['text-a'], split_lines(TEXT_1))
+ # Get a data stream of both text-a and text-b, and insert it.
+ data_stream = source.get_data_stream(['text-a', 'text-b'])
+ target.insert_data_stream(data_stream)
+ # The source and target will now be identical. This means the text-a
+ # record was not added a second time.
+ self.assertKnitFilesEqual(source, target)
+
+ def test_insert_data_stream_multiple_records(self):
+ """Inserting a data stream of all records from a knit with multiple
+ records results in byte-identical files.
+ """
+ source = self.make_test_knit(name='source')
+ source.add_lines('text-a', [], split_lines(TEXT_1))
+ source.add_lines('text-b', ['text-a'], split_lines(TEXT_1))
+ source.add_lines('text-c', [], split_lines(TEXT_1))
+ data_stream = source.get_data_stream(['text-a', 'text-b', 'text-c'])
+
+ target = self.make_test_knit(name='target')
+ target.insert_data_stream(data_stream)
+
+ self.assertKnitFilesEqual(source, target)
+
+ def test_insert_data_stream_ghost_parent(self):
+ """Insert a data stream with a record that has a ghost parent."""
+ # Make a knit with a record, text-a, that has a ghost parent.
+ source = self.make_test_knit(name='source')
+ source.add_lines_with_ghosts('text-a', ['text-ghost'],
+ split_lines(TEXT_1))
+ data_stream = source.get_data_stream(['text-a'])
+
+ target = self.make_test_knit(name='target')
+ target.insert_data_stream(data_stream)
+
+ self.assertKnitFilesEqual(source, target)
+
+ # The target knit object is in a consistent state, i.e. the record we
+ # just added is immediately visible.
+ self.assertTrue(target.has_version('text-a'))
+ self.assertTrue(target.has_ghost('text-ghost'))
+ self.assertEqual(split_lines(TEXT_1), target.get_lines('text-a'))
+
+ def test_insert_data_stream_inconsistent_version_lines(self):
+ """Inserting a data stream which has different content for a version_id
+ than already exists in the knit will raise KnitCorrupt.
+ """
+ source = self.make_test_knit(name='source')
+ target = self.make_test_knit(name='target')
+ # Insert a different 'text-a' into both source and target
+ source.add_lines('text-a', [], split_lines(TEXT_1))
+ target.add_lines('text-a', [], split_lines(TEXT_2))
+ # Insert a data stream with conflicting content into the target
+ data_stream = source.get_data_stream(['text-a'])
+ self.assertRaises(
+ errors.KnitCorrupt, target.insert_data_stream, data_stream)
+
+ def test_insert_data_stream_inconsistent_version_parents(self):
+ """Inserting a data stream which has different parents for a version_id
+ than already exists in the knit will raise KnitCorrupt.
+ """
+ source = self.make_test_knit(name='source')
+ target = self.make_test_knit(name='target')
+ # Insert a different 'text-a' into both source and target. They differ
+ # only by the parents list, the content is the same.
+ source.add_lines_with_ghosts('text-a', [], split_lines(TEXT_1))
+ target.add_lines_with_ghosts('text-a', ['a-ghost'], split_lines(TEXT_1))
+ # Insert a data stream with conflicting content into the target
+ data_stream = source.get_data_stream(['text-a'])
+ self.assertRaises(
+ errors.KnitCorrupt, target.insert_data_stream, data_stream)
+
+ def test_insert_data_stream_incompatible_format(self):
+ """A data stream in a different format to the target knit cannot be
+ inserted.
+
+ It will raise KnitDataStreamIncompatible.
+ """
+ data_stream = ('fake-format-signature', [], lambda _: '')
+ target = self.make_test_knit(name='target')
+ self.assertRaises(
+ errors.KnitDataStreamIncompatible,
+ target.insert_data_stream, data_stream)
+
+ # * test that a stream of "already present version, then new version"
+ # inserts correctly.
TEXT_1 = """\
Banana cup cakes:
=== modified file 'bzrlib/tests/test_repository.py'
--- a/bzrlib/tests/test_repository.py 2007-05-18 11:42:33 +0000
+++ b/bzrlib/tests/test_repository.py 2007-08-30 08:27:29 +0000
@@ -35,9 +35,14 @@
UnsupportedFormatError,
)
from bzrlib.repository import RepositoryFormat
-from bzrlib.tests import TestCase, TestCaseWithTransport
+from bzrlib.tests import (
+ TestCase,
+ TestCaseWithTransport,
+ test_knit,
+ )
from bzrlib.transport import get_transport
from bzrlib.transport.memory import MemoryServer
+from bzrlib.util import bencode
from bzrlib import (
repository,
upgrade,
@@ -338,6 +343,66 @@
self.check_knits(t)
+class KnitRepositoryStreamTests(test_knit.KnitTests):
+ """Tests for knitrepo._get_stream_as_bytes."""
+
+ def test_get_stream_as_bytes(self):
+ # Make a simple knit
+ k1 = self.make_test_knit()
+ k1.add_lines('text-a', [], test_knit.split_lines(test_knit.TEXT_1))
+
+ # Serialise it, check the output.
+ bytes = knitrepo._get_stream_as_bytes(k1, ['text-a'])
+ data = bencode.bdecode(bytes)
+ format, record = data
+ self.assertEqual('knit-plain', format)
+ self.assertEqual(['text-a', ['fulltext'], []], record[:3])
+ self.assertRecordContentEqual(k1, 'text-a', record[3])
+
+ def test_get_stream_as_bytes_all(self):
+ """Get a serialised data stream for all the records in a knit.
+
+ Much like test_get_stream_all, except for get_stream_as_bytes.
+ """
+ k1 = self.make_test_knit()
+ # Insert the same data as BasicKnitTests.test_knit_join, as they seem
+ # to cover a range of cases (no parents, one parent, multiple parents).
+ test_data = [
+ ('text-a', [], test_knit.TEXT_1),
+ ('text-b', ['text-a'], test_knit.TEXT_1),
+ ('text-c', [], test_knit.TEXT_1),
+ ('text-d', ['text-c'], test_knit.TEXT_1),
+ ('text-m', ['text-b', 'text-d'], test_knit.TEXT_1),
+ ]
+ expected_data_list = [
+ # version, options, parents
+ ('text-a', ['fulltext'], []),
+ ('text-b', ['line-delta'], ['text-a']),
+ ('text-c', ['fulltext'], []),
+ ('text-d', ['line-delta'], ['text-c']),
+ ('text-m', ['line-delta'], ['text-b', 'text-d']),
+ ]
+ for version_id, parents, lines in test_data:
+ k1.add_lines(version_id, parents, test_knit.split_lines(lines))
+
+ bytes = knitrepo._get_stream_as_bytes(
+ k1, ['text-a', 'text-b', 'text-c', 'text-d', 'text-m'])
+
+ data = bencode.bdecode(bytes)
+ format = data.pop(0)
+ self.assertEqual('knit-plain', format)
+
+ for expected, actual in zip(expected_data_list, data):
+ expected_version = expected[0]
+ expected_options = expected[1]
+ expected_parents = expected[2]
+ version, options, parents, bytes = actual
+ self.assertEqual(expected_version, version)
+ self.assertEqual(expected_options, options)
+ self.assertEqual(expected_parents, parents)
+ self.assertRecordContentEqual(k1, version, bytes)
+
+
class DummyRepository(object):
"""A dummy repository for testing."""
=== modified file 'bzrlib/tests/test_versionedfile.py'
--- a/bzrlib/tests/test_versionedfile.py 2007-08-15 06:46:33 +0000
+++ b/bzrlib/tests/test_versionedfile.py 2007-08-30 08:11:54 +0000
@@ -35,8 +35,8 @@
WeaveParentMismatch
)
from bzrlib.knit import KnitVersionedFile, \
- KnitAnnotateFactory
-from bzrlib.tests import TestCaseWithTransport, TestSkipped
+ KnitAnnotateFactory, KnitPlainFactory
+from bzrlib.tests import TestCaseWithMemoryTransport, TestSkipped
from bzrlib.tests.HTTPTestUtil import TestCaseWithWebserver
from bzrlib.trace import mutter
from bzrlib.transport import get_transport
@@ -826,7 +826,7 @@
vf.get_sha1s(['a', 'c', 'b']))
-class TestWeave(TestCaseWithTransport, VersionedFileTestMixIn):
+class TestWeave(TestCaseWithMemoryTransport, VersionedFileTestMixIn):
def get_file(self, name='foo'):
return WeaveFile(name, get_transport(self.get_url('.')), create=True)
@@ -878,7 +878,7 @@
return WeaveFile
-class TestKnit(TestCaseWithTransport, VersionedFileTestMixIn):
+class TestKnit(TestCaseWithMemoryTransport, VersionedFileTestMixIn):
def get_file(self, name='foo'):
return KnitVersionedFile(name, get_transport(self.get_url('.')),
@@ -927,7 +927,7 @@
# if we make the registry a separate class though we still need to
# test the behaviour in the active registry to catch failure-to-handle-
# stange-objects
-class TestInterVersionedFile(TestCaseWithTransport):
+class TestInterVersionedFile(TestCaseWithMemoryTransport):
def test_get_default_inter_versionedfile(self):
# test that the InterVersionedFile.get(a, b) probes
@@ -1247,7 +1247,7 @@
self._test_merge_from_strings(base, a, b, result)
-class TestKnitMerge(TestCaseWithTransport, MergeCasesMixin):
+class TestKnitMerge(TestCaseWithMemoryTransport, MergeCasesMixin):
def get_file(self, name='foo'):
return KnitVersionedFile(name, get_transport(self.get_url('.')),
@@ -1257,7 +1257,7 @@
pass
-class TestWeaveMerge(TestCaseWithTransport, MergeCasesMixin):
+class TestWeaveMerge(TestCaseWithMemoryTransport, MergeCasesMixin):
def get_file(self, name='foo'):
return WeaveFile(name, get_transport(self.get_url('.')), create=True)
@@ -1270,3 +1270,23 @@
overlappedInsertExpected = ['aaa', '<<<<<<< ', 'xxx', 'yyy', '=======',
'xxx', '>>>>>>> ', 'bbb']
+
+
+class TestFormatSignatures(TestCaseWithMemoryTransport):
+
+ def get_knit_file(self, name, annotated):
+ if annotated:
+ factory = KnitAnnotateFactory()
+ else:
+ factory = KnitPlainFactory()
+ return KnitVersionedFile(
+ name, get_transport(self.get_url('.')), create=True,
+ factory=factory)
+
+ def test_knit_format_signatures(self):
+ """Different formats of knit have different signature strings."""
+ knit = self.get_knit_file('a', True)
+ self.assertEqual('knit-annotated', knit.get_format_signature())
+ knit = self.get_knit_file('p', False)
+ self.assertEqual('knit-plain', knit.get_format_signature())
+
=== modified file 'bzrlib/versionedfile.py'
--- a/bzrlib/versionedfile.py 2007-08-15 06:46:33 +0000
+++ b/bzrlib/versionedfile.py 2007-08-30 08:11:54 +0000
@@ -261,6 +261,13 @@
result[version_id] = self.get_delta(version_id)
return result
+ def get_format_signature(self):
+ """Get a text description of the data encoding in this file.
+
+ :since: 0.19
+ """
+ raise NotImplementedError(self.get_format_signature)
+
def make_mpdiffs(self, version_ids):
"""Create multiparent diffs for specified versions"""
knit_versions = set()