Rev 2981: (robertc) Streaming-friendly container APIs. (Andrew Bennetts) in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Tue Nov 13 00:51:44 GMT 2007
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 2981
revision-id: pqm at pqm.ubuntu.com-20071113005140-mp4owdlrd1ccnqc9
parent: pqm at pqm.ubuntu.com-20071112212307-eusj64ymto8l9abk
parent: andrew.bennetts at canonical.com-20071110145614-bxpv031fq77cjg5j
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2007-11-13 00:51:40 +0000
message:
(robertc) Streaming-friendly container APIs. (Andrew Bennetts)
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/pack.py container.py-20070607160755-tr8zc26q18rn0jnb-1
bzrlib/tests/test_pack.py test_container.py-20070607160755-tr8zc26q18rn0jnb-2
------------------------------------------------------------
revno: 2916.2.17
merged: andrew.bennetts at canonical.com-20071110145614-bxpv031fq77cjg5j
parent: andrew.bennetts at canonical.com-20071110145313-gqkdu436xswrhpcu
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Sat 2007-11-10 09:56:14 -0500
message:
Add NEWS entry.
------------------------------------------------------------
revno: 2916.2.16
merged: andrew.bennetts at canonical.com-20071110145313-gqkdu436xswrhpcu
parent: andrew.bennetts at canonical.com-20071110144920-dm24jrmdjp9nx856
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Sat 2007-11-10 09:53:13 -0500
message:
Undo change to bundle writing; that belongs in a separate branch.
------------------------------------------------------------
revno: 2916.2.15
merged: andrew.bennetts at canonical.com-20071110144920-dm24jrmdjp9nx856
parent: andrew.bennetts at canonical.com-20071110144838-mhsih72mackm8vad
parent: pqm at pqm.ubuntu.com-20071109154145-1yq4oi390uk3z90o
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Sat 2007-11-10 09:49:20 -0500
message:
Merge from bzr.dev.
------------------------------------------------------------
revno: 2916.2.14
merged: andrew.bennetts at canonical.com-20071110144838-mhsih72mackm8vad
parent: andrew.bennetts at canonical.com-20071029083438-ke1vsv97dvgrvup5
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Sat 2007-11-10 09:48:38 -0500
message:
Add a docstring.
------------------------------------------------------------
revno: 2916.2.13
merged: andrew.bennetts at canonical.com-20071029083438-ke1vsv97dvgrvup5
parent: andrew.bennetts at canonical.com-20071029061151-s542b9ort6ph7vg1
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Mon 2007-10-29 19:34:38 +1100
message:
Improve some docstrings.
------------------------------------------------------------
revno: 2916.2.12
merged: andrew.bennetts at canonical.com-20071029061151-s542b9ort6ph7vg1
parent: andrew.bennetts at canonical.com-20071029060352-ts5x6nkb7gocv7ga
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Mon 2007-10-29 17:11:51 +1100
message:
Refactor TestContainerWriter to be a little more concise.
------------------------------------------------------------
revno: 2916.2.11
merged: andrew.bennetts at canonical.com-20071029060352-ts5x6nkb7gocv7ga
parent: andrew.bennetts at canonical.com-20071025101107-ycrog1cjenpwdye1
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Mon 2007-10-29 17:03:52 +1100
message:
Add direct unit tests for ContainerSerialiser.
------------------------------------------------------------
revno: 2916.2.10
merged: andrew.bennetts at canonical.com-20071025101107-ycrog1cjenpwdye1
parent: andrew.bennetts at canonical.com-20071025101042-dgoqmkisttgbsrwj
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Thu 2007-10-25 20:11:07 +1000
message:
Simpler iter_records_from_file implementation.
------------------------------------------------------------
revno: 2916.2.9
merged: andrew.bennetts at canonical.com-20071025101042-dgoqmkisttgbsrwj
parent: andrew.bennetts at canonical.com-20071025094846-3n4h4joh38k9fqat
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Thu 2007-10-25 20:10:42 +1000
message:
Turn an EOFError from bz2 decompressor into StopIteration.
------------------------------------------------------------
revno: 2916.2.8
merged: andrew.bennetts at canonical.com-20071025094846-3n4h4joh38k9fqat
parent: andrew.bennetts at canonical.com-20071025063825-m1nec9q2h3gp5qv1
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Thu 2007-10-25 19:48:46 +1000
message:
Add bzrlib.pack.iter_records_from_file.
------------------------------------------------------------
revno: 2916.2.7
merged: andrew.bennetts at canonical.com-20071025063825-m1nec9q2h3gp5qv1
parent: andrew.bennetts at canonical.com-20071022051209-qyq4vnvw447hzxvl
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Thu 2007-10-25 16:38:25 +1000
message:
Remove duplicated test case classes, presumably added by a copy-and-paste mistake.
------------------------------------------------------------
revno: 2916.2.6
merged: andrew.bennetts at canonical.com-20071022051209-qyq4vnvw447hzxvl
parent: andrew.bennetts at canonical.com-20071022050512-251thgr0i2ntcq3j
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Mon 2007-10-22 15:12:09 +1000
message:
Better docstrings.
------------------------------------------------------------
revno: 2916.2.5
merged: andrew.bennetts at canonical.com-20071022050512-251thgr0i2ntcq3j
parent: andrew.bennetts at canonical.com-20071022045017-rxuvzs4slk4pbprs
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Mon 2007-10-22 15:05:12 +1000
message:
Extract a ContainerSerialiser class from ContainerWriter.
------------------------------------------------------------
revno: 2916.2.4
merged: andrew.bennetts at canonical.com-20071022045017-rxuvzs4slk4pbprs
parent: andrew.bennetts at canonical.com-20071021042958-rupaqotpkhp32pe7
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Mon 2007-10-22 14:50:17 +1000
message:
Extract a _serialise_byte_records function.
------------------------------------------------------------
revno: 2916.2.3
merged: andrew.bennetts at canonical.com-20071021042958-rupaqotpkhp32pe7
parent: andrew.bennetts at canonical.com-20071019084206-5q81fn1u5rsbnwln
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Sun 2007-10-21 14:29:58 +1000
message:
Remove debug mutter.
------------------------------------------------------------
revno: 2916.2.2
merged: andrew.bennetts at canonical.com-20071019084206-5q81fn1u5rsbnwln
parent: andrew.bennetts at canonical.com-20071019083714-h5k2lcltixga71kx
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Fri 2007-10-19 18:42:06 +1000
message:
Add a couple of docstrings to the tests.
------------------------------------------------------------
revno: 2916.2.1
merged: andrew.bennetts at canonical.com-20071019083714-h5k2lcltixga71kx
parent: pqm at pqm.ubuntu.com-20071018040514-3hc1k2nj1umg3tig
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: streamable-containers
timestamp: Fri 2007-10-19 18:37:14 +1000
message:
Initial implementation of a 'push' parser for the container format.
=== modified file 'NEWS'
--- a/NEWS 2007-11-09 19:50:36 +0000
+++ b/NEWS 2007-11-13 00:51:40 +0000
@@ -80,6 +80,11 @@
INTERNALS:
+ * Added ``ContainerSerialiser`` and ``ContainerPushParser`` to
+ ``bzrlib.pack``. These classes provide more convenient APIs for generating
+ and parsing containers from streams rather than from files. (Andrew
+ Bennetts)
+
TESTING:
=== modified file 'bzrlib/pack.py'
--- a/bzrlib/pack.py 2007-10-05 05:52:45 +0000
+++ b/bzrlib/pack.py 2007-11-10 14:48:38 +0000
@@ -16,7 +16,8 @@
"""Container format for Bazaar data.
-"Containers" and "records" are described in doc/developers/container-format.txt.
+"Containers" and "records" are described in
+doc/developers/container-format.txt.
"""
from cStringIO import StringIO
@@ -58,8 +59,54 @@
raise errors.InvalidRecordError(str(e))
+class ContainerSerialiser(object):
+ """A helper class for serialising containers.
+
+ It simply returns bytes from method calls to 'begin', 'end' and
+ 'bytes_record'. You may find ContainerWriter to be a more convenient
+ interface.
+ """
+
+ def begin(self):
+ """Return the bytes to begin a container."""
+ return FORMAT_ONE + "\n"
+
+ def end(self):
+ """Return the bytes to finish a container."""
+ return "E"
+
+ def bytes_record(self, bytes, names):
+ """Return the bytes for a Bytes record with the given names and
+ contents.
+ """
+ # Kind marker
+ byte_sections = ["B"]
+ # Length
+ byte_sections.append(str(len(bytes)) + "\n")
+ # Names
+ for name_tuple in names:
+ # Make sure we're writing valid names. Note that we will leave a
+ # half-written record if a name is bad!
+ for name in name_tuple:
+ _check_name(name)
+ byte_sections.append('\x00'.join(name_tuple) + "\n")
+ # End of headers
+ byte_sections.append("\n")
+ # Finally, the contents.
+ byte_sections.append(bytes)
+ # XXX: This causes a memory copy the size of bytes, but is usually
+ # faster than two write calls (12 vs 13 seconds to output a gig of
+ # 1k records). Results may differ on significantly larger records,
+ # like .isos, but those should be rare and thus not the common
+ # case; the biggest issue there is causing extreme memory
+ # pressure. One possible improvement here is to check the size of
+ # the content before deciding whether to join here or to call
+ # write twice.
+ return ''.join(byte_sections)
+
+
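The byte layout that ContainerSerialiser emits can be sketched standalone. This is a minimal illustration of the record format only, not the bzrlib API; it uses Python 3 strings for brevity, whereas the original is Python 2 code working with plain byte strings:

```python
# Sketch of the Bytes record layout emitted by bytes_record: kind
# marker, length line, one line per name tuple (parts NUL-separated),
# a blank line ending the headers, then the body.
FORMAT_ONE = "Bazaar pack format 1 (introduced in 0.18)"

def bytes_record(body, names):
    sections = ["B"]                     # kind marker
    sections.append("%d\n" % len(body))  # length of the body
    for name_tuple in names:
        sections.append("\x00".join(name_tuple) + "\n")
    sections.append("\n")                # blank line ends the headers
    sections.append(body)
    return "".join(sections)

container = (FORMAT_ONE + "\n"           # begin()
             + bytes_record("bytes", [("name1",), ("name2",)])
             + "E")                      # end()
# The record portion is 'B5\nname1\nname2\n\nbytes', matching the
# TestContainerSerialiser expectations below.
```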
class ContainerWriter(object):
- """A class for writing containers.
+ """A class for writing containers to a file.
:attribute records_written: The number of user records added to the
container. This does not count the prelude or suffix of the container
@@ -75,10 +122,11 @@
self._write_func = write_func
self.current_offset = 0
self.records_written = 0
+ self._serialiser = ContainerSerialiser()
def begin(self):
"""Begin writing a container."""
- self.write_func(FORMAT_ONE + "\n")
+ self.write_func(self._serialiser.begin())
def write_func(self, bytes):
self._write_func(bytes)
@@ -86,7 +134,7 @@
def end(self):
"""Finish writing a container."""
- self.write_func("E")
+ self.write_func(self._serialiser.end())
def add_bytes_record(self, bytes, names):
"""Add a Bytes record with the given names.
@@ -103,30 +151,8 @@
and thus are only suitable for use by a ContainerReader.
"""
current_offset = self.current_offset
- # Kind marker
- byte_sections = ["B"]
- # Length
- byte_sections.append(str(len(bytes)) + "\n")
- # Names
- for name_tuple in names:
- # Make sure we're writing valid names. Note that we will leave a
- # half-written record if a name is bad!
- for name in name_tuple:
- _check_name(name)
- byte_sections.append('\x00'.join(name_tuple) + "\n")
- # End of headers
- byte_sections.append("\n")
- # Finally, the contents.
- byte_sections.append(bytes)
- # XXX: This causes a memory copy the size of bytes, but is usually
- # faster than two write calls (12 vs 13 seconds to output a gig of
- # 1k records). Results may differ on significantly larger records,
- # like .isos, but those should be rare and thus not the common
- # case; the biggest issue there is causing extreme memory
- # pressure. One possible improvement here is to check the size of
- # the content before deciding whether to join here or to call
- # write twice.
- self.write_func(''.join(byte_sections))
+ serialised_record = self._serialiser.bytes_record(bytes, names)
+ self.write_func(serialised_record)
self.records_written += 1
# return a memo of where we wrote data to allow random access.
return current_offset, self.current_offset - current_offset
@@ -355,3 +381,123 @@
_check_name_encoding(name)
read_bytes(None)
+
+class ContainerPushParser(object):
+ """A "push" parser for container format 1.
+
+ It accepts bytes via the ``accept_bytes`` method, and parses them into
+ records which can be retrieved via the ``read_pending_records`` method.
+ """
+
+ def __init__(self):
+ self._buffer = ''
+ self._state_handler = self._state_expecting_format_line
+ self._parsed_records = []
+ self._reset_current_record()
+ self.finished = False
+
+ def _reset_current_record(self):
+ self._current_record_length = None
+ self._current_record_names = []
+
+ def accept_bytes(self, bytes):
+ self._buffer += bytes
+ # Keep iterating the state machine until it stops consuming bytes from
+ # the buffer.
+ last_buffer_length = None
+ cur_buffer_length = len(self._buffer)
+ while cur_buffer_length != last_buffer_length:
+ last_buffer_length = cur_buffer_length
+ self._state_handler()
+ cur_buffer_length = len(self._buffer)
+
+ def read_pending_records(self):
+ records = self._parsed_records
+ self._parsed_records = []
+ return records
+
+ def _consume_line(self):
+ """Take a line out of the buffer, and return the line.
+
+ If a newline byte is not found in the buffer, the buffer is
+ unchanged and this returns None instead.
+ """
+ newline_pos = self._buffer.find('\n')
+ if newline_pos != -1:
+ line = self._buffer[:newline_pos]
+ self._buffer = self._buffer[newline_pos+1:]
+ return line
+ else:
+ return None
+
+ def _state_expecting_format_line(self):
+ line = self._consume_line()
+ if line is not None:
+ if line != FORMAT_ONE:
+ raise errors.UnknownContainerFormatError(line)
+ self._state_handler = self._state_expecting_record_type
+
+ def _state_expecting_record_type(self):
+ if len(self._buffer) >= 1:
+ record_type = self._buffer[0]
+ self._buffer = self._buffer[1:]
+ if record_type == 'B':
+ self._state_handler = self._state_expecting_length
+ elif record_type == 'E':
+ self.finished = True
+ self._state_handler = self._state_expecting_nothing
+ else:
+ raise errors.UnknownRecordTypeError(record_type)
+
+ def _state_expecting_length(self):
+ line = self._consume_line()
+ if line is not None:
+ try:
+ self._current_record_length = int(line)
+ except ValueError:
+ raise errors.InvalidRecordError(
+ "%r is not a valid length." % (line,))
+ self._state_handler = self._state_expecting_name
+
+ def _state_expecting_name(self):
+ encoded_name_parts = self._consume_line()
+ if encoded_name_parts == '':
+ self._state_handler = self._state_expecting_body
+ elif encoded_name_parts:
+ name_parts = tuple(encoded_name_parts.split('\x00'))
+ for name_part in name_parts:
+ _check_name(name_part)
+ self._current_record_names.append(name_parts)
+
+ def _state_expecting_body(self):
+ if len(self._buffer) >= self._current_record_length:
+ body_bytes = self._buffer[:self._current_record_length]
+ self._buffer = self._buffer[self._current_record_length:]
+ record = (self._current_record_names, body_bytes)
+ self._parsed_records.append(record)
+ self._reset_current_record()
+ self._state_handler = self._state_expecting_record_type
+
+ def _state_expecting_nothing(self):
+ pass
+
+ def read_size_hint(self):
+ hint = 16384
+ if self._state_handler == self._state_expecting_body:
+ remaining = self._current_record_length - len(self._buffer)
+ if remaining < 0:
+ remaining = 0
+ return max(hint, remaining)
+ return hint
+
+
+def iter_records_from_file(source_file):
+ parser = ContainerPushParser()
+ while True:
+ bytes = source_file.read(parser.read_size_hint())
+ parser.accept_bytes(bytes)
+ for record in parser.read_pending_records():
+ yield record
+ if parser.finished:
+ break
+
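For illustration only (this is not the bzrlib API), the same format that ContainerPushParser handles incrementally can be read by a simple "pull" parser once the whole container is in memory; it walks the same states the push parser moves through — format line, record type, length, name lines, body. Python 3 strings are used here for brevity:

```python
# Illustrative pull parser for container format 1, assuming the
# complete container is already available as one string.
FORMAT_ONE = "Bazaar pack format 1 (introduced in 0.18)"

def parse_container(data):
    """Return a list of (names, body) tuples from a complete container."""
    line, data = data.split("\n", 1)
    if line != FORMAT_ONE:
        raise ValueError("unknown container format: %r" % line)
    records = []
    while True:
        kind, data = data[0], data[1:]
        if kind == "E":                  # end marker: container finished
            return records
        if kind != "B":
            raise ValueError("unknown record type: %r" % kind)
        length_line, data = data.split("\n", 1)
        length = int(length_line)        # body length in bytes
        names = []
        while True:                      # name lines until a blank line
            name_line, data = data.split("\n", 1)
            if name_line == "":
                break
            names.append(tuple(name_line.split("\x00")))
        body, data = data[:length], data[length:]
        records.append((names, body))

parse_container("Bazaar pack format 1 (introduced in 0.18)\n"
                "B5\nname1\n\nbody1"
                "B5\nname2\n\nbody2"
                "E")
# → [([('name1',)], 'body1'), ([('name2',)], 'body2')]
```

The last call mirrors the TestContainerPushParser.test_multiple_records_at_once expectation; the push parser produces the same records, but from bytes fed in arbitrary chunks.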
=== modified file 'bzrlib/tests/test_pack.py'
--- a/bzrlib/tests/test_pack.py 2007-08-28 05:17:06 +0000
+++ b/bzrlib/tests/test_pack.py 2007-10-29 08:34:38 +0000
@@ -22,151 +22,181 @@
from bzrlib import pack, errors, tests
+class TestContainerSerialiser(tests.TestCase):
+ """Tests for the ContainerSerialiser class."""
+
+ def test_construct(self):
+ """Test constructing a ContainerSerialiser."""
+ pack.ContainerSerialiser()
+
+ def test_begin(self):
+ serialiser = pack.ContainerSerialiser()
+ self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\n',
+ serialiser.begin())
+
+ def test_end(self):
+ serialiser = pack.ContainerSerialiser()
+ self.assertEqual('E', serialiser.end())
+
+ def test_bytes_record_no_name(self):
+ serialiser = pack.ContainerSerialiser()
+ record = serialiser.bytes_record('bytes', [])
+ self.assertEqual('B5\n\nbytes', record)
+
+ def test_bytes_record_one_name_with_one_part(self):
+ serialiser = pack.ContainerSerialiser()
+ record = serialiser.bytes_record('bytes', [('name',)])
+ self.assertEqual('B5\nname\n\nbytes', record)
+
+ def test_bytes_record_one_name_with_two_parts(self):
+ serialiser = pack.ContainerSerialiser()
+ record = serialiser.bytes_record('bytes', [('part1', 'part2')])
+ self.assertEqual('B5\npart1\x00part2\n\nbytes', record)
+
+ def test_bytes_record_two_names(self):
+ serialiser = pack.ContainerSerialiser()
+ record = serialiser.bytes_record('bytes', [('name1',), ('name2',)])
+ self.assertEqual('B5\nname1\nname2\n\nbytes', record)
+
+ def test_bytes_record_whitespace_in_name_part(self):
+ serialiser = pack.ContainerSerialiser()
+ self.assertRaises(
+ errors.InvalidRecordError,
+ serialiser.bytes_record, 'bytes', [('bad name',)])
+
+
class TestContainerWriter(tests.TestCase):
+ def setUp(self):
+ self.output = StringIO()
+ self.writer = pack.ContainerWriter(self.output.write)
+
+ def assertOutput(self, expected_output):
+ """Assert that the bytes written by self.writer are equal to
+ expected_output.
+ """
+ self.assertEqual(expected_output, self.output.getvalue())
+
def test_construct(self):
"""Test constructing a ContainerWriter.
- This uses None as the output stream to show that the constructor doesn't
- try to use the output stream.
+ This uses None as the output stream to show that the constructor
+ doesn't try to use the output stream.
"""
writer = pack.ContainerWriter(None)
def test_begin(self):
"""The begin() method writes the container format marker line."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\n',
- output.getvalue())
+ self.writer.begin()
+ self.assertOutput('Bazaar pack format 1 (introduced in 0.18)\n')
def test_zero_records_written_after_begin(self):
"""After begin is written, 0 records have been written."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- self.assertEqual(0, writer.records_written)
+ self.writer.begin()
+ self.assertEqual(0, self.writer.records_written)
def test_end(self):
"""The end() method writes an End Marker record."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- writer.end()
- self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\nE',
- output.getvalue())
+ self.writer.begin()
+ self.writer.end()
+ self.assertOutput('Bazaar pack format 1 (introduced in 0.18)\nE')
def test_empty_end_does_not_add_a_record_to_records_written(self):
"""The end() method does not count towards the records written."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- writer.end()
- self.assertEqual(0, writer.records_written)
+ self.writer.begin()
+ self.writer.end()
+ self.assertEqual(0, self.writer.records_written)
def test_non_empty_end_does_not_add_a_record_to_records_written(self):
"""The end() method does not count towards the records written."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- writer.add_bytes_record('foo', names=[])
- writer.end()
- self.assertEqual(1, writer.records_written)
+ self.writer.begin()
+ self.writer.add_bytes_record('foo', names=[])
+ self.writer.end()
+ self.assertEqual(1, self.writer.records_written)
def test_add_bytes_record_no_name(self):
"""Add a bytes record with no name."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- offset, length = writer.add_bytes_record('abc', names=[])
+ self.writer.begin()
+ offset, length = self.writer.add_bytes_record('abc', names=[])
self.assertEqual((42, 7), (offset, length))
- self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\nB3\n\nabc',
- output.getvalue())
+ self.assertOutput(
+ 'Bazaar pack format 1 (introduced in 0.18)\nB3\n\nabc')
def test_add_bytes_record_one_name(self):
"""Add a bytes record with one name."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- offset, length = writer.add_bytes_record('abc', names=[('name1', )])
+ self.writer.begin()
+ offset, length = self.writer.add_bytes_record(
+ 'abc', names=[('name1', )])
self.assertEqual((42, 13), (offset, length))
- self.assertEqual(
- 'Bazaar pack format 1 (introduced in 0.18)\n'
- 'B3\nname1\n\nabc',
- output.getvalue())
-
- def test_add_bytes_record_two_names(self):
- """Add a bytes record with two names."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- offset, length = writer.add_bytes_record('abc', names=[('name1', ), ('name2', )])
- self.assertEqual((42, 19), (offset, length))
- self.assertEqual(
- 'Bazaar pack format 1 (introduced in 0.18)\n'
- 'B3\nname1\nname2\n\nabc',
- output.getvalue())
-
- def test_add_bytes_record_two_names(self):
- """Add a bytes record with two names."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- offset, length = writer.add_bytes_record('abc', names=[('name1', ), ('name2', )])
- self.assertEqual((42, 19), (offset, length))
- self.assertEqual(
- 'Bazaar pack format 1 (introduced in 0.18)\n'
- 'B3\nname1\nname2\n\nabc',
- output.getvalue())
+ self.assertOutput(
+ 'Bazaar pack format 1 (introduced in 0.18)\n'
+ 'B3\nname1\n\nabc')
+
+ def test_add_bytes_record_two_names(self):
+ """Add a bytes record with two names."""
+ self.writer.begin()
+ offset, length = self.writer.add_bytes_record(
+ 'abc', names=[('name1', ), ('name2', )])
+ self.assertEqual((42, 19), (offset, length))
+ self.assertOutput(
+ 'Bazaar pack format 1 (introduced in 0.18)\n'
+ 'B3\nname1\nname2\n\nabc')
+
+ def test_add_bytes_record_two_names(self):
+ """Add a bytes record with two names."""
+ self.writer.begin()
+ offset, length = self.writer.add_bytes_record(
+ 'abc', names=[('name1', ), ('name2', )])
+ self.assertEqual((42, 19), (offset, length))
+ self.assertOutput(
+ 'Bazaar pack format 1 (introduced in 0.18)\n'
+ 'B3\nname1\nname2\n\nabc')
def test_add_bytes_record_two_element_name(self):
"""Add a bytes record with a two-element name."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- offset, length = writer.add_bytes_record('abc', names=[('name1', 'name2')])
+ self.writer.begin()
+ offset, length = self.writer.add_bytes_record(
+ 'abc', names=[('name1', 'name2')])
self.assertEqual((42, 19), (offset, length))
- self.assertEqual(
+ self.assertOutput(
'Bazaar pack format 1 (introduced in 0.18)\n'
- 'B3\nname1\x00name2\n\nabc',
- output.getvalue())
+ 'B3\nname1\x00name2\n\nabc')
def test_add_second_bytes_record_gets_higher_offset(self):
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- writer.add_bytes_record('abc', names=[])
- offset, length = writer.add_bytes_record('abc', names=[])
+ self.writer.begin()
+ self.writer.add_bytes_record('abc', names=[])
+ offset, length = self.writer.add_bytes_record('abc', names=[])
self.assertEqual((49, 7), (offset, length))
- self.assertEqual(
+ self.assertOutput(
'Bazaar pack format 1 (introduced in 0.18)\n'
'B3\n\nabc'
- 'B3\n\nabc',
- output.getvalue())
+ 'B3\n\nabc')
def test_add_bytes_record_invalid_name(self):
"""Adding a Bytes record with a name with whitespace in it raises
InvalidRecordError.
"""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
+ self.writer.begin()
self.assertRaises(
errors.InvalidRecordError,
- writer.add_bytes_record, 'abc', names=[('bad name', )])
+ self.writer.add_bytes_record, 'abc', names=[('bad name', )])
def test_add_bytes_records_add_to_records_written(self):
"""Adding a Bytes record increments the records_written counter."""
- output = StringIO()
- writer = pack.ContainerWriter(output.write)
- writer.begin()
- writer.add_bytes_record('foo', names=[])
- self.assertEqual(1, writer.records_written)
- writer.add_bytes_record('foo', names=[])
- self.assertEqual(2, writer.records_written)
+ self.writer.begin()
+ self.writer.add_bytes_record('foo', names=[])
+ self.assertEqual(1, self.writer.records_written)
+ self.writer.add_bytes_record('foo', names=[])
+ self.assertEqual(2, self.writer.records_written)
class TestContainerReader(tests.TestCase):
+ """Tests for the ContainerReader.
+
+ The ContainerReader reads format 1 containers, so these tests explicitly
+ test how it reacts to format 1 data. If a new version of the format is
+ added, then separate tests for that format should be added.
+ """
def get_reader_for(self, bytes):
stream = StringIO(bytes)
@@ -299,7 +329,13 @@
class TestBytesRecordReader(tests.TestCase):
- """Tests for reading and validating Bytes records with BytesRecordReader."""
+ """Tests for reading and validating Bytes records with
+ BytesRecordReader.
+
+ Like TestContainerReader, this explicitly tests the reading of format 1
+ data. If a new version of the format is added, then a separate set of
+ tests for reading that format should be added.
+ """
def get_reader_for(self, bytes):
stream = StringIO(bytes)
@@ -523,3 +559,137 @@
results.append(f.readline())
results.append(f.read(4))
self.assertEqual(['0', '\n', '2\n4\n'], results)
+
+
+class PushParserTestCase(tests.TestCase):
+ """Base class for TestCases involving ContainerPushParser."""
+
+ def make_parser_expecting_record_type(self):
+ parser = pack.ContainerPushParser()
+ parser.accept_bytes("Bazaar pack format 1 (introduced in 0.18)\n")
+ return parser
+
+ def make_parser_expecting_bytes_record(self):
+ parser = pack.ContainerPushParser()
+ parser.accept_bytes("Bazaar pack format 1 (introduced in 0.18)\nB")
+ return parser
+
+ def assertRecordParsing(self, expected_record, bytes):
+ """Assert that 'bytes' is parsed as a given bytes record.
+
+ :param expected_record: A tuple of (names, bytes).
+ """
+ parser = self.make_parser_expecting_bytes_record()
+ parser.accept_bytes(bytes)
+ parsed_records = parser.read_pending_records()
+ self.assertEqual([expected_record], parsed_records)
+
+
+class TestContainerPushParser(PushParserTestCase):
+ """Tests for ContainerPushParser.
+
+ The ContainerPushParser reads format 1 containers, so these tests
+ explicitly test how it reacts to format 1 data. If a new version of the
+ format is added, then separate tests for that format should be added.
+ """
+
+ def test_construct(self):
+ """ContainerPushParser can be constructed."""
+ pack.ContainerPushParser()
+
+ def test_multiple_records_at_once(self):
+ """If multiple records worth of data are fed to the parser in one
+ string, the parser will correctly parse all the records.
+
+ (A naive implementation might stop after parsing the first record.)
+ """
+ parser = self.make_parser_expecting_record_type()
+ parser.accept_bytes("B5\nname1\n\nbody1B5\nname2\n\nbody2")
+ self.assertEqual(
+ [([('name1',)], 'body1'), ([('name2',)], 'body2')],
+ parser.read_pending_records())
+
+
+class TestContainerPushParserBytesParsing(PushParserTestCase):
+ """Tests for reading Bytes records with ContainerPushParser.
+
+ The ContainerPushParser reads format 1 containers, so these tests
+ explicitly test how it reacts to format 1 data. If a new version of the
+ format is added, then separate tests for that format should be added.
+ """
+
+ def test_record_with_no_name(self):
+ """Reading a Bytes record with no name returns an empty list of
+ names.
+ """
+ self.assertRecordParsing(([], 'aaaaa'), "5\n\naaaaa")
+
+ def test_record_with_one_name(self):
+ """Reading a Bytes record with one name returns a list of just that
+ name.
+ """
+ self.assertRecordParsing(
+ ([('name1', )], 'aaaaa'),
+ "5\nname1\n\naaaaa")
+
+ def test_record_with_two_names(self):
+ """Reading a Bytes record with two names returns a list of both names.
+ """
+ self.assertRecordParsing(
+ ([('name1', ), ('name2', )], 'aaaaa'),
+ "5\nname1\nname2\n\naaaaa")
+
+ def test_record_with_two_part_names(self):
+ """Reading a Bytes record with a two-part name reads both parts."""
+ self.assertRecordParsing(
+ ([('name1', 'name2')], 'aaaaa'),
+ "5\nname1\x00name2\n\naaaaa")
+
+ def test_invalid_length(self):
+ """If the length-prefix is not a number, parsing raises
+ InvalidRecordError.
+ """
+ parser = self.make_parser_expecting_bytes_record()
+ self.assertRaises(
+ errors.InvalidRecordError, parser.accept_bytes, "not a number\n")
+
+ def test_incomplete_record(self):
+ """If the bytes seen so far don't form a complete record, then there
+ will be nothing returned by read_pending_records.
+ """
+ parser = self.make_parser_expecting_bytes_record()
+ parser.accept_bytes("5\n\nabcd")
+ self.assertEqual([], parser.read_pending_records())
+
+ def test_accept_nothing(self):
+ """The edge case of parsing an empty string causes no error."""
+ parser = self.make_parser_expecting_bytes_record()
+ parser.accept_bytes("")
+
+ def assertInvalidRecord(self, bytes):
+ """Assert that parsing the given bytes will raise an
+ InvalidRecordError.
+ """
+ parser = self.make_parser_expecting_bytes_record()
+ self.assertRaises(
+ errors.InvalidRecordError, parser.accept_bytes, bytes)
+
+ def test_read_invalid_name_whitespace(self):
+ """Names must have no whitespace."""
+ # A name with a space.
+ self.assertInvalidRecord("0\nbad name\n\n")
+
+ # A name with a tab.
+ self.assertInvalidRecord("0\nbad\tname\n\n")
+
+ # A name with a vertical tab.
+ self.assertInvalidRecord("0\nbad\vname\n\n")
+
+ def test_repeated_read_pending_records(self):
+ """read_pending_records will not return the same record twice."""
+ parser = self.make_parser_expecting_bytes_record()
+ parser.accept_bytes("6\n\nabcdef")
+ self.assertEqual([([], 'abcdef')], parser.read_pending_records())
+ self.assertEqual([], parser.read_pending_records())
+
+