Rev 2981: (robertc) Streaming-friendly container APIs. (Andrew Bennetts) in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Tue Nov 13 00:51:44 GMT 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2981
revision-id: pqm at pqm.ubuntu.com-20071113005140-mp4owdlrd1ccnqc9
parent: pqm at pqm.ubuntu.com-20071112212307-eusj64ymto8l9abk
parent: andrew.bennetts at canonical.com-20071110145614-bxpv031fq77cjg5j
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2007-11-13 00:51:40 +0000
message:
  (robertc) Streaming-friendly container APIs. (Andrew Bennetts)
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/pack.py                 container.py-20070607160755-tr8zc26q18rn0jnb-1
  bzrlib/tests/test_pack.py      test_container.py-20070607160755-tr8zc26q18rn0jnb-2
    ------------------------------------------------------------
    revno: 2916.2.17
    merged: andrew.bennetts at canonical.com-20071110145614-bxpv031fq77cjg5j
    parent: andrew.bennetts at canonical.com-20071110145313-gqkdu436xswrhpcu
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Sat 2007-11-10 09:56:14 -0500
    message:
      Add NEWS entry.
    ------------------------------------------------------------
    revno: 2916.2.16
    merged: andrew.bennetts at canonical.com-20071110145313-gqkdu436xswrhpcu
    parent: andrew.bennetts at canonical.com-20071110144920-dm24jrmdjp9nx856
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Sat 2007-11-10 09:53:13 -0500
    message:
      Undo change to bundle writing; that belongs in a separate branch.
    ------------------------------------------------------------
    revno: 2916.2.15
    merged: andrew.bennetts at canonical.com-20071110144920-dm24jrmdjp9nx856
    parent: andrew.bennetts at canonical.com-20071110144838-mhsih72mackm8vad
    parent: pqm at pqm.ubuntu.com-20071109154145-1yq4oi390uk3z90o
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Sat 2007-11-10 09:49:20 -0500
    message:
      Merge from bzr.dev.
    ------------------------------------------------------------
    revno: 2916.2.14
    merged: andrew.bennetts at canonical.com-20071110144838-mhsih72mackm8vad
    parent: andrew.bennetts at canonical.com-20071029083438-ke1vsv97dvgrvup5
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Sat 2007-11-10 09:48:38 -0500
    message:
      Add a docstring.
    ------------------------------------------------------------
    revno: 2916.2.13
    merged: andrew.bennetts at canonical.com-20071029083438-ke1vsv97dvgrvup5
    parent: andrew.bennetts at canonical.com-20071029061151-s542b9ort6ph7vg1
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Mon 2007-10-29 19:34:38 +1100
    message:
      Improve some docstrings.
    ------------------------------------------------------------
    revno: 2916.2.12
    merged: andrew.bennetts at canonical.com-20071029061151-s542b9ort6ph7vg1
    parent: andrew.bennetts at canonical.com-20071029060352-ts5x6nkb7gocv7ga
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Mon 2007-10-29 17:11:51 +1100
    message:
      Refactor TestContainerWriter to be a little more concise.
    ------------------------------------------------------------
    revno: 2916.2.11
    merged: andrew.bennetts at canonical.com-20071029060352-ts5x6nkb7gocv7ga
    parent: andrew.bennetts at canonical.com-20071025101107-ycrog1cjenpwdye1
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Mon 2007-10-29 17:03:52 +1100
    message:
      Add direct unit tests for ContainerSerialiser.
    ------------------------------------------------------------
    revno: 2916.2.10
    merged: andrew.bennetts at canonical.com-20071025101107-ycrog1cjenpwdye1
    parent: andrew.bennetts at canonical.com-20071025101042-dgoqmkisttgbsrwj
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Thu 2007-10-25 20:11:07 +1000
    message:
      Simpler iter_records_from_file implementation.
    ------------------------------------------------------------
    revno: 2916.2.9
    merged: andrew.bennetts at canonical.com-20071025101042-dgoqmkisttgbsrwj
    parent: andrew.bennetts at canonical.com-20071025094846-3n4h4joh38k9fqat
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Thu 2007-10-25 20:10:42 +1000
    message:
      Turn an EOFError from bz2 decompressor into StopIteration.
    ------------------------------------------------------------
    revno: 2916.2.8
    merged: andrew.bennetts at canonical.com-20071025094846-3n4h4joh38k9fqat
    parent: andrew.bennetts at canonical.com-20071025063825-m1nec9q2h3gp5qv1
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Thu 2007-10-25 19:48:46 +1000
    message:
      Add bzrlib.pack.iter_records_from_file.
    ------------------------------------------------------------
    revno: 2916.2.7
    merged: andrew.bennetts at canonical.com-20071025063825-m1nec9q2h3gp5qv1
    parent: andrew.bennetts at canonical.com-20071022051209-qyq4vnvw447hzxvl
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Thu 2007-10-25 16:38:25 +1000
    message:
      Remove duplicated test case classes, presumably added by a copy-and-paste mistake.
    ------------------------------------------------------------
    revno: 2916.2.6
    merged: andrew.bennetts at canonical.com-20071022051209-qyq4vnvw447hzxvl
    parent: andrew.bennetts at canonical.com-20071022050512-251thgr0i2ntcq3j
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Mon 2007-10-22 15:12:09 +1000
    message:
      Better docstrings.
    ------------------------------------------------------------
    revno: 2916.2.5
    merged: andrew.bennetts at canonical.com-20071022050512-251thgr0i2ntcq3j
    parent: andrew.bennetts at canonical.com-20071022045017-rxuvzs4slk4pbprs
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Mon 2007-10-22 15:05:12 +1000
    message:
      Extract a ContainerSerialiser class from ContainerWriter.
    ------------------------------------------------------------
    revno: 2916.2.4
    merged: andrew.bennetts at canonical.com-20071022045017-rxuvzs4slk4pbprs
    parent: andrew.bennetts at canonical.com-20071021042958-rupaqotpkhp32pe7
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Mon 2007-10-22 14:50:17 +1000
    message:
      Extract a _serialise_byte_records function.
    ------------------------------------------------------------
    revno: 2916.2.3
    merged: andrew.bennetts at canonical.com-20071021042958-rupaqotpkhp32pe7
    parent: andrew.bennetts at canonical.com-20071019084206-5q81fn1u5rsbnwln
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Sun 2007-10-21 14:29:58 +1000
    message:
      Remove debug mutter.
    ------------------------------------------------------------
    revno: 2916.2.2
    merged: andrew.bennetts at canonical.com-20071019084206-5q81fn1u5rsbnwln
    parent: andrew.bennetts at canonical.com-20071019083714-h5k2lcltixga71kx
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Fri 2007-10-19 18:42:06 +1000
    message:
      Add a couple of docstrings to the tests.
    ------------------------------------------------------------
    revno: 2916.2.1
    merged: andrew.bennetts at canonical.com-20071019083714-h5k2lcltixga71kx
    parent: pqm at pqm.ubuntu.com-20071018040514-3hc1k2nj1umg3tig
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: streamable-containers
    timestamp: Fri 2007-10-19 18:37:14 +1000
    message:
      Initial implementation of a 'push' parser for the container format.
=== modified file 'NEWS'
--- a/NEWS	2007-11-09 19:50:36 +0000
+++ b/NEWS	2007-11-13 00:51:40 +0000
@@ -80,6 +80,11 @@
 
   INTERNALS:
 
+   * Added ``ContainerSerialiser`` and ``ContainerPushParser`` to
+     ``bzrlib.pack``.  These classes provide more convenient APIs for generating
+     and parsing containers from streams rather than from files.  (Andrew
+     Bennetts)
+
   TESTING:
 
 

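As a rough sketch of how the two classes described in that NEWS entry fit
together (illustrative only: the record names and bodies below are made up,
but the calls match the API added by this patch):

    from bzrlib import pack

    # Produce the container as a series of byte strings suitable for
    # streaming, rather than writing them to a file.
    serialiser = pack.ContainerSerialiser()
    chunks = [serialiser.begin(),
              serialiser.bytes_record('first body', [('name1',)]),
              serialiser.bytes_record('second body', [('name2',)]),
              serialiser.end()]

    # Feed the same bytes back through the push parser as they "arrive".
    parser = pack.ContainerPushParser()
    records = []
    for chunk in chunks:
        parser.accept_bytes(chunk)
        records.extend(parser.read_pending_records())
    # records == [([('name1',)], 'first body'), ([('name2',)], 'second body')]
    # parser.finished is True once the End Marker record has been seen.
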
=== modified file 'bzrlib/pack.py'
--- a/bzrlib/pack.py	2007-10-05 05:52:45 +0000
+++ b/bzrlib/pack.py	2007-11-10 14:48:38 +0000
@@ -16,7 +16,8 @@
 
 """Container format for Bazaar data.
 
-"Containers" and "records" are described in doc/developers/container-format.txt.
+"Containers" and "records" are described in
+doc/developers/container-format.txt.
 """
 
 from cStringIO import StringIO
@@ -58,8 +59,54 @@
         raise errors.InvalidRecordError(str(e))
 
 
+class ContainerSerialiser(object):
+    """A helper class for serialising containers.
+    
+    It simply returns bytes from method calls to 'begin', 'end' and
+    'bytes_record'.  You may find ContainerWriter to be a more convenient
+    interface.
+    """
+
+    def begin(self):
+        """Return the bytes to begin a container."""
+        return FORMAT_ONE + "\n"
+
+    def end(self):
+        """Return the bytes to finish a container."""
+        return "E"
+
+    def bytes_record(self, bytes, names):
+        """Return the bytes for a Bytes record with the given name and
+        contents.
+        """
+        # Kind marker
+        byte_sections = ["B"]
+        # Length
+        byte_sections.append(str(len(bytes)) + "\n")
+        # Names
+        for name_tuple in names:
+            # Make sure we're writing valid names.  Note that we will leave a
+            # half-written record if a name is bad!
+            for name in name_tuple:
+                _check_name(name)
+            byte_sections.append('\x00'.join(name_tuple) + "\n")
+        # End of headers
+        byte_sections.append("\n")
+        # Finally, the contents.
+        byte_sections.append(bytes)
+        # XXX: This causes a memory copy of bytes in size, but is usually
+        # faster than two write calls (12 vs 13 seconds to output a gig of
+        # 1k records.) - results may differ on significantly larger records
+        # like .iso's but as they should be rare in any case and thus not
+        # likely to be the common case. The biggest issue is causing extreme
+        # memory pressure in that case. One possibly improvement here is to
+        # check the size of the content before deciding to join here vs call
+        # write twice.
+        return ''.join(byte_sections)
+
+
 class ContainerWriter(object):
-    """A class for writing containers.
+    """A class for writing containers to a file.
 
     :attribute records_written: The number of user records added to the
         container. This does not count the prelude or suffix of the container
@@ -75,10 +122,11 @@
         self._write_func = write_func
         self.current_offset = 0
         self.records_written = 0
+        self._serialiser = ContainerSerialiser()
 
     def begin(self):
         """Begin writing a container."""
-        self.write_func(FORMAT_ONE + "\n")
+        self.write_func(self._serialiser.begin())
 
     def write_func(self, bytes):
         self._write_func(bytes)
@@ -86,7 +134,7 @@
 
     def end(self):
         """Finish writing a container."""
-        self.write_func("E")
+        self.write_func(self._serialiser.end())
 
     def add_bytes_record(self, bytes, names):
         """Add a Bytes record with the given names.
@@ -103,30 +151,8 @@
             and thus are only suitable for use by a ContainerReader.
         """
         current_offset = self.current_offset
-        # Kind marker
-        byte_sections = ["B"]
-        # Length
-        byte_sections.append(str(len(bytes)) + "\n")
-        # Names
-        for name_tuple in names:
-            # Make sure we're writing valid names.  Note that we will leave a
-            # half-written record if a name is bad!
-            for name in name_tuple:
-                _check_name(name)
-            byte_sections.append('\x00'.join(name_tuple) + "\n")
-        # End of headers
-        byte_sections.append("\n")
-        # Finally, the contents.
-        byte_sections.append(bytes)
-        # XXX: This causes a memory copy of bytes in size, but is usually
-        # faster than two write calls (12 vs 13 seconds to output a gig of
-        # 1k records.) - results may differ on significantly larger records
-        # like .iso's but as they should be rare in any case and thus not
-        # likely to be the common case. The biggest issue is causing extreme
-        # memory pressure in that case. One possibly improvement here is to
-        # check the size of the content before deciding to join here vs call
-        # write twice.
-        self.write_func(''.join(byte_sections))
+        serialised_record = self._serialiser.bytes_record(bytes, names)
+        self.write_func(serialised_record)
         self.records_written += 1
         # return a memo of where we wrote data to allow random access.
         return current_offset, self.current_offset - current_offset
@@ -355,3 +381,123 @@
                 _check_name_encoding(name)
         read_bytes(None)
 
+
+class ContainerPushParser(object):
+    """A "push" parser for container format 1.
+
+    It accepts bytes via the ``accept_bytes`` method, and parses them into
+    records which can be retrieved via the ``read_pending_records`` method.
+    """
+
+    def __init__(self):
+        self._buffer = ''
+        self._state_handler = self._state_expecting_format_line
+        self._parsed_records = []
+        self._reset_current_record()
+        self.finished = False
+
+    def _reset_current_record(self):
+        self._current_record_length = None
+        self._current_record_names = []
+
+    def accept_bytes(self, bytes):
+        self._buffer += bytes
+        # Keep iterating the state machine until it stops consuming bytes from
+        # the buffer.
+        last_buffer_length = None
+        cur_buffer_length = len(self._buffer)
+        while cur_buffer_length != last_buffer_length:
+            last_buffer_length = cur_buffer_length
+            self._state_handler()
+            cur_buffer_length = len(self._buffer)
+
+    def read_pending_records(self):
+        records = self._parsed_records
+        self._parsed_records = []
+        return records
+    
+    def _consume_line(self):
+        """Take a line out of the buffer, and return the line.
+
+        If a newline byte is not found in the buffer, the buffer is
+        unchanged and this returns None instead.
+        """
+        newline_pos = self._buffer.find('\n')
+        if newline_pos != -1:
+            line = self._buffer[:newline_pos]
+            self._buffer = self._buffer[newline_pos+1:]
+            return line
+        else:
+            return None
+
+    def _state_expecting_format_line(self):
+        line = self._consume_line()
+        if line is not None:
+            if line != FORMAT_ONE:
+                raise errors.UnknownContainerFormatError(line)
+            self._state_handler = self._state_expecting_record_type
+
+    def _state_expecting_record_type(self):
+        if len(self._buffer) >= 1:
+            record_type = self._buffer[0]
+            self._buffer = self._buffer[1:]
+            if record_type == 'B':
+                self._state_handler = self._state_expecting_length
+            elif record_type == 'E':
+                self.finished = True
+                self._state_handler = self._state_expecting_nothing
+            else:
+                raise errors.UnknownRecordTypeError(record_type)
+
+    def _state_expecting_length(self):
+        line = self._consume_line()
+        if line is not None:
+            try:
+                self._current_record_length = int(line)
+            except ValueError:
+                raise errors.InvalidRecordError(
+                    "%r is not a valid length." % (line,))
+            self._state_handler = self._state_expecting_name
+
+    def _state_expecting_name(self):
+        encoded_name_parts = self._consume_line()
+        if encoded_name_parts == '':
+            self._state_handler = self._state_expecting_body
+        elif encoded_name_parts:
+            name_parts = tuple(encoded_name_parts.split('\x00'))
+            for name_part in name_parts:
+                _check_name(name_part)
+            self._current_record_names.append(name_parts)
+            
+    def _state_expecting_body(self):
+        if len(self._buffer) >= self._current_record_length:
+            body_bytes = self._buffer[:self._current_record_length]
+            self._buffer = self._buffer[self._current_record_length:]
+            record = (self._current_record_names, body_bytes)
+            self._parsed_records.append(record)
+            self._reset_current_record()
+            self._state_handler = self._state_expecting_record_type
+
+    def _state_expecting_nothing(self):
+        pass
+
+    def read_size_hint(self):
+        hint = 16384
+        if self._state_handler == self._state_expecting_body:
+            remaining = self._current_record_length - len(self._buffer)
+            if remaining < 0:
+                remaining = 0
+            return max(hint, remaining)
+        return hint
+
+
+def iter_records_from_file(source_file):
+    parser = ContainerPushParser()
+    while True:
+        bytes = source_file.read(parser.read_size_hint())
+        parser.accept_bytes(bytes)
+        for record in parser.read_pending_records():
+            yield record
+        if parser.finished:
+            break
+
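
A small sketch of using the new iter_records_from_file helper added above
(the StringIO source is purely illustrative; any file-like object with a
read() method should do):

    from cStringIO import StringIO

    from bzrlib import pack

    serialiser = pack.ContainerSerialiser()
    source = StringIO(serialiser.begin()
                      + serialiser.bytes_record('abc', [('name1',)])
                      + serialiser.end())
    for names, body in pack.iter_records_from_file(source):
        # names is a list of name tuples; body is the record's byte string.
        assert (names, body) == ([('name1',)], 'abc')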

=== modified file 'bzrlib/tests/test_pack.py'
--- a/bzrlib/tests/test_pack.py	2007-08-28 05:17:06 +0000
+++ b/bzrlib/tests/test_pack.py	2007-10-29 08:34:38 +0000
@@ -22,151 +22,181 @@
 from bzrlib import pack, errors, tests
 
 
+class TestContainerSerialiser(tests.TestCase):
+    """Tests for the ContainerSerialiser class."""
+
+    def test_construct(self):
+        """Test constructing a ContainerSerialiser."""
+        pack.ContainerSerialiser()
+
+    def test_begin(self):
+        serialiser = pack.ContainerSerialiser()
+        self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\n',
+                         serialiser.begin())
+
+    def test_end(self):
+        serialiser = pack.ContainerSerialiser()
+        self.assertEqual('E', serialiser.end())
+
+    def test_bytes_record_no_name(self):
+        serialiser = pack.ContainerSerialiser()
+        record = serialiser.bytes_record('bytes', [])
+        self.assertEqual('B5\n\nbytes', record)
+        
+    def test_bytes_record_one_name_with_one_part(self):
+        serialiser = pack.ContainerSerialiser()
+        record = serialiser.bytes_record('bytes', [('name',)])
+        self.assertEqual('B5\nname\n\nbytes', record)
+        
+    def test_bytes_record_one_name_with_two_parts(self):
+        serialiser = pack.ContainerSerialiser()
+        record = serialiser.bytes_record('bytes', [('part1', 'part2')])
+        self.assertEqual('B5\npart1\x00part2\n\nbytes', record)
+        
+    def test_bytes_record_two_names(self):
+        serialiser = pack.ContainerSerialiser()
+        record = serialiser.bytes_record('bytes', [('name1',), ('name2',)])
+        self.assertEqual('B5\nname1\nname2\n\nbytes', record)
+
+    def test_bytes_record_whitespace_in_name_part(self):
+        serialiser = pack.ContainerSerialiser()
+        self.assertRaises(
+            errors.InvalidRecordError,
+            serialiser.bytes_record, 'bytes', [('bad name',)])
+
+
 class TestContainerWriter(tests.TestCase):
 
+    def setUp(self):
+        self.output = StringIO()
+        self.writer = pack.ContainerWriter(self.output.write)
+
+    def assertOutput(self, expected_output):
+        """Assert that the output of self.writer ContainerWriter is equal to
+        expected_output.
+        """
+        self.assertEqual(expected_output, self.output.getvalue())
+
     def test_construct(self):
         """Test constructing a ContainerWriter.
         
-        This uses None as the output stream to show that the constructor doesn't
-        try to use the output stream.
+        This uses None as the output stream to show that the constructor
+        doesn't try to use the output stream.
         """
         writer = pack.ContainerWriter(None)
 
     def test_begin(self):
         """The begin() method writes the container format marker line."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\n',
-                         output.getvalue())
+        self.writer.begin()
+        self.assertOutput('Bazaar pack format 1 (introduced in 0.18)\n')
 
     def test_zero_records_written_after_begin(self):
         """After begin is written, 0 records have been written."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        self.assertEqual(0, writer.records_written)
+        self.writer.begin()
+        self.assertEqual(0, self.writer.records_written)
 
     def test_end(self):
         """The end() method writes an End Marker record."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        writer.end()
-        self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\nE',
-                         output.getvalue())
+        self.writer.begin()
+        self.writer.end()
+        self.assertOutput('Bazaar pack format 1 (introduced in 0.18)\nE')
 
     def test_empty_end_does_not_add_a_record_to_records_written(self):
         """The end() method does not count towards the records written."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        writer.end()
-        self.assertEqual(0, writer.records_written)
+        self.writer.begin()
+        self.writer.end()
+        self.assertEqual(0, self.writer.records_written)
 
     def test_non_empty_end_does_not_add_a_record_to_records_written(self):
         """The end() method does not count towards the records written."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        writer.add_bytes_record('foo', names=[])
-        writer.end()
-        self.assertEqual(1, writer.records_written)
+        self.writer.begin()
+        self.writer.add_bytes_record('foo', names=[])
+        self.writer.end()
+        self.assertEqual(1, self.writer.records_written)
 
     def test_add_bytes_record_no_name(self):
         """Add a bytes record with no name."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        offset, length = writer.add_bytes_record('abc', names=[])
+        self.writer.begin()
+        offset, length = self.writer.add_bytes_record('abc', names=[])
         self.assertEqual((42, 7), (offset, length))
-        self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\nB3\n\nabc',
-                         output.getvalue())
+        self.assertOutput(
+            'Bazaar pack format 1 (introduced in 0.18)\nB3\n\nabc')
 
     def test_add_bytes_record_one_name(self):
         """Add a bytes record with one name."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        offset, length = writer.add_bytes_record('abc', names=[('name1', )])
+        self.writer.begin()
+        offset, length = self.writer.add_bytes_record(
+            'abc', names=[('name1', )])
         self.assertEqual((42, 13), (offset, length))
-        self.assertEqual(
-            'Bazaar pack format 1 (introduced in 0.18)\n'
-            'B3\nname1\n\nabc',
-            output.getvalue())
-
-    def test_add_bytes_record_two_names(self):
-        """Add a bytes record with two names."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        offset, length = writer.add_bytes_record('abc', names=[('name1', ), ('name2', )])
-        self.assertEqual((42, 19), (offset, length))
-        self.assertEqual(
-            'Bazaar pack format 1 (introduced in 0.18)\n'
-            'B3\nname1\nname2\n\nabc',
-            output.getvalue())
-
-    def test_add_bytes_record_two_names(self):
-        """Add a bytes record with two names."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        offset, length = writer.add_bytes_record('abc', names=[('name1', ), ('name2', )])
-        self.assertEqual((42, 19), (offset, length))
-        self.assertEqual(
-            'Bazaar pack format 1 (introduced in 0.18)\n'
-            'B3\nname1\nname2\n\nabc',
-            output.getvalue())
+        self.assertOutput(
+            'Bazaar pack format 1 (introduced in 0.18)\n'
+            'B3\nname1\n\nabc')
+
+    def test_add_bytes_record_two_names(self):
+        """Add a bytes record with two names."""
+        self.writer.begin()
+        offset, length = self.writer.add_bytes_record(
+            'abc', names=[('name1', ), ('name2', )])
+        self.assertEqual((42, 19), (offset, length))
+        self.assertOutput(
+            'Bazaar pack format 1 (introduced in 0.18)\n'
+            'B3\nname1\nname2\n\nabc')
+
+    def test_add_bytes_record_two_names(self):
+        """Add a bytes record with two names."""
+        self.writer.begin()
+        offset, length = self.writer.add_bytes_record(
+            'abc', names=[('name1', ), ('name2', )])
+        self.assertEqual((42, 19), (offset, length))
+        self.assertOutput(
+            'Bazaar pack format 1 (introduced in 0.18)\n'
+            'B3\nname1\nname2\n\nabc')
 
     def test_add_bytes_record_two_element_name(self):
         """Add a bytes record with a two-element name."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        offset, length = writer.add_bytes_record('abc', names=[('name1', 'name2')])
+        self.writer.begin()
+        offset, length = self.writer.add_bytes_record(
+            'abc', names=[('name1', 'name2')])
         self.assertEqual((42, 19), (offset, length))
-        self.assertEqual(
+        self.assertOutput(
             'Bazaar pack format 1 (introduced in 0.18)\n'
-            'B3\nname1\x00name2\n\nabc',
-            output.getvalue())
+            'B3\nname1\x00name2\n\nabc')
 
     def test_add_second_bytes_record_gets_higher_offset(self):
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        writer.add_bytes_record('abc', names=[])
-        offset, length = writer.add_bytes_record('abc', names=[])
+        self.writer.begin()
+        self.writer.add_bytes_record('abc', names=[])
+        offset, length = self.writer.add_bytes_record('abc', names=[])
         self.assertEqual((49, 7), (offset, length))
-        self.assertEqual(
+        self.assertOutput(
             'Bazaar pack format 1 (introduced in 0.18)\n'
             'B3\n\nabc'
-            'B3\n\nabc',
-            output.getvalue())
+            'B3\n\nabc')
 
     def test_add_bytes_record_invalid_name(self):
         """Adding a Bytes record with a name with whitespace in it raises
         InvalidRecordError.
         """
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
+        self.writer.begin()
         self.assertRaises(
             errors.InvalidRecordError,
-            writer.add_bytes_record, 'abc', names=[('bad name', )])
+            self.writer.add_bytes_record, 'abc', names=[('bad name', )])
 
     def test_add_bytes_records_add_to_records_written(self):
         """Adding a Bytes record increments the records_written counter."""
-        output = StringIO()
-        writer = pack.ContainerWriter(output.write)
-        writer.begin()
-        writer.add_bytes_record('foo', names=[])
-        self.assertEqual(1, writer.records_written)
-        writer.add_bytes_record('foo', names=[])
-        self.assertEqual(2, writer.records_written)
+        self.writer.begin()
+        self.writer.add_bytes_record('foo', names=[])
+        self.assertEqual(1, self.writer.records_written)
+        self.writer.add_bytes_record('foo', names=[])
+        self.assertEqual(2, self.writer.records_written)
 
 
 class TestContainerReader(tests.TestCase):
+    """Tests for the ContainerReader.
+
+    The ContainerReader reads format 1 containers, so these tests explicitly
+    test how it reacts to format 1 data.  If a new version of the format is
+    added, then separate tests for that format should be added.
+    """
 
     def get_reader_for(self, bytes):
         stream = StringIO(bytes)
@@ -299,7 +329,13 @@
         
 
 class TestBytesRecordReader(tests.TestCase):
-    """Tests for reading and validating Bytes records with BytesRecordReader."""
+    """Tests for reading and validating Bytes records with
+    BytesRecordReader.
+    
+    Like TestContainerReader, this explicitly tests the reading of format 1
+    data.  If a new version of the format is added, then a separate set of
+    tests for reading that format should be added.
+    """
 
     def get_reader_for(self, bytes):
         stream = StringIO(bytes)
@@ -523,3 +559,137 @@
         results.append(f.readline())
         results.append(f.read(4))
         self.assertEqual(['0', '\n', '2\n4\n'], results)
+
+
+class PushParserTestCase(tests.TestCase):
+    """Base class for TestCases involving ContainerPushParser."""
+
+    def make_parser_expecting_record_type(self):
+        parser = pack.ContainerPushParser()
+        parser.accept_bytes("Bazaar pack format 1 (introduced in 0.18)\n")
+        return parser
+
+    def make_parser_expecting_bytes_record(self):
+        parser = pack.ContainerPushParser()
+        parser.accept_bytes("Bazaar pack format 1 (introduced in 0.18)\nB")
+        return parser
+
+    def assertRecordParsing(self, expected_record, bytes):
+        """Assert that 'bytes' is parsed as a given bytes record.
+
+        :param expected_record: A tuple of (names, bytes).
+        """
+        parser = self.make_parser_expecting_bytes_record()
+        parser.accept_bytes(bytes)
+        parsed_records = parser.read_pending_records()
+        self.assertEqual([expected_record], parsed_records)
+
+        
+class TestContainerPushParser(PushParserTestCase):
+    """Tests for ContainerPushParser.
+    
+    The ContainerPushParser reads format 1 containers, so these tests
+    explicitly test how it reacts to format 1 data.  If a new version of the
+    format is added, then separate tests for that format should be added.
+    """
+
+    def test_construct(self):
+        """ContainerPushParser can be constructed."""
+        pack.ContainerPushParser()
+
+    def test_multiple_records_at_once(self):
+        """If multiple records worth of data are fed to the parser in one
+        string, the parser will correctly parse all the records.
+
+        (A naive implementation might stop after parsing the first record.)
+        """
+        parser = self.make_parser_expecting_record_type()
+        parser.accept_bytes("B5\nname1\n\nbody1B5\nname2\n\nbody2")
+        self.assertEqual(
+            [([('name1',)], 'body1'), ([('name2',)], 'body2')],
+            parser.read_pending_records())
+
+
+class TestContainerPushParserBytesParsing(PushParserTestCase):
+    """Tests for reading Bytes records with ContainerPushParser.
+    
+    The ContainerPushParser reads format 1 containers, so these tests
+    explicitly test how it reacts to format 1 data.  If a new version of the
+    format is added, then separate tests for that format should be added.
+    """
+
+    def test_record_with_no_name(self):
+        """Reading a Bytes record with no name returns an empty list of
+        names.
+        """
+        self.assertRecordParsing(([], 'aaaaa'), "5\n\naaaaa")
+
+    def test_record_with_one_name(self):
+        """Reading a Bytes record with one name returns a list of just that
+        name.
+        """
+        self.assertRecordParsing(
+            ([('name1', )], 'aaaaa'),
+            "5\nname1\n\naaaaa")
+
+    def test_record_with_two_names(self):
+        """Reading a Bytes record with two names returns a list of both names.
+        """
+        self.assertRecordParsing(
+            ([('name1', ), ('name2', )], 'aaaaa'),
+            "5\nname1\nname2\n\naaaaa")
+
+    def test_record_with_two_part_names(self):
+        """Reading a Bytes record with a two_part name reads both."""
+        self.assertRecordParsing(
+            ([('name1', 'name2')], 'aaaaa'),
+            "5\nname1\x00name2\n\naaaaa")
+
+    def test_invalid_length(self):
+        """If the length-prefix is not a number, parsing raises
+        InvalidRecordError.
+        """
+        parser = self.make_parser_expecting_bytes_record()
+        self.assertRaises(
+            errors.InvalidRecordError, parser.accept_bytes, "not a number\n")
+
+    def test_incomplete_record(self):
+        """If the bytes seen so far don't form a complete record, then there
+        will be nothing returned by read_pending_records.
+        """
+        parser = self.make_parser_expecting_bytes_record()
+        parser.accept_bytes("5\n\nabcd")
+        self.assertEqual([], parser.read_pending_records())
+
+    def test_accept_nothing(self):
+        """The edge case of parsing an empty string causes no error."""
+        parser = self.make_parser_expecting_bytes_record()
+        parser.accept_bytes("")
+
+    def assertInvalidRecord(self, bytes):
+        """Assert that parsing the given bytes will raise an
+        InvalidRecordError.
+        """
+        parser = self.make_parser_expecting_bytes_record()
+        self.assertRaises(
+            errors.InvalidRecordError, parser.accept_bytes, bytes)
+
+    def test_read_invalid_name_whitespace(self):
+        """Names must have no whitespace."""
+        # A name with a space.
+        self.assertInvalidRecord("0\nbad name\n\n")
+
+        # A name with a tab.
+        self.assertInvalidRecord("0\nbad\tname\n\n")
+
+        # A name with a vertical tab.
+        self.assertInvalidRecord("0\nbad\vname\n\n")
+
+    def test_repeated_read_pending_records(self):
+        """read_pending_records will not return the same record twice."""
+        parser = self.make_parser_expecting_bytes_record()
+        parser.accept_bytes("6\n\nabcdef")
+        self.assertEqual([([], 'abcdef')], parser.read_pending_records())
+        self.assertEqual([], parser.read_pending_records())
+
+
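
Finally, a sketch of the push behaviour the new parser tests above exercise:
bytes may be handed over in arbitrary pieces, and a record only becomes
available once its body is complete (the literal strings follow the format 1
layout used in the tests):

    from bzrlib import pack

    parser = pack.ContainerPushParser()
    parser.accept_bytes('Bazaar pack format 1 (introduced in 0.18)\n')
    # Only part of a record so far, so nothing is available yet.
    parser.accept_bytes('B5\nname1\n\nbod')
    assert parser.read_pending_records() == []
    # Once the remaining body bytes arrive, the completed record appears.
    parser.accept_bytes('y1')
    assert parser.read_pending_records() == [([('name1',)], 'body1')]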



