Rev 2572: (Andrew Bennetts, Aaron Bentley) Add container format as described in doc/developers/container-format.txt in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Tue Jul 3 06:25:00 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2572
revision-id: pqm at pqm.ubuntu.com-20070703052458-wh36exfav0xnj9nf
parent: pqm at pqm.ubuntu.com-20070702183615-qkiquhju4t2grtf9
parent: andrew.bennetts at canonical.com-20070703041219-4zsjgrup4k6sdlzk
committer: Canonical.com Patch Queue Manager<pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2007-07-03 06:24:58 +0100
message:
  (Andrew Bennetts, Aaron Bentley) Add container format as described in doc/developers/container-format.txt
added:
  bzrlib/pack.py                 container.py-20070607160755-tr8zc26q18rn0jnb-1
  bzrlib/tests/test_pack.py      test_container.py-20070607160755-tr8zc26q18rn0jnb-2
modified:
  bzrlib/errors.py               errors.py-20050309040759-20512168c4e14fbd
  bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
  bzrlib/tests/test_errors.py    test_errors.py-20060210110251-41aba2deddf936a8
  doc/developers/container-format.txt containerformat.txt-20070601074309-7n7w1jiyayud6xdn-1
    ------------------------------------------------------------
    revno: 2506.2.12
    merged: andrew.bennetts at canonical.com-20070703041219-4zsjgrup4k6sdlzk
    parent: andrew.bennetts at canonical.com-20070703040601-62bbp6gt9ivf3vja
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: abentley-container-format
    timestamp: Tue 2007-07-03 14:12:19 +1000
    message:
      Update docstring for Aaron's changes.
    ------------------------------------------------------------
    revno: 2506.2.11
    merged: andrew.bennetts at canonical.com-20070703040601-62bbp6gt9ivf3vja
    parent: andrew.bennetts at canonical.com-20070703040508-11q9cdef4og1qry8
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: abentley-container-format
    timestamp: Tue 2007-07-03 14:06:01 +1000
    message:
      Keep container-format.txt up to date with changes to the code.
    ------------------------------------------------------------
    revno: 2506.2.10
    merged: andrew.bennetts at canonical.com-20070703040508-11q9cdef4og1qry8
    parent: abentley at panoramicfeedback.com-20070628171306-scpsxn9g89cchzz8
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: abentley-container-format
    timestamp: Tue 2007-07-03 14:05:08 +1000
    message:
      Add '(introduced in 0.18)' to pack format string.
    ------------------------------------------------------------
    revno: 2506.2.9
    merged: abentley at panoramicfeedback.com-20070628171306-scpsxn9g89cchzz8
    parent: abentley at panoramicfeedback.com-20070628165006-m7bd56ngqs26rd91
    committer: Aaron Bentley <abentley at panoramicfeedback.com>
    branch nick: container-format
    timestamp: Thu 2007-06-28 13:13:06 -0400
    message:
      Use file-like objects as container input, not callables
    ------------------------------------------------------------
    revno: 2506.2.8
    merged: abentley at panoramicfeedback.com-20070628165006-m7bd56ngqs26rd91
    parent: andrew.bennetts at canonical.com-20070614132802-bas89f67tqq4p3s6
    parent: pqm at pqm.ubuntu.com-20070628082903-b21gad45bimzvmgu
    committer: Aaron Bentley <abentley at panoramicfeedback.com>
    branch nick: container-format
    timestamp: Thu 2007-06-28 12:50:06 -0400
    message:
      Merge bzr.dev
    ------------------------------------------------------------
    revno: 2506.2.7
    merged: andrew.bennetts at canonical.com-20070614132802-bas89f67tqq4p3s6
    parent: andrew.bennetts at canonical.com-20070614055245-rtwk0vgz74fyyimo
    parent: andrew.bennetts at canonical.com-20070614125513-nua0p6bw9cw3jeaq
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: container-format
    timestamp: Thu 2007-06-14 23:28:02 +1000
    message:
      Change read/iter_records to return a callable, add more validation, and
      improve docstrings.
        ------------------------------------------------------------
        revno: 2506.2.6.1.2
        merged: andrew.bennetts at canonical.com-20070614125513-nua0p6bw9cw3jeaq
        parent: andrew.bennetts at canonical.com-20070614112338-6u3900u6nkag66u8
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: container-format
        timestamp: Thu 2007-06-14 22:55:13 +1000
        message:
          Docstring improvements.
        ------------------------------------------------------------
        revno: 2506.2.6.1.1
        merged: andrew.bennetts at canonical.com-20070614112338-6u3900u6nkag66u8
        parent: andrew.bennetts at canonical.com-20070614055245-rtwk0vgz74fyyimo
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: container-format
        timestamp: Thu 2007-06-14 21:23:38 +1000
        message:
          Return a callable instead of a str from read, and add more validation.
    ------------------------------------------------------------
    revno: 2506.2.6
    merged: andrew.bennetts at canonical.com-20070614055245-rtwk0vgz74fyyimo
    parent: andrew.bennetts at canonical.com-20070614022816-ne4h4qk0j50x6n26
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: container-format
    timestamp: Thu 2007-06-14 15:52:45 +1000
    message:
      Add validate method to ContainerReader and BytesRecordReader.
    ------------------------------------------------------------
    revno: 2506.2.5
    merged: andrew.bennetts at canonical.com-20070614022816-ne4h4qk0j50x6n26
    parent: andrew.bennetts at canonical.com-20070614015707-hncvkzg0mn4w0w31
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: container-format
    timestamp: Thu 2007-06-14 12:28:16 +1000
    message:
      Update format marker in container-format.txt to be in sync with the code.
    ------------------------------------------------------------
    revno: 2506.2.4
    merged: andrew.bennetts at canonical.com-20070614015707-hncvkzg0mn4w0w31
    parent: andrew.bennetts at canonical.com-20070612015639-z378i21fmcnd5j4x
    parent: andrew.bennetts at canonical.com-20070613105602-1bagfibob1rh21mg
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: container-format
    timestamp: Thu 2007-06-14 11:57:07 +1000
    message:
      Some small improvements to the pack format, and also merge in bzr.dev.
        ------------------------------------------------------------
        revno: 2506.2.3.1.4
        merged: andrew.bennetts at canonical.com-20070613105602-1bagfibob1rh21mg
        parent: andrew.bennetts at canonical.com-20070613105446-ukb9knp9dmy57v74
        parent: pqm at pqm.ubuntu.com-20070613061627-xx5xk6q0oxcy1etm
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: container-format
        timestamp: Wed 2007-06-13 20:56:02 +1000
        message:
          Merge from bzr.dev.
        ------------------------------------------------------------
        revno: 2506.2.3.1.3
        merged: andrew.bennetts at canonical.com-20070613105446-ukb9knp9dmy57v74
        parent: andrew.bennetts at canonical.com-20070613105312-z94x8g4y5mlg4ukg
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: container-format
        timestamp: Wed 2007-06-13 20:54:46 +1000
        message:
          Change format marker to use the word 'Bazaar' rather than 'bzr'.
        ------------------------------------------------------------
        revno: 2506.2.3.1.2
        merged: andrew.bennetts at canonical.com-20070613105312-z94x8g4y5mlg4ukg
        parent: andrew.bennetts at canonical.com-20070613075835-sb1o923hdtwnmlrv
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: container-format
        timestamp: Wed 2007-06-13 20:53:12 +1000
        message:
          Raise InvalidRecordError on invalid names.
        ------------------------------------------------------------
        revno: 2506.2.3.1.1
        merged: andrew.bennetts at canonical.com-20070613075835-sb1o923hdtwnmlrv
        parent: andrew.bennetts at canonical.com-20070612015639-z378i21fmcnd5j4x
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: container-format
        timestamp: Wed 2007-06-13 17:58:35 +1000
        message:
          Remove duplicate definition of ContainerWriter.
    ------------------------------------------------------------
    revno: 2506.2.3
    merged: andrew.bennetts at canonical.com-20070612015639-z378i21fmcnd5j4x
    parent: andrew.bennetts at canonical.com-20070609034820-t7u540w5pyhvtgn3
    parent: andrew.bennetts at canonical.com-20070611072208-9p73cf5vcu7zh0ys
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: container-format
    timestamp: Tue 2007-06-12 11:56:39 +1000
    message:
      Fix docstring markup, remove obsolete comment.
        ------------------------------------------------------------
        revno: 2506.2.2.1.1
        merged: andrew.bennetts at canonical.com-20070611072208-9p73cf5vcu7zh0ys
        parent: andrew.bennetts at canonical.com-20070609034820-t7u540w5pyhvtgn3
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: container-format
        timestamp: Mon 2007-06-11 17:22:08 +1000
        message:
          Fix docstring markup, remove obsolete comment.
    ------------------------------------------------------------
    revno: 2506.2.2
    merged: andrew.bennetts at canonical.com-20070609034820-t7u540w5pyhvtgn3
    parent: andrew.bennetts at canonical.com-20070607160934-jfs1wrxxtulso9nw
    parent: andrew.bennetts at canonical.com-20070609034525-j9d7i5dlk6ou97eb
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: container-format
    timestamp: Sat 2007-06-09 13:48:20 +1000
    message:
      More improvements, especially in error handling.
        ------------------------------------------------------------
        revno: 2506.2.1.1.3
        merged: andrew.bennetts at canonical.com-20070609034525-j9d7i5dlk6ou97eb
        parent: andrew.bennetts at canonical.com-20070608064547-vzhyegqx2vl6pni3
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: container-format
        timestamp: Sat 2007-06-09 13:45:25 +1000
        message:
          Deal with EOF in the middle of a bytes record.
        ------------------------------------------------------------
        revno: 2506.2.1.1.2
        merged: andrew.bennetts at canonical.com-20070608064547-vzhyegqx2vl6pni3
        parent: andrew.bennetts at canonical.com-20070608063359-s5ps81a8i85w7by0
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: container-format
        timestamp: Fri 2007-06-08 16:45:47 +1000
        message:
          Test docstring tweaks, inspired by looking over the output of jml's testdoc tool.
        ------------------------------------------------------------
        revno: 2506.2.1.1.1
        merged: andrew.bennetts at canonical.com-20070608063359-s5ps81a8i85w7by0
        parent: andrew.bennetts at canonical.com-20070607160934-jfs1wrxxtulso9nw
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: container-format
        timestamp: Fri 2007-06-08 16:33:59 +1000
        message:
          More progress:
          
           * Rename container.py to pack.py
           * Refactor bytes record reading into a separate class for ease of unit testing.
           * Start handling error conditions such as invalid content lengths in byte
             records.
    ------------------------------------------------------------
    revno: 2506.2.1
    merged: andrew.bennetts at canonical.com-20070607160934-jfs1wrxxtulso9nw
    parent: pqm at pqm.ubuntu.com-20070604194535-ihhpf84qp0icoj2t
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: container-format
    timestamp: Fri 2007-06-08 02:09:34 +1000
    message:
      Start implementing container format reading and writing.
=== added file 'bzrlib/pack.py'
--- a/bzrlib/pack.py	1970-01-01 00:00:00 +0000
+++ b/bzrlib/pack.py	2007-07-03 04:12:19 +0000
@@ -0,0 +1,269 @@
+# Copyright (C) 2007 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+"""Container format for Bazaar data.
+
+"Containers" and "records" are described in doc/developers/container-format.txt.
+"""
+
+import re
+
+from bzrlib import errors
+
+
+FORMAT_ONE = "Bazaar pack format 1 (introduced in 0.18)"
+
+
+_whitespace_re = re.compile('[\t\n\x0b\x0c\r ]')
+
+
+def _check_name(name):
+    """Do some basic checking of 'name'.
+    
+    At the moment, this just checks that there are no whitespace characters in a
+    name.
+
+    :raises InvalidRecordError: if name is not valid.
+    :seealso: _check_name_encoding
+    """
+    if _whitespace_re.search(name) is not None:
+        raise errors.InvalidRecordError("%r is not a valid name." % (name,))
+
+
+def _check_name_encoding(name):
+    """Check that 'name' is valid UTF-8.
+    
+    This is separate from _check_name because UTF-8 decoding is relatively
+    expensive, and we usually want to avoid it.
+
+    :raises InvalidRecordError: if name is not valid UTF-8.
+    """
+    try:
+        name.decode('utf-8')
+    except UnicodeDecodeError, e:
+        raise errors.InvalidRecordError(str(e))
+
+
+class ContainerWriter(object):
+    """A class for writing containers."""
+
+    def __init__(self, write_func):
+        """Constructor.
+
+        :param write_func: a callable that will be called when this
+            ContainerWriter needs to write some bytes.
+        """
+        self.write_func = write_func
+
+    def begin(self):
+        """Begin writing a container."""
+        self.write_func(FORMAT_ONE + "\n")
+
+    def end(self):
+        """Finish writing a container."""
+        self.write_func("E")
+
+    def add_bytes_record(self, bytes, names):
+        """Add a Bytes record with the given names."""
+        # Kind marker
+        self.write_func("B")
+        # Length
+        self.write_func(str(len(bytes)) + "\n")
+        # Names
+        for name in names:
+            # Make sure we're writing valid names.  Note that we will leave a
+            # half-written record if a name is bad!
+            _check_name(name)
+            self.write_func(name + "\n")
+        # End of headers
+        self.write_func("\n")
+        # Finally, the contents.
+        self.write_func(bytes)
+
+
+class BaseReader(object):
+
+    def __init__(self, source_file):
+        """Constructor.
+
+        :param source_file: a file-like object with `read` and `readline`
+            methods.
+        """
+        self._source = source_file
+
+    def reader_func(self, length=None):
+        return self._source.read(length)
+
+    def _read_line(self):
+        line = self._source.readline()
+        if not line.endswith('\n'):
+            raise errors.UnexpectedEndOfContainerError()
+        return line.rstrip('\n')
+
+
+class ContainerReader(BaseReader):
+    """A class for reading Bazaar's container format."""
+
+    def iter_records(self):
+        """Iterate over the container, yielding each record as it is read.
+
+        Each yielded record will be a 2-tuple of (names, callable), where names
+        is a ``list`` and callable is a function that takes one argument,
+        ``max_length``.
+
+        You **must not** call the callable after advancing the iterator to the
+        next record.  That is, this code is invalid::
+
+            record_iter = container.iter_records()
+            names1, callable1 = record_iter.next()
+            names2, callable2 = record_iter.next()
+            bytes1 = callable1(None)
+        
+        Doing so will give incorrect results and invalidate the state of the
+        ContainerReader.
+
+        :raises ContainerError: if any sort of container corruption is
+            detected, e.g. UnknownContainerFormatError if the format of the
+            container is unrecognised.
+        :seealso: BytesRecordReader.read
+        """
+        self._read_format()
+        return self._iter_records()
+    
+    def iter_record_objects(self):
+        """Iterate over the container, yielding each record as it is read.
+
+        Each yielded record will be an object with ``read`` and ``validate``
+        methods.  Like with iter_records, it is not safe to use a record object
+        after advancing the iterator to yield the next record.
+
+        :raises ContainerError: if any sort of container corruption is
+            detected, e.g. UnknownContainerFormatError if the format of the
+            container is unrecognised.
+        :seealso: iter_records
+        """
+        self._read_format()
+        return self._iter_record_objects()
+    
+    def _iter_records(self):
+        for record in self._iter_record_objects():
+            yield record.read()
+
+    def _iter_record_objects(self):
+        while True:
+            record_kind = self.reader_func(1)
+            if record_kind == 'B':
+                # Bytes record.
+                reader = BytesRecordReader(self._source)
+                yield reader
+            elif record_kind == 'E':
+                # End marker.  There are no more records.
+                return
+            elif record_kind == '':
+                # End of stream encountered, but no End Marker record seen, so
+                # this container is incomplete.
+                raise errors.UnexpectedEndOfContainerError()
+            else:
+                # Unknown record type.
+                raise errors.UnknownRecordTypeError(record_kind)
+
+    def _read_format(self):
+        format = self._read_line()
+        if format != FORMAT_ONE:
+            raise errors.UnknownContainerFormatError(format)
+
+    def validate(self):
+        """Validate this container and its records.
+
+        Validating consumes the data stream just like iter_records and
+        iter_record_objects, so you cannot call it after
+        iter_records/iter_record_objects.
+
+        :raises ContainerError: if something is invalid.
+        """
+        all_names = set()
+        for record_names, read_bytes in self.iter_records():
+            read_bytes(None)
+            for name in record_names:
+                _check_name_encoding(name)
+                # Check that the name is unique.  Note that Python will refuse
+                # to decode non-shortest forms of UTF-8 encoding, so there is no
+                # risk that the same unicode string has been encoded two
+                # different ways.
+                if name in all_names:
+                    raise errors.DuplicateRecordNameError(name)
+                all_names.add(name)
+        excess_bytes = self.reader_func(1)
+        if excess_bytes != '':
+            raise errors.ContainerHasExcessDataError(excess_bytes)
+
+
+class BytesRecordReader(BaseReader):
+
+    def read(self):
+        """Read this record.
+
+        You can either validate or read a record; you can't do both.
+
+        :returns: A tuple of (names, callable).  The callable can be called
+            repeatedly to obtain the bytes for the record, with a max_length
+            argument.  If max_length is None, returns all the bytes.  Because
+            records can be arbitrarily large, using None is not recommended
+            unless you have reason to believe the content will fit in memory.
+        """
+        # Read the content length.
+        length_line = self._read_line()
+        try:
+            length = int(length_line)
+        except ValueError:
+            raise errors.InvalidRecordError(
+                "%r is not a valid length." % (length_line,))
+        
+        # Read the list of names.
+        names = []
+        while True:
+            name = self._read_line()
+            if name == '':
+                break
+            _check_name(name)
+            names.append(name)
+
+        self._remaining_length = length
+        return names, self._content_reader
+
+    def _content_reader(self, max_length):
+        if max_length is None:
+            length_to_read = self._remaining_length
+        else:
+            length_to_read = min(max_length, self._remaining_length)
+        self._remaining_length -= length_to_read
+        bytes = self.reader_func(length_to_read)
+        if len(bytes) != length_to_read:
+            raise errors.UnexpectedEndOfContainerError()
+        return bytes
+
+    def validate(self):
+        """Validate this record.
+
+        You can either validate or read; you can't do both.
+
+        :raises ContainerError: if this record is invalid.
+        """
+        names, read_bytes = self.read()
+        for name in names:
+            _check_name_encoding(name)
+        read_bytes(None)
+
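For readers who want to see the byte layout that bzrlib/pack.py reads and writes, here is a minimal standalone sketch. It is not the bzrlib code: it is written for Python 3 with explicit byte strings, the helper names write_container and read_container are hypothetical, it raises ValueError rather than the bzrlib.errors classes, and it skips the name checks that _check_name and _check_name_encoding perform. The layout itself (format line, "B" records with a length line, name lines, a blank line and the body, then a final "E") is taken from the patch above.

```python
import io

# Format marker copied from FORMAT_ONE in the patch above.
FORMAT_ONE = b"Bazaar pack format 1 (introduced in 0.18)"


def write_container(records):
    """Serialise (body, names) pairs. Hypothetical helper, not bzrlib API."""
    out = io.BytesIO()
    out.write(FORMAT_ONE + b"\n")
    for body, names in records:
        # Kind marker "B", then the body length as decimal ASCII.
        out.write(b"B" + str(len(body)).encode("ascii") + b"\n")
        for name in names:
            out.write(name + b"\n")
        # A blank line ends the headers; the body follows immediately.
        out.write(b"\n")
        out.write(body)
    # End marker: a single "E" with no trailing newline.
    out.write(b"E")
    return out.getvalue()


def read_container(data):
    """Parse a whole container into a list of (names, body) tuples."""
    stream = io.BytesIO(data)
    if stream.readline() != FORMAT_ONE + b"\n":
        raise ValueError("unknown container format")
    records = []
    while True:
        kind = stream.read(1)
        if kind == b"E":
            return records
        if kind != b"B":
            # Covers both unknown record kinds and a missing end marker.
            raise ValueError("bad record kind or truncated container: %r" % kind)
        length = int(stream.readline())
        names = []
        while True:
            line = stream.readline()
            if not line.endswith(b"\n"):
                raise ValueError("unexpected end of container")
            if line == b"\n":
                break
            names.append(line[:-1])
        body = stream.read(length)
        if len(body) != length:
            raise ValueError("unexpected end of container")
        records.append((names, body))


data = write_container([(b"abc", [b"name1", b"name2"])])
assert data == FORMAT_ONE + b"\nB3\nname1\nname2\n\nabcE"
assert read_container(data) == [([b"name1", b"name2"], b"abc")]
```

Unlike ContainerReader, this sketch loads every record body into memory at once; the real reader hands back a callable so that large records can be consumed in bounded chunks.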

=== added file 'bzrlib/tests/test_pack.py'
--- a/bzrlib/tests/test_pack.py	1970-01-01 00:00:00 +0000
+++ b/bzrlib/tests/test_pack.py	2007-07-03 04:05:08 +0000
@@ -0,0 +1,377 @@
+# Copyright (C) 2007 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+"""Tests for bzrlib.pack."""
+
+
+from cStringIO import StringIO
+
+from bzrlib import pack, errors, tests
+
+
+class TestContainerWriter(tests.TestCase):
+
+    def test_construct(self):
+        """Test constructing a ContainerWriter.
+        
+        This uses None as the output stream to show that the constructor doesn't
+        try to use the output stream.
+        """
+        writer = pack.ContainerWriter(None)
+
+    def test_begin(self):
+        """The begin() method writes the container format marker line."""
+        output = StringIO()
+        writer = pack.ContainerWriter(output.write)
+        writer.begin()
+        self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\n',
+                         output.getvalue())
+
+    def test_end(self):
+        """The end() method writes an End Marker record."""
+        output = StringIO()
+        writer = pack.ContainerWriter(output.write)
+        writer.begin()
+        writer.end()
+        self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\nE',
+                         output.getvalue())
+
+    def test_add_bytes_record_no_name(self):
+        """Add a bytes record with no name."""
+        output = StringIO()
+        writer = pack.ContainerWriter(output.write)
+        writer.begin()
+        writer.add_bytes_record('abc', names=[])
+        self.assertEqual('Bazaar pack format 1 (introduced in 0.18)\nB3\n\nabc',
+                         output.getvalue())
+
+    def test_add_bytes_record_one_name(self):
+        """Add a bytes record with one name."""
+        output = StringIO()
+        writer = pack.ContainerWriter(output.write)
+        writer.begin()
+        writer.add_bytes_record('abc', names=['name1'])
+        self.assertEqual(
+            'Bazaar pack format 1 (introduced in 0.18)\n'
+            'B3\nname1\n\nabc',
+            output.getvalue())
+
+    def test_add_bytes_record_two_names(self):
+        """Add a bytes record with two names."""
+        output = StringIO()
+        writer = pack.ContainerWriter(output.write)
+        writer.begin()
+        writer.add_bytes_record('abc', names=['name1', 'name2'])
+        self.assertEqual(
+            'Bazaar pack format 1 (introduced in 0.18)\n'
+            'B3\nname1\nname2\n\nabc',
+            output.getvalue())
+
+    def test_add_bytes_record_invalid_name(self):
+        """Adding a Bytes record with a name with whitespace in it raises
+        InvalidRecordError.
+        """
+        output = StringIO()
+        writer = pack.ContainerWriter(output.write)
+        writer.begin()
+        self.assertRaises(
+            errors.InvalidRecordError,
+            writer.add_bytes_record, 'abc', names=['bad name'])
+
+
+class TestContainerReader(tests.TestCase):
+
+    def get_reader_for(self, bytes):
+        stream = StringIO(bytes)
+        reader = pack.ContainerReader(stream)
+        return reader
+
+    def test_construct(self):
+        """Test constructing a ContainerReader.
+        
+        This uses None as the input stream to show that the constructor doesn't
+        try to use the input stream.
+        """
+        reader = pack.ContainerReader(None)
+
+    def test_empty_container(self):
+        """Read an empty container."""
+        reader = self.get_reader_for(
+            "Bazaar pack format 1 (introduced in 0.18)\nE")
+        self.assertEqual([], list(reader.iter_records()))
+
+    def test_unknown_format(self):
+        """Unrecognised container formats raise UnknownContainerFormatError."""
+        reader = self.get_reader_for("unknown format\n")
+        self.assertRaises(
+            errors.UnknownContainerFormatError, reader.iter_records)
+
+    def test_unexpected_end_of_container(self):
+        """Containers that don't end with an End Marker record should cause
+        UnexpectedEndOfContainerError to be raised.
+        """
+        reader = self.get_reader_for(
+            "Bazaar pack format 1 (introduced in 0.18)\n")
+        iterator = reader.iter_records()
+        self.assertRaises(
+            errors.UnexpectedEndOfContainerError, iterator.next)
+
+    def test_unknown_record_type(self):
+        """Unknown record types cause UnknownRecordTypeError to be raised."""
+        reader = self.get_reader_for(
+            "Bazaar pack format 1 (introduced in 0.18)\nX")
+        iterator = reader.iter_records()
+        self.assertRaises(
+            errors.UnknownRecordTypeError, iterator.next)
+
+    def test_container_with_one_unnamed_record(self):
+        """Read a container with one Bytes record.
+        
+        Parsing Bytes records is more thoroughly exercised by
+        TestBytesRecordReader.  This test is here to ensure that
+        ContainerReader's integration with BytesRecordReader is working.
+        """
+        reader = self.get_reader_for(
+            "Bazaar pack format 1 (introduced in 0.18)\nB5\n\naaaaaE")
+        expected_records = [([], 'aaaaa')]
+        self.assertEqual(
+            expected_records,
+            [(names, read_bytes(None))
+             for (names, read_bytes) in reader.iter_records()])
+
+    def test_validate_empty_container(self):
+        """validate does not raise an error for a container with no records."""
+        reader = self.get_reader_for("Bazaar pack format 1 (introduced in 0.18)\nE")
+        # No exception raised
+        reader.validate()
+
+    def test_validate_non_empty_valid_container(self):
+        """validate does not raise an error for a container with a valid record.
+        """
+        reader = self.get_reader_for(
+            "Bazaar pack format 1 (introduced in 0.18)\nB3\nname\n\nabcE")
+        # No exception raised
+        reader.validate()
+
+    def test_validate_bad_format(self):
+        """validate raises an error for unrecognised format strings.
+
+        It may raise either UnexpectedEndOfContainerError or
+        UnknownContainerFormatError, depending on exactly what the string is.
+        """
+        inputs = ["", "x", "Bazaar pack format 1 (introduced in 0.18)", "bad\n"]
+        for input in inputs:
+            reader = self.get_reader_for(input)
+            self.assertRaises(
+                (errors.UnexpectedEndOfContainerError,
+                 errors.UnknownContainerFormatError),
+                reader.validate)
+
+    def test_validate_bad_record_marker(self):
+        """validate raises UnknownRecordTypeError for unrecognised record
+        types.
+        """
+        reader = self.get_reader_for(
+            "Bazaar pack format 1 (introduced in 0.18)\nX")
+        self.assertRaises(errors.UnknownRecordTypeError, reader.validate)
+
+    def test_validate_data_after_end_marker(self):
+        """validate raises ContainerHasExcessDataError if there are any bytes
+        after the end of the container.
+        """
+        reader = self.get_reader_for(
+            "Bazaar pack format 1 (introduced in 0.18)\nEcrud")
+        self.assertRaises(
+            errors.ContainerHasExcessDataError, reader.validate)
+
+    def test_validate_no_end_marker(self):
+        """validate raises UnexpectedEndOfContainerError if there's no end of
+        container marker, even if the container up to this point has been valid.
+        """
+        reader = self.get_reader_for(
+            "Bazaar pack format 1 (introduced in 0.18)\n")
+        self.assertRaises(
+            errors.UnexpectedEndOfContainerError, reader.validate)
+
+    def test_validate_duplicate_name(self):
+        """validate raises DuplicateRecordNameError if the same name occurs
+        multiple times in the container.
+        """
+        reader = self.get_reader_for(
+            "Bazaar pack format 1 (introduced in 0.18)\n"
+            "B0\nname\n\n"
+            "B0\nname\n\n"
+            "E")
+        self.assertRaises(errors.DuplicateRecordNameError, reader.validate)
+
+    def test_validate_undecodeable_name(self):
+        """Names that aren't valid UTF-8 cause validate to fail."""
+        reader = self.get_reader_for(
+            "Bazaar pack format 1 (introduced in 0.18)\nB0\n\xcc\n\nE")
+        self.assertRaises(errors.InvalidRecordError, reader.validate)
+        
+
+class TestBytesRecordReader(tests.TestCase):
+    """Tests for reading and validating Bytes records with BytesRecordReader."""
+
+    def get_reader_for(self, bytes):
+        stream = StringIO(bytes)
+        reader = pack.BytesRecordReader(stream)
+        return reader
+
+    def test_record_with_no_name(self):
+        """Reading a Bytes record with no name returns an empty list of
+        names.
+        """
+        reader = self.get_reader_for("5\n\naaaaa")
+        names, get_bytes = reader.read()
+        self.assertEqual([], names)
+        self.assertEqual('aaaaa', get_bytes(None))
+
+    def test_record_with_one_name(self):
+        """Reading a Bytes record with one name returns a list of just that
+        name.
+        """
+        reader = self.get_reader_for("5\nname1\n\naaaaa")
+        names, get_bytes = reader.read()
+        self.assertEqual(['name1'], names)
+        self.assertEqual('aaaaa', get_bytes(None))
+
+    def test_record_with_two_names(self):
+        """Reading a Bytes record with two names returns a list of both names.
+        """
+        reader = self.get_reader_for("5\nname1\nname2\n\naaaaa")
+        names, get_bytes = reader.read()
+        self.assertEqual(['name1', 'name2'], names)
+        self.assertEqual('aaaaa', get_bytes(None))
+
+    def test_invalid_length(self):
+        """If the length-prefix is not a number, parsing raises
+        InvalidRecordError.
+        """
+        reader = self.get_reader_for("not a number\n")
+        self.assertRaises(errors.InvalidRecordError, reader.read)
+
+    def test_early_eof(self):
+        """Tests for premature EOF occurring while parsing Bytes records with
+        BytesRecordReader.
+
+        An incomplete container might be interrupted at any point.  The
+        BytesRecordReader needs to cope with the input stream running out no
+        matter where it is in the parsing process.
+
+        In all cases, UnexpectedEndOfContainerError should be raised.
+        """
+        complete_record = "6\nname\n\nabcdef"
+        for count in range(0, len(complete_record)):
+            incomplete_record = complete_record[:count]
+            reader = self.get_reader_for(incomplete_record)
+            # Avoid assertRaises here so that failures are easier to diagnose
+            # (assertRaises doesn't allow a custom failure message).
+            try:
+                names, read_bytes = reader.read()
+                read_bytes(None)
+            except errors.UnexpectedEndOfContainerError:
+                pass
+            else:
+                self.fail(
+                    "UnexpectedEndOfContainerError not raised when parsing %r"
+                    % (incomplete_record,))
+
+    def test_initial_eof(self):
+        """EOF before any bytes read at all."""
+        reader = self.get_reader_for("")
+        self.assertRaises(errors.UnexpectedEndOfContainerError, reader.read)
+
+    def test_eof_after_length(self):
+        """EOF after reading the length and before reading name(s)."""
+        reader = self.get_reader_for("123\n")
+        self.assertRaises(errors.UnexpectedEndOfContainerError, reader.read)
+
+    def test_eof_during_name(self):
+        """EOF during reading a name."""
+        reader = self.get_reader_for("123\nname")
+        self.assertRaises(errors.UnexpectedEndOfContainerError, reader.read)
+
+    def test_read_invalid_name_whitespace(self):
+        """Names must have no whitespace."""
+        # A name with a space.
+        reader = self.get_reader_for("0\nbad name\n\n")
+        self.assertRaises(errors.InvalidRecordError, reader.read)
+
+        # A name with a tab.
+        reader = self.get_reader_for("0\nbad\tname\n\n")
+        self.assertRaises(errors.InvalidRecordError, reader.read)
+
+        # A name with a vertical tab.
+        reader = self.get_reader_for("0\nbad\vname\n\n")
+        self.assertRaises(errors.InvalidRecordError, reader.read)
+
+    def test_validate_whitespace_in_name(self):
+        """Names must have no whitespace."""
+        reader = self.get_reader_for("0\nbad name\n\n")
+        self.assertRaises(errors.InvalidRecordError, reader.validate)
+
+    def test_validate_interrupted_prelude(self):
+        """EOF during reading a record's prelude causes validate to fail."""
+        reader = self.get_reader_for("")
+        self.assertRaises(
+            errors.UnexpectedEndOfContainerError, reader.validate)
+
+    def test_validate_interrupted_body(self):
+        """EOF during reading a record's body causes validate to fail."""
+        reader = self.get_reader_for("1\n\n")
+        self.assertRaises(
+            errors.UnexpectedEndOfContainerError, reader.validate)
+
+    def test_validate_unparseable_length(self):
+        """An unparseable record length causes validate to fail."""
+        reader = self.get_reader_for("\n\n")
+        self.assertRaises(
+            errors.InvalidRecordError, reader.validate)
+
+    def test_validate_undecodeable_name(self):
+        """Names that aren't valid UTF-8 cause validate to fail."""
+        reader = self.get_reader_for("0\n\xcc\n\n")
+        self.assertRaises(errors.InvalidRecordError, reader.validate)
+
+    def test_read_max_length(self):
+        """If the max_length passed to the callable returned by read is not
+        None, then no more than that many bytes will be read.
+        """
+        reader = self.get_reader_for("6\n\nabcdef")
+        names, get_bytes = reader.read()
+        self.assertEqual('abc', get_bytes(3))
+
+    def test_read_no_max_length(self):
+        """If the max_length passed to the callable returned by read is None,
+        then all the bytes in the record will be read.
+        """
+        reader = self.get_reader_for("6\n\nabcdef")
+        names, get_bytes = reader.read()
+        self.assertEqual('abcdef', get_bytes(None))
+
+    def test_repeated_read_calls(self):
+        """Repeated calls to the callable returned from BytesRecordReader.read
+        will not read beyond the end of the record.
+        """
+        reader = self.get_reader_for("6\n\nabcdefB3\nnext-record\nXXX")
+        names, get_bytes = reader.read()
+        self.assertEqual('abcdef', get_bytes(None))
+        self.assertEqual('', get_bytes(None))
+        self.assertEqual('', get_bytes(99))
+
+
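
Editor's note: the BytesRecordReader tests above imply a record layout of a
decimal length line, zero or more name lines, a blank line, and then exactly
that many body bytes. The sketch below restates that layout as a standalone
parser. It is not the code in bzrlib/pack.py: read_bytes_record and the two
exception classes are hypothetical stand-ins, it is written for Python 3 with
bytes input (the tests above use Python 2 str), and it folds the UTF-8 check
that the tests exercise through validate into the same function.

```python
import io


class UnexpectedEndOfContainerError(Exception):
    """Local stand-in for the bzrlib.errors class of the same name."""


class InvalidRecordError(Exception):
    """Local stand-in for the bzrlib.errors class of the same name."""


def _read_line(stream):
    """Return the next line without its newline, or fail on early EOF."""
    line = stream.readline()
    if not line.endswith(b"\n"):
        # Covers both an empty read and a line cut off before its newline.
        raise UnexpectedEndOfContainerError()
    return line[:-1]


def read_bytes_record(stream):
    """Parse b'<length>\\n<name>\\n...\\n\\n<body>' into (names, body)."""
    length_line = _read_line(stream)
    try:
        length = int(length_line)
    except ValueError:
        raise InvalidRecordError("%r is not a valid length" % (length_line,))
    if length < 0:
        # Extra guard chosen for this sketch; not taken from the tests.
        raise InvalidRecordError("negative length %d" % (length,))

    names = []
    while True:
        name = _read_line(stream)
        if name == b"":
            break  # a blank line ends the list of names
        if name.split() != [name]:
            # bytes.split() splits on ASCII whitespace (space, tab, \v, ...).
            raise InvalidRecordError("%r contains whitespace" % (name,))
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            raise InvalidRecordError("%r is not valid UTF-8" % (name,))
        names.append(name)

    body = stream.read(length)
    if len(body) != length:
        raise UnexpectedEndOfContainerError()
    return names, body
```

Because the body is read with an explicit length, the stream is left
positioned at the first byte after the record, which is the property that
test_repeated_read_calls relies on.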

=== modified file 'bzrlib/errors.py'
--- a/bzrlib/errors.py	2007-06-26 08:52:20 +0000
+++ b/bzrlib/errors.py	2007-06-28 16:50:06 +0000
@@ -2150,6 +2150,57 @@
         self.response_tuple = response_tuple
 
 
+class ContainerError(BzrError):
+    """Base class of container errors."""
+
+
+class UnknownContainerFormatError(ContainerError):
+
+    _fmt = "Unrecognised container format: %(container_format)r"
+
+    def __init__(self, container_format):
+        self.container_format = container_format
+
+
+class UnexpectedEndOfContainerError(ContainerError):
+
+    _fmt = "Unexpected end of container stream"
+
+    internal_error = False
+
+
+class UnknownRecordTypeError(ContainerError):
+
+    _fmt = "Unknown record type: %(record_type)r"
+
+    def __init__(self, record_type):
+        self.record_type = record_type
+
+
+class InvalidRecordError(ContainerError):
+
+    _fmt = "Invalid record: %(reason)s"
+
+    def __init__(self, reason):
+        self.reason = reason
+
+
+class ContainerHasExcessDataError(ContainerError):
+
+    _fmt = "Container has data after end marker: %(excess)r"
+
+    def __init__(self, excess):
+        self.excess = excess
+
+
+class DuplicateRecordNameError(ContainerError):
+
+    _fmt = "Container has multiple records with the same name: \"%(name)s\""
+
+    def __init__(self, name):
+        self.name = name
+
+
 class NoDestinationAddress(BzrError):
 
     _fmt = "Message does not have a destination address."
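
Editor's note: each new error class pairs a _fmt template with an __init__
that stores the values the template names. The sketch below shows one way
such a template can be turned into a message, assuming str() interpolates the
instance attributes into _fmt with the % operator. SketchError is a
hypothetical stand-in for BzrError, whose real implementation is not part of
this diff.

```python
class SketchError(Exception):
    """Hypothetical base class: formats _fmt from instance attributes."""

    _fmt = "Unspecified error"

    def __str__(self):
        # vars(self) is the instance attribute dict, e.g. {'reason': 'xxx'}.
        # A template without %(...) keys is returned unchanged.
        return self._fmt % vars(self)


class UnexpectedEndOfContainerError(SketchError):

    _fmt = "Unexpected end of container stream"


class UnknownRecordTypeError(SketchError):

    _fmt = "Unknown record type: %(record_type)r"

    def __init__(self, record_type):
        self.record_type = record_type


class InvalidRecordError(SketchError):

    _fmt = "Invalid record: %(reason)s"

    def __init__(self, reason):
        self.reason = reason
```

Under that assumption, str(InvalidRecordError("xxx")) yields
"Invalid record: xxx", the same string that test_invalid_record expects from
the real class.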

=== modified file 'bzrlib/tests/__init__.py'
--- a/bzrlib/tests/__init__.py	2007-07-02 05:53:55 +0000
+++ b/bzrlib/tests/__init__.py	2007-07-03 05:24:58 +0000
@@ -2278,6 +2278,7 @@
                    'bzrlib.tests.test_commit_merge',
                    'bzrlib.tests.test_config',
                    'bzrlib.tests.test_conflicts',
+                   'bzrlib.tests.test_pack',
                    'bzrlib.tests.test_counted_lock',
                    'bzrlib.tests.test_decorators',
                    'bzrlib.tests.test_delta',

=== modified file 'bzrlib/tests/test_errors.py'
--- a/bzrlib/tests/test_errors.py	2007-06-26 08:52:20 +0000
+++ b/bzrlib/tests/test_errors.py	2007-06-28 16:50:06 +0000
@@ -273,6 +273,47 @@
             "Could not understand response from smart server: ('not yes',)",
             str(e))
 
+    def test_unknown_container_format(self):
+        """Test the formatting of UnknownContainerFormatError."""
+        e = errors.UnknownContainerFormatError('bad format string')
+        self.assertEqual(
+            "Unrecognised container format: 'bad format string'",
+            str(e))
+
+    def test_unexpected_end_of_container(self):
+        """Test the formatting of UnexpectedEndOfContainerError."""
+        e = errors.UnexpectedEndOfContainerError()
+        self.assertEqual(
+            "Unexpected end of container stream", str(e))
+
+    def test_unknown_record_type(self):
+        """Test the formatting of UnknownRecordTypeError."""
+        e = errors.UnknownRecordTypeError("X")
+        self.assertEqual(
+            "Unknown record type: 'X'",
+            str(e))
+
+    def test_invalid_record(self):
+        """Test the formatting of InvalidRecordError."""
+        e = errors.InvalidRecordError("xxx")
+        self.assertEqual(
+            "Invalid record: xxx",
+            str(e))
+
+    def test_container_has_excess_data(self):
+        """Test the formatting of ContainerHasExcessDataError."""
+        e = errors.ContainerHasExcessDataError("excess bytes")
+        self.assertEqual(
+            "Container has data after end marker: 'excess bytes'",
+            str(e))
+
+    def test_duplicate_record_name_error(self):
+        """Test the formatting of DuplicateRecordNameError."""
+        e = errors.DuplicateRecordNameError(u"n\xe5me".encode('utf-8'))
+        self.assertEqual(
+            "Container has multiple records with the same name: \"n\xc3\xa5me\"",
+            str(e))
+
 
 class PassThroughError(errors.BzrError):
     

=== modified file 'doc/developers/container-format.txt'
--- a/doc/developers/container-format.txt	2007-06-08 02:47:19 +0000
+++ b/doc/developers/container-format.txt	2007-07-03 04:06:01 +0000
@@ -176,7 +176,7 @@
 
 The format is:
 
-  * a **container lead-in**, "``bzr pack format 1\n``",
+  * a **container lead-in**, "``Bazaar pack format 1 (introduced in 0.18)\n``",
   * followed by one or more **records**.
 
 A record is:
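
Editor's note: putting the lead-in above together with the container-level
tests earlier in this diff, a whole container can be checked in a single
pass. The sketch below is a rough illustration under stated assumptions, not
the validate method added in bzrlib/pack.py. The record-kind byte "B" and the
end marker "E" are read off the test strings, validate_container and the
shortened exception names are hypothetical, and the input is Python 3 bytes.

```python
import io

FORMAT_LINE = b"Bazaar pack format 1 (introduced in 0.18)\n"


class ContainerError(Exception):
    """Local stand-in for the base class added to bzrlib/errors.py."""


class UnexpectedEnd(ContainerError):
    pass


class DuplicateName(ContainerError):
    pass


class ExcessData(ContainerError):
    pass


def _read_line(stream):
    line = stream.readline()
    if not line.endswith(b"\n"):
        raise UnexpectedEnd()
    return line[:-1]


def validate_container(data):
    """Walk a whole container, raising ContainerError at the first problem."""
    stream = io.BytesIO(data)
    if stream.readline() != FORMAT_LINE:
        raise ContainerError("unrecognised container format")
    seen = set()
    while True:
        kind = stream.read(1)
        if kind == b"":
            raise UnexpectedEnd()  # ran out before the end marker
        if kind == b"E":
            break
        if kind != b"B":
            raise ContainerError("unknown record type %r" % (kind,))
        try:
            length = int(_read_line(stream))
        except ValueError:
            raise ContainerError("invalid record length")
        while True:
            name = _read_line(stream)
            if name == b"":
                break  # a blank line ends the list of names
            if name in seen:
                raise DuplicateName(name)
            seen.add(name)
        if len(stream.read(length)) != length:
            raise UnexpectedEnd()
    excess = stream.read()
    if excess:
        raise ExcessData(excess)
```

Names are tracked across the whole container rather than per record, which
matches test_validate_duplicate_name, where the two occurrences of "name" sit
in separate records.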



