Rev 3895: (jam) Add ContentFactory.get_bytes_as('chunked') and osutils.chunks_to_lines() in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Thu Dec 11 20:23:06 GMT 2008


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3895
revision-id: pqm at pqm.ubuntu.com-20081211202300-6dz1vo3phfsc23pj
parent: pqm at pqm.ubuntu.com-20081211174647-l45s6xsw669ovgsa
parent: john at arbash-meinel.com-20081211193706-7qz4e5f9a8c5w4b1
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Thu 2008-12-11 20:23:00 +0000
message:
  (jam) Add ContentFactory.get_bytes_as('chunked') and
  	osutils.chunks_to_lines()
added:
  bzrlib/_chunks_to_lines_py.py  _chunks_to_lines_py.-20081211024848-6uc3mtuje8j14l60-1
  bzrlib/_chunks_to_lines_pyx.pyx _chunks_to_lines_pyx-20081211021736-op7n8vrxgrd8snfi-1
  bzrlib/tests/test__chunks_to_lines.py test__chunks_to_line-20081211024848-6uc3mtuje8j14l60-2
modified:
  .bzrignore                     bzrignore-20050311232317-81f7b71efa2db11a
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
  bzrlib/merge.py                merge.py-20050513021216-953b65a438527106
  bzrlib/osutils.py              osutils.py-20050309040759-eeaff12fbf77ac86
  bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
  bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
  bzrlib/tests/test_osutils.py   test_osutils.py-20051201224856-e48ee24c12182989
  bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
  bzrlib/transform.py            transform.py-20060105172343-dd99e54394d91687
  bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
  bzrlib/weave.py                knit.py-20050627021749-759c29984154256b
  setup.py                       setup.py-20050314065409-02f8a0a6e3f9bc70
    ------------------------------------------------------------
    revno: 3890.2.18
    revision-id: john at arbash-meinel.com-20081211193706-7qz4e5f9a8c5w4b1
    parent: john at arbash-meinel.com-20081211193101-q0utq7jeh79vpmgr
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Thu 2008-12-11 13:37:06 -0600
    message:
      Implement osutils.split_lines() in terms of chunks_to_lines if possible.
      
      chunks_to_lines([fulltext]) is about 2x faster than the original split_lines implementation.
    modified:
      bzrlib/_chunks_to_lines_py.py  _chunks_to_lines_py.-20081211024848-6uc3mtuje8j14l60-1
      bzrlib/osutils.py              osutils.py-20050309040759-eeaff12fbf77ac86
    ------------------------------------------------------------
    revno: 3890.2.17
    revision-id: john at arbash-meinel.com-20081211193101-q0utq7jeh79vpmgr
    parent: john at arbash-meinel.com-20081211182616-l9m9rjnea3bebaor
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Thu 2008-12-11 13:31:01 -0600
    message:
      Add a few more corner cases, some suggested by Robert.
    modified:
      bzrlib/tests/test__chunks_to_lines.py test__chunks_to_line-20081211024848-6uc3mtuje8j14l60-2
    ------------------------------------------------------------
    revno: 3890.2.16
    revision-id: john at arbash-meinel.com-20081211182616-l9m9rjnea3bebaor
    parent: john at arbash-meinel.com-20081211182023-sr6hi6owbbzozhkn
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Thu 2008-12-11 12:26:16 -0600
    message:
      If we split into 2 loops, we get 440us when the input is already lines,
      and the same time when it is not.
      The only downside is that it requires looping over the same data twice.
    modified:
      bzrlib/_chunks_to_lines_pyx.pyx _chunks_to_lines_pyx-20081211021736-op7n8vrxgrd8snfi-1
    ------------------------------------------------------------
    revno: 3890.2.15
    revision-id: john at arbash-meinel.com-20081211182023-sr6hi6owbbzozhkn
    parent: john at arbash-meinel.com-20081211175903-gtuvyewwr1eehauq
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Thu 2008-12-11 12:20:23 -0600
    message:
      Update to do a single iteration over the chunks.
      
      This costs 600us versus 430us for the case where the object is
      already a list of lines. However it is only 1.2ms rather than 3ms
      when everything is in a single buffer.
      
      The biggest advantage is that 'chunks' *could* be an iterator,
      rather than requiring it to already have all the results.
    modified:
      bzrlib/_chunks_to_lines_pyx.pyx _chunks_to_lines_pyx-20081211021736-op7n8vrxgrd8snfi-1
      bzrlib/tests/test__chunks_to_lines.py test__chunks_to_line-20081211024848-6uc3mtuje8j14l60-2
    ------------------------------------------------------------
    revno: 3890.2.14
    revision-id: john at arbash-meinel.com-20081211175903-gtuvyewwr1eehauq
    parent: john at arbash-meinel.com-20081211175431-s89ujzp4w4l51x34
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Thu 2008-12-11 11:59:03 -0600
    message:
      Restore correctness.
    modified:
      bzrlib/_chunks_to_lines_pyx.pyx _chunks_to_lines_pyx-20081211021736-op7n8vrxgrd8snfi-1
    ------------------------------------------------------------
    revno: 3890.2.13
    revision-id: john at arbash-meinel.com-20081211175431-s89ujzp4w4l51x34
    parent: john at arbash-meinel.com-20081211174407-6sz5ooqz40m30xc2
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Thu 2008-12-11 11:54:31 -0600
    message:
      Add a NEWS entry.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
    ------------------------------------------------------------
    revno: 3890.2.12
    revision-id: john at arbash-meinel.com-20081211174407-6sz5ooqz40m30xc2
    parent: john at arbash-meinel.com-20081211174330-31to8tzq6k4ewii4
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Thu 2008-12-11 11:44:07 -0600
    message:
      Remove the extra comment, it probably isn't useful to most people.
    modified:
      bzrlib/_chunks_to_lines_pyx.pyx _chunks_to_lines_pyx-20081211021736-op7n8vrxgrd8snfi-1
    ------------------------------------------------------------
    revno: 3890.2.11
    revision-id: john at arbash-meinel.com-20081211174330-31to8tzq6k4ewii4
    parent: john at arbash-meinel.com-20081211170336-70oi6rnsgkyh3z2o
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Thu 2008-12-11 11:43:30 -0600
    message:
      A bit more tweaking of the pyrex version. Shave off another 10% by
      using PyString_CheckExact.
    modified:
      bzrlib/_chunks_to_lines_pyx.pyx _chunks_to_lines_pyx-20081211021736-op7n8vrxgrd8snfi-1
      bzrlib/tests/test__chunks_to_lines.py test__chunks_to_line-20081211024848-6uc3mtuje8j14l60-2
    ------------------------------------------------------------
    revno: 3890.2.10
    revision-id: john at arbash-meinel.com-20081211170336-70oi6rnsgkyh3z2o
    parent: john at arbash-meinel.com-20081211031852-cmjpdf2ufno0okui
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Thu 2008-12-11 11:03:36 -0600
    message:
      Change the python implementation to a friendlier implementation.
      
      It is only a little bit slower, because we still avoid function calls.
      Redo the Pyrex version for clarity as well. May need to revisit as it might be
      a little bit slower.
    modified:
      bzrlib/_chunks_to_lines_py.py  _chunks_to_lines_py.-20081211024848-6uc3mtuje8j14l60-1
      bzrlib/_chunks_to_lines_pyx.pyx _chunks_to_lines_pyx-20081211021736-op7n8vrxgrd8snfi-1
    ------------------------------------------------------------
    revno: 3890.2.9
    revision-id: john at arbash-meinel.com-20081211031852-cmjpdf2ufno0okui
    parent: john at arbash-meinel.com-20081211030803-gctunob7zsten3qg
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Wed 2008-12-10 21:18:52 -0600
    message:
      Start using osutils.chunks_to_lines rather than osutils.split_lines.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/merge.py                merge.py-20050513021216-953b65a438527106
      bzrlib/transform.py            transform.py-20060105172343-dd99e54394d91687
      bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
      bzrlib/weave.py                knit.py-20050627021749-759c29984154256b
    ------------------------------------------------------------
    revno: 3890.2.8
    revision-id: john at arbash-meinel.com-20081211030803-gctunob7zsten3qg
    parent: john at arbash-meinel.com-20081211021859-3ds8cwdqiq387t83
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Wed 2008-12-10 21:08:03 -0600
    message:
      Move everything into properly parameterized tests.
      
      Also add tests that we preserve the object when it is already lines.
      
      The compiled form takes 450us on a 7.6k line file (NEWS).
      So for common cases, we should have virtually no overhead.
    added:
      bzrlib/_chunks_to_lines_py.py  _chunks_to_lines_py.-20081211024848-6uc3mtuje8j14l60-1
      bzrlib/tests/test__chunks_to_lines.py test__chunks_to_line-20081211024848-6uc3mtuje8j14l60-2
    modified:
      bzrlib/_chunks_to_lines_pyx.pyx _chunks_to_lines_pyx-20081211021736-op7n8vrxgrd8snfi-1
      bzrlib/osutils.py              osutils.py-20050309040759-eeaff12fbf77ac86
      bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
      bzrlib/tests/test_osutils.py   test_osutils.py-20051201224856-e48ee24c12182989
    ------------------------------------------------------------
    revno: 3890.2.7
    revision-id: john at arbash-meinel.com-20081211021859-3ds8cwdqiq387t83
    parent: john at arbash-meinel.com-20081211020207-rrgdcyqc344zo5q1
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Wed 2008-12-10 20:18:59 -0600
    message:
      A Pyrex extension is about 5x faster than the fastest python code I could write.
      
      Seems worth having after all.
    added:
      bzrlib/_chunks_to_lines_pyx.pyx _chunks_to_lines_pyx-20081211021736-op7n8vrxgrd8snfi-1
    modified:
      .bzrignore                     bzrignore-20050311232317-81f7b71efa2db11a
      bzrlib/osutils.py              osutils.py-20050309040759-eeaff12fbf77ac86
      setup.py                       setup.py-20050314065409-02f8a0a6e3f9bc70
    ------------------------------------------------------------
    revno: 3890.2.6
    revision-id: john at arbash-meinel.com-20081211020207-rrgdcyqc344zo5q1
    parent: john at arbash-meinel.com-20081211011419-vqtdjgpa04woqvm4
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Wed 2008-12-10 20:02:07 -0600
    message:
      Change name to 'chunks_to_lines', and find an optimized form.
      
      It is a little bit ugly, but it is faster than join & split, and means
      we get to leave the strings untouched.
    modified:
      bzrlib/osutils.py              osutils.py-20050309040759-eeaff12fbf77ac86
      bzrlib/tests/test_osutils.py   test_osutils.py-20051201224856-e48ee24c12182989
    ------------------------------------------------------------
    revno: 3890.2.5
    revision-id: john at arbash-meinel.com-20081211011419-vqtdjgpa04woqvm4
    parent: john at arbash-meinel.com-20081211011038-osioaxd7moquxxmy
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Wed 2008-12-10 19:14:19 -0600
    message:
      More tests for edge cases.
    modified:
      bzrlib/tests/test_osutils.py   test_osutils.py-20051201224856-e48ee24c12182989
    ------------------------------------------------------------
    revno: 3890.2.4
    revision-id: john at arbash-meinel.com-20081211011038-osioaxd7moquxxmy
    parent: john at arbash-meinel.com-20081211010104-3tcii2strejk5252
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Wed 2008-12-10 19:10:38 -0600
    message:
      Add a new function that can convert 'chunks' format to a 'lines' format.
    modified:
      bzrlib/osutils.py              osutils.py-20050309040759-eeaff12fbf77ac86
      bzrlib/tests/test_osutils.py   test_osutils.py-20051201224856-e48ee24c12182989
    ------------------------------------------------------------
    revno: 3890.2.3
    revision-id: john at arbash-meinel.com-20081211010104-3tcii2strejk5252
    parent: john at arbash-meinel.com-20081211005616-szoqqeabcyahy39u
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Wed 2008-12-10 19:01:04 -0600
    message:
      Use the 'chunked' interface to keep memory consumption minimal during revision_trees()
    modified:
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
    ------------------------------------------------------------
    revno: 3890.2.2
    revision-id: john at arbash-meinel.com-20081211005616-szoqqeabcyahy39u
    parent: john at arbash-meinel.com-20081211005436-a8bn72zw43b1vd9r
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Wed 2008-12-10 18:56:16 -0600
    message:
      Change the signature to report the storage kind as 'chunked'
    modified:
      bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
      bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
    ------------------------------------------------------------
    revno: 3890.2.1
    revision-id: john at arbash-meinel.com-20081211005436-a8bn72zw43b1vd9r
    parent: pqm at pqm.ubuntu.com-20081210082822-li6ku9s3k63kjrpr
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: get_record_stream_chunked
    timestamp: Wed 2008-12-10 18:54:36 -0600
    message:
      Start working on a ChunkedContentFactory.
      
      This allows get_bytes_as('chunked') for both FulltextContentFactory,
      and for ChunkedContentFactory, as it is a trivial conversion to
      go between the two styles.
      We will also want to special case when converting 'chunked' into
      'lines'. But that is for future work.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
      bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
      bzrlib/weave.py                knit.py-20050627021749-759c29984154256b
=== modified file '.bzrignore'
--- a/.bzrignore	2008-09-23 23:28:27 +0000
+++ b/.bzrignore	2008-12-11 02:18:59 +0000
@@ -39,6 +39,7 @@
 doc/**/*.html
 doc/developers/performance.png
 bzrlib/_btree_serializer_c.c
+bzrlib/_chunks_to_lines_pyx.c
 bzrlib/_dirstate_helpers_c.c
 bzrlib/_knit_load_data_c.c
 bzrlib/_readdir_pyx.c

=== modified file 'NEWS'
--- a/NEWS	2008-12-11 03:07:27 +0000
+++ b/NEWS	2008-12-11 20:23:00 +0000
@@ -84,6 +84,15 @@
       advantage of pycurl is that it checks ssl certificates.)
       (John Arbash Meinel)
 
+    * ``VersionedFiles.get_record_stream()`` can now return objects with a
+      storage_kind of ``chunked``. This is a collection (list/tuple) of
+      strings. You can use ``osutils.chunks_to_lines()`` to turn them into
+      guaranteed 'lines' or you can use ``''.join(chunks)`` to turn it
+      into a fulltext. This allows for some very good memory savings when
+      asking for many texts that share ancestry, as the individual chunks
+      can be shared between versions of the file. (John Arbash Meinel)
+
+
 
 bzr 1.10 2008-12-05
 -------------------
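
The NEWS entry above describes two conversions for a 'chunked' record: joining into a fulltext, or re-splitting into lines. A minimal standalone sketch of both follows. It does not import bzrlib; `split_on_newlines` is a hypothetical stand-in that mirrors the split logic in osutils, and Python 3 `str` is assumed where bzrlib uses 8-bit strings.

```python
def split_on_newlines(text):
    """Split text into lines, keeping the newline characters."""
    parts = text.split('\n')
    lines = [part + '\n' for part in parts[:-1]]
    if parts[-1]:
        # Final piece without a trailing newline is kept as the last line
        lines.append(parts[-1])
    return lines


# Two versions of a file whose chunk lists share the same first string object.
shared = 'unchanged header\n'
version_1 = [shared, 'old body\nold ', 'tail']
version_2 = [shared, 'new body\n']

# 'chunked' -> 'fulltext'
fulltext_1 = ''.join(version_1)
# 'chunked' -> 'lines' (the slow, always-correct route)
lines_1 = split_on_newlines(fulltext_1)
```

Because both versions reference the same `shared` object, only one copy of that string is held in memory, which is the saving the entry refers to.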

=== added file 'bzrlib/_chunks_to_lines_py.py'
--- a/bzrlib/_chunks_to_lines_py.py	1970-01-01 00:00:00 +0000
+++ b/bzrlib/_chunks_to_lines_py.py	2008-12-11 19:37:06 +0000
@@ -0,0 +1,57 @@
+# Copyright (C) 2008 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+"""The python implementation of chunks_to_lines"""
+
+
+def chunks_to_lines(chunks):
+    """Re-split chunks into simple lines.
+
+    Each entry in the result should contain a single newline at the end,
+    except for the last entry, which may not have a final newline. If chunks
+    is already a simple list of lines, we return it directly.
+
+    :param chunks: A list/tuple of strings. If chunks is already a list of
+        lines, then we will return it as-is.
+    :return: A list of strings.
+    """
+    # Optimize for a very common case when chunks are already lines
+    last_no_newline = False
+    for chunk in chunks:
+        if last_no_newline:
+            # Only the last chunk is allowed to not have a trailing newline
+            # Getting here means the last chunk didn't have a newline, and we
+            # have a chunk following it
+            break
+        if not chunk:
+            # Empty strings are never valid lines
+            break
+        elif '\n' in chunk[:-1]:
+            # This chunk has an extra '\n', so we will have to split it
+            break
+        elif chunk[-1] != '\n':
+            # This chunk does not have a trailing newline
+            last_no_newline = True
+    else:
+        # All of the lines (but possibly the last) have a single newline at the
+        # end of the string.
+        # For the last one, we allow it to not have a trailing newline, but it
+        # is not allowed to be an empty string.
+        return chunks
+
+    # These aren't simple lines, just join and split again.
+    from bzrlib import osutils
+    return osutils._split_lines(''.join(chunks))
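
The fast path in the file above can be tried in isolation. The sketch below restates the same checks in a standalone form, assuming Python 3 `str` input; `already_lines` and `chunks_to_lines_sketch` are hypothetical names, not part of bzrlib.

```python
def already_lines(chunks):
    """Return True if every chunk is exactly one line (hypothetical helper)."""
    last_no_newline = False
    for chunk in chunks:
        if last_no_newline:
            # A chunk follows one that had no trailing newline
            return False
        if not chunk:
            # Empty strings are never valid lines
            return False
        if '\n' in chunk[:-1]:
            # Newline in the middle, so the chunk must be split
            return False
        if chunk[-1] != '\n':
            last_no_newline = True
    return True


def chunks_to_lines_sketch(chunks):
    if already_lines(chunks):
        # Preserve the caller's object, avoiding any copy
        return chunks
    parts = ''.join(chunks).split('\n')
    lines = [part + '\n' for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])
    return lines


simple = ['a\n', 'b']
same_object = chunks_to_lines_sketch(simple) is simple
resplit = chunks_to_lines_sketch(['a', 'b\n'])
```

Returning the input object unchanged is what keeps the common case cheap: no new list and no new strings are created.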

=== added file 'bzrlib/_chunks_to_lines_pyx.pyx'
--- a/bzrlib/_chunks_to_lines_pyx.pyx	1970-01-01 00:00:00 +0000
+++ b/bzrlib/_chunks_to_lines_pyx.pyx	2008-12-11 18:26:16 +0000
@@ -0,0 +1,130 @@
+# Copyright (C) 2008 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#
+
+"""Pyrex extensions for converting chunks to lines."""
+
+#python2.4 support
+cdef extern from "python-compat.h":
+    pass
+
+cdef extern from "stdlib.h":
+    ctypedef unsigned size_t
+
+cdef extern from "Python.h":
+    ctypedef int Py_ssize_t # Required for older pyrex versions
+    ctypedef struct PyObject:
+        pass
+    int PyList_Append(object lst, object item) except -1
+
+    int PyString_CheckExact(object p)
+    char *PyString_AS_STRING(object p)
+    Py_ssize_t PyString_GET_SIZE(object p)
+    object PyString_FromStringAndSize(char *c_str, Py_ssize_t len)
+
+cdef extern from "string.h":
+    void *memchr(void *s, int c, size_t n)
+
+
+def chunks_to_lines(chunks):
+    """Re-split chunks into simple lines.
+
+    Each entry in the result should contain a single newline at the end,
+    except for the last entry, which may not have a final newline. If chunks
+    is already a simple list of lines, we return it directly.
+
+    :param chunks: A list/tuple of strings. If chunks is already a list of
+        lines, then we will return it as-is.
+    :return: A list of strings.
+    """
+    cdef char *c_str
+    cdef char *newline
+    cdef char *c_last
+    cdef Py_ssize_t the_len
+    cdef int last_no_newline
+
+    # Check to see if the chunks are already lines
+    last_no_newline = 0
+    for chunk in chunks:
+        if last_no_newline:
+            # We have a chunk which followed a chunk without a newline, so this
+            # is not a simple list of lines.
+            break
+        # Switching from PyString_AsStringAndSize to PyString_CheckExact and
+        # then the macros GET_SIZE and AS_STRING saved us 40us / 470us.
+        # It seems PyString_AsStringAndSize can actually trigger a conversion,
+        # which we don't want anyway.
+        if not PyString_CheckExact(chunk):
+            raise TypeError('chunk is not a string')
+        the_len = PyString_GET_SIZE(chunk)
+        if the_len == 0:
+            # An empty string is never a valid line
+            break
+        c_str = PyString_AS_STRING(chunk)
+        c_last = c_str + the_len - 1
+        newline = <char *>memchr(c_str, c'\n', the_len)
+        if newline != c_last:
+            if newline == NULL:
+                # Missing a newline. Only valid as the last line
+                last_no_newline = 1
+            else:
+                # There is a newline in the middle, we must resplit
+                break
+    else:
+        # Everything was already a list of lines
+        return chunks
+
+    # We know we need to create a new list of lines
+    lines = []
+    tail = None # Any remainder from the previous chunk
+    for chunk in chunks:
+        if tail is not None:
+            chunk = tail + chunk
+            tail = None
+        if not PyString_CheckExact(chunk):
+            raise TypeError('chunk is not a string')
+        the_len = PyString_GET_SIZE(chunk)
+        if the_len == 0:
+            # An empty string is never a valid line, and we don't need to
+            # append anything
+            continue
+        c_str = PyString_AS_STRING(chunk)
+        c_last = c_str + the_len - 1
+        newline = <char *>memchr(c_str, c'\n', the_len)
+        if newline == c_last:
+            # A simple line
+            PyList_Append(lines, chunk)
+        elif newline == NULL:
+            # A chunk without a newline, if this is the last entry, then we
+            # allow it
+            tail = chunk
+        else:
+            # We have a newline in the middle, loop until we've consumed all
+            # lines
+            while newline != NULL:
+                line = PyString_FromStringAndSize(c_str, newline - c_str + 1)
+                PyList_Append(lines, line)
+                c_str = newline + 1
+                if c_str > c_last: # We are done
+                    break
+                the_len = c_last - c_str + 1
+                newline = <char *>memchr(c_str, c'\n', the_len)
+                if newline == NULL:
+                    tail = PyString_FromStringAndSize(c_str, the_len)
+                    break
+    if tail is not None:
+        PyList_Append(lines, tail)
+    return lines
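
The second loop of the Pyrex version re-splits chunks while carrying any unterminated remainder ('tail') into the next chunk. A rough pure-Python rendering of that idea is sketched below, using `str.find` where the extension uses `memchr`. `resplit_chunks` is a hypothetical name, and unlike the extension this sketch always builds new strings rather than reusing a chunk that is already a single line.

```python
def resplit_chunks(chunks):
    """Re-split chunks into lines, carrying partial lines forward."""
    lines = []
    tail = None  # Any remainder from the previous chunk
    for chunk in chunks:
        if tail is not None:
            chunk = tail + chunk
            tail = None
        if not chunk:
            # Nothing to append for an empty chunk
            continue
        start = 0
        while True:
            newline = chunk.find('\n', start)
            if newline == -1:
                # No further newline: keep what is left for the next chunk
                if start < len(chunk):
                    tail = chunk[start:]
                break
            lines.append(chunk[start:newline + 1])
            start = newline + 1
    if tail is not None:
        # A final line without a trailing newline is allowed
        lines.append(tail)
    return lines


result = resplit_chunks(['foo\nbar', 'baz\n', '', 'qux'])
```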

=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py	2008-12-05 15:34:02 +0000
+++ b/bzrlib/knit.py	2008-12-11 03:18:52 +0000
@@ -110,7 +110,7 @@
     adapter_registry,
     ConstantMapper,
     ContentFactory,
-    FulltextContentFactory,
+    ChunkedContentFactory,
     VersionedFile,
     VersionedFiles,
     )
@@ -196,7 +196,8 @@
             [compression_parent], 'unordered', True).next()
         if basis_entry.storage_kind == 'absent':
             raise errors.RevisionNotPresent(compression_parent, self._basis_vf)
-        basis_lines = split_lines(basis_entry.get_bytes_as('fulltext'))
+        basis_chunks = basis_entry.get_bytes_as('chunked')
+        basis_lines = osutils.chunks_to_lines(basis_chunks)
         # Manually apply the delta because we have one annotated content and
         # one plain.
         basis_content = PlainKnitContent(basis_lines, compression_parent)
@@ -229,7 +230,8 @@
             [compression_parent], 'unordered', True).next()
         if basis_entry.storage_kind == 'absent':
             raise errors.RevisionNotPresent(compression_parent, self._basis_vf)
-        basis_lines = split_lines(basis_entry.get_bytes_as('fulltext'))
+        basis_chunks = basis_entry.get_bytes_as('chunked')
+        basis_lines = osutils.chunks_to_lines(basis_chunks)
         basis_content = PlainKnitContent(basis_lines, compression_parent)
         # Manually apply the delta because we have one annotated content and
         # one plain.
@@ -276,11 +278,13 @@
     def get_bytes_as(self, storage_kind):
         if storage_kind == self.storage_kind:
             return self._raw_record
-        if storage_kind == 'fulltext' and self._knit is not None:
-            return self._knit.get_text(self.key[0])
-        else:
-            raise errors.UnavailableRepresentation(self.key, storage_kind,
-                self.storage_kind)
+        if self._knit is not None:
+            if storage_kind == 'chunked':
+                return self._knit.get_lines(self.key[0])
+            elif storage_kind == 'fulltext':
+                return self._knit.get_text(self.key[0])
+        raise errors.UnavailableRepresentation(self.key, storage_kind,
+            self.storage_kind)
 
 
 class KnitContent(object):
@@ -1020,7 +1024,7 @@
                 if record.storage_kind == 'absent':
                     continue
                 missing_keys.remove(record.key)
-                lines = split_lines(record.get_bytes_as('fulltext'))
+                lines = osutils.chunks_to_lines(record.get_bytes_as('chunked'))
                 text_map[record.key] = lines
                 content_map[record.key] = PlainKnitContent(lines, record.key)
                 if record.key in keys:
@@ -1288,9 +1292,8 @@
                 text_map, _ = self._get_content_maps(keys, non_local)
                 for key in keys:
                     lines = text_map.pop(key)
-                    text = ''.join(lines)
-                    yield FulltextContentFactory(key, global_map[key], None,
-                                                 text)
+                    yield ChunkedContentFactory(key, global_map[key], None,
+                                                lines)
         else:
             for source, keys in source_keys:
                 if source is parent_maps[0]:
@@ -1443,6 +1446,9 @@
                         buffered = True
                 if not buffered:
                     self._index.add_records([index_entry])
+            elif record.storage_kind == 'chunked':
+                self.add_lines(record.key, parents,
+                    osutils.chunks_to_lines(record.get_bytes_as('chunked')))
             elif record.storage_kind == 'fulltext':
                 self.add_lines(record.key, parents,
                     split_lines(record.get_bytes_as('fulltext')))
@@ -2952,7 +2958,7 @@
         reannotate = annotate.reannotate
         for record in self._knit.get_record_stream(keys, 'topological', True):
             key = record.key
-            fulltext = split_lines(record.get_bytes_as('fulltext'))
+            fulltext = osutils.chunks_to_lines(record.get_bytes_as('chunked'))
             parents = parent_map[key]
             if parents is not None:
                 parent_lines = [parent_cache[parent] for parent in parent_map[key]]
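
The knit.py changes above rely on a record being able to answer both get_bytes_as('chunked') and get_bytes_as('fulltext'). The versionedfile.py side is not shown in this part of the diff, so the class below is only a guess at the shape of such a factory. `ChunkedRecordSketch` is a hypothetical name, the constructor order follows the ChunkedContentFactory call in knit.py, and ValueError stands in for errors.UnavailableRepresentation.

```python
class ChunkedRecordSketch(object):
    """Hypothetical record that stores its content as a list of strings."""

    def __init__(self, key, parents, sha1, chunks):
        self.key = key
        self.parents = parents
        self.sha1 = sha1
        self.storage_kind = 'chunked'
        self._chunks = chunks

    def get_bytes_as(self, storage_kind):
        if storage_kind == 'chunked':
            # No conversion needed, hand back the stored list
            return self._chunks
        elif storage_kind == 'fulltext':
            # Trivial conversion: concatenate the chunks
            return ''.join(self._chunks)
        raise ValueError('unavailable representation: %s' % (storage_kind,))


chunks = ['line one\n', 'line two\n']
record = ChunkedRecordSketch(('file-id', 'rev-1'), (), None, chunks)
as_chunks = record.get_bytes_as('chunked')
as_fulltext = record.get_bytes_as('fulltext')
```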

=== modified file 'bzrlib/merge.py'
--- a/bzrlib/merge.py	2008-10-10 11:55:03 +0000
+++ b/bzrlib/merge.py	2008-12-11 03:18:52 +0000
@@ -1579,7 +1579,7 @@
 
     def get_lines(self, revisions):
         """Get lines for revisions from the backing VersionedFiles.
-        
+
         :raises RevisionNotPresent: on absent texts.
         """
         keys = [(self._key_prefix + (rev,)) for rev in revisions]
@@ -1587,8 +1587,8 @@
         for record in self.vf.get_record_stream(keys, 'unordered', True):
             if record.storage_kind == 'absent':
                 raise errors.RevisionNotPresent(record.key, self.vf)
-            result[record.key[-1]] = osutils.split_lines(
-                record.get_bytes_as('fulltext'))
+            result[record.key[-1]] = osutils.chunks_to_lines(
+                record.get_bytes_as('chunked'))
         return result
 
     def plan_merge(self):

=== modified file 'bzrlib/osutils.py'
--- a/bzrlib/osutils.py	2008-10-17 03:49:08 +0000
+++ b/bzrlib/osutils.py	2008-12-11 19:37:06 +0000
@@ -812,6 +812,7 @@
             rps.append(f)
     return rps
 
+
 def joinpath(p):
     for f in p:
         if (f == '..') or (f is None) or (f == ''):
@@ -819,8 +820,28 @@
     return pathjoin(*p)
 
 
+try:
+    from bzrlib._chunks_to_lines_pyx import chunks_to_lines
+except ImportError:
+    from bzrlib._chunks_to_lines_py import chunks_to_lines
+
+
 def split_lines(s):
     """Split s into lines, but without removing the newline characters."""
+    # Trivially convert a fulltext into a 'chunked' representation, and let
+    # chunks_to_lines do the heavy lifting.
+    if isinstance(s, str):
+        # chunks_to_lines only supports 8-bit strings
+        return chunks_to_lines([s])
+    else:
+        return _split_lines(s)
+
+
+def _split_lines(s):
+    """Split s into lines, but without removing the newline characters.
+
+    This supports Unicode or plain string objects.
+    """
     lines = s.split('\n')
     result = [line + '\n' for line in lines[:-1]]
     if lines[-1]:
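
The try/except import in the hunk above is the usual way to prefer a compiled module and fall back to pure Python. A self-contained sketch of the pattern follows. The module name `_hypothetical_compiled_chunks` is invented for illustration and is assumed not to exist, so the fallback branch is the one that runs here.

```python
try:
    # Hypothetical compiled module; assumed to be absent in this sketch
    from _hypothetical_compiled_chunks import chunks_to_lines
    implementation = 'compiled'
except ImportError:
    def chunks_to_lines(chunks):
        """Pure-Python fallback: join the chunks and split on newlines."""
        parts = ''.join(chunks).split('\n')
        lines = [part + '\n' for part in parts[:-1]]
        if parts[-1]:
            lines.append(parts[-1])
        return lines
    implementation = 'python'

lines = chunks_to_lines(['a\nb'])
```

Callers use the single name `chunks_to_lines` either way, so the compiled extension stays optional.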

=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py	2008-12-10 04:34:21 +0000
+++ b/bzrlib/repository.py	2008-12-11 01:01:04 +0000
@@ -1680,14 +1680,15 @@
     def _iter_inventory_xmls(self, revision_ids):
         keys = [(revision_id,) for revision_id in revision_ids]
         stream = self.inventories.get_record_stream(keys, 'unordered', True)
-        texts = {}
+        text_chunks = {}
         for record in stream:
             if record.storage_kind != 'absent':
-                texts[record.key] = record.get_bytes_as('fulltext')
+                text_chunks[record.key] = record.get_bytes_as('chunked')
             else:
                 raise errors.NoSuchRevision(self, record.key)
         for key in keys:
-            yield texts[key], key[-1]
+            chunks = text_chunks.pop(key)
+            yield ''.join(chunks), key[-1]
 
     def deserialise_inventory(self, revision_id, xml):
         """Transform the xml into an inventory object. 

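The `_iter_inventory_xmls` change above pops each chunk list out of the dictionary before joining it, so a text can be released once the caller is done with it. A small standalone sketch of that pattern, with hypothetical names and made-up data, assuming Python 3 `str`:

```python
def iter_fulltexts(keys, text_chunks):
    """Yield (fulltext, last key element), dropping chunks as we go."""
    for key in keys:
        # pop() removes the entry, so the dict no longer holds these chunks
        chunks = text_chunks.pop(key)
        yield ''.join(chunks), key[-1]


keys = [('rev-1',), ('rev-2',)]
text_chunks = {
    ('rev-1',): ['<inventory>\n', '</inventory>\n'],
    ('rev-2',): ['<inventory/>\n'],
}
stream = iter_fulltexts(keys, text_chunks)
first = next(stream)
remaining_after_first = len(text_chunks)
rest = list(stream)
```

Only one joined fulltext exists at a time, while the chunk lists themselves may still share strings between revisions.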
=== modified file 'bzrlib/tests/__init__.py'
--- a/bzrlib/tests/__init__.py	2008-12-09 21:35:49 +0000
+++ b/bzrlib/tests/__init__.py	2008-12-11 03:08:03 +0000
@@ -2788,6 +2788,7 @@
                    'bzrlib.tests.test_bzrdir',
                    'bzrlib.tests.test_cache_utf8',
                    'bzrlib.tests.test_chunk_writer',
+                   'bzrlib.tests.test__chunks_to_lines',
                    'bzrlib.tests.test_commands',
                    'bzrlib.tests.test_commit',
                    'bzrlib.tests.test_commit_merge',

=== added file 'bzrlib/tests/test__chunks_to_lines.py'
--- a/bzrlib/tests/test__chunks_to_lines.py	1970-01-01 00:00:00 +0000
+++ b/bzrlib/tests/test__chunks_to_lines.py	2008-12-11 19:31:01 +0000
@@ -0,0 +1,128 @@
+# Copyright (C) 2008 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#
+
+"""Tests for chunks_to_lines."""
+
+from bzrlib import tests
+
+
+def load_tests(standard_tests, module, loader):
+    # parameterize all tests in this module
+    suite = loader.suiteClass()
+    applier = tests.TestScenarioApplier()
+    import bzrlib._chunks_to_lines_py as py_module
+    applier.scenarios = [('python', {'module': py_module})]
+    if CompiledChunksToLinesFeature.available():
+        import bzrlib._chunks_to_lines_pyx as c_module
+        applier.scenarios.append(('C', {'module': c_module}))
+    else:
+        # the compiled module isn't available, so we add a failing test
+        class FailWithoutFeature(tests.TestCase):
+            def test_fail(self):
+                self.requireFeature(CompiledChunksToLinesFeature)
+        suite.addTest(loader.loadTestsFromTestCase(FailWithoutFeature))
+    tests.adapt_tests(standard_tests, applier, suite)
+    return suite
+
+
+class _CompiledChunksToLinesFeature(tests.Feature):
+
+    def _probe(self):
+        try:
+            import bzrlib._chunks_to_lines_pyx
+        except ImportError:
+            return False
+        return True
+
+    def feature_name(self):
+        return 'bzrlib._chunks_to_lines_pyx'
+
+CompiledChunksToLinesFeature = _CompiledChunksToLinesFeature()
+
+
+class TestChunksToLines(tests.TestCase):
+
+    module = None # Filled in by test parameterization
+
+    def assertChunksToLines(self, lines, chunks, alreadly_lines=False):
+        result = self.module.chunks_to_lines(chunks)
+        self.assertEqual(lines, result)
+        if alreadly_lines:
+            self.assertIs(chunks, result)
+
+    def test_fulltext_chunk_to_lines(self):
+        self.assertChunksToLines(['foo\n', 'bar\r\n', 'ba\rz\n'],
+                                 ['foo\nbar\r\nba\rz\n'])
+        self.assertChunksToLines(['foobarbaz\n'], ['foobarbaz\n'],
+                                 alreadly_lines=True)
+        self.assertChunksToLines(['foo\n', 'bar\n', '\n', 'baz\n', '\n', '\n'],
+                                 ['foo\nbar\n\nbaz\n\n\n'])
+        self.assertChunksToLines(['foobarbaz'], ['foobarbaz'],
+                                 alreadly_lines=True)
+        self.assertChunksToLines(['foobarbaz'], ['foo', 'bar', 'baz'])
+
+    def test_newlines(self):
+        self.assertChunksToLines(['\n'], ['\n'], alreadly_lines=True)
+        self.assertChunksToLines(['\n'], ['', '\n', ''])
+        self.assertChunksToLines(['\n'], ['\n', ''])
+        self.assertChunksToLines(['\n'], ['', '\n'])
+        self.assertChunksToLines(['\n', '\n', '\n'], ['\n\n\n'])
+        self.assertChunksToLines(['\n', '\n', '\n'], ['\n', '\n', '\n'],
+                                 alreadly_lines=True)
+
+    def test_lines_to_lines(self):
+        self.assertChunksToLines(['foo\n', 'bar\r\n', 'ba\rz\n'],
+                                 ['foo\n', 'bar\r\n', 'ba\rz\n'],
+                                 alreadly_lines=True)
+
+    def test_no_final_newline(self):
+        self.assertChunksToLines(['foo\n', 'bar\r\n', 'ba\rz'],
+                                 ['foo\nbar\r\nba\rz'])
+        self.assertChunksToLines(['foo\n', 'bar\r\n', 'ba\rz'],
+                                 ['foo\n', 'bar\r\n', 'ba\rz'],
+                                 alreadly_lines=True)
+        self.assertChunksToLines(('foo\n', 'bar\r\n', 'ba\rz'),
+                                 ('foo\n', 'bar\r\n', 'ba\rz'),
+                                 alreadly_lines=True)
+        self.assertChunksToLines([], [], alreadly_lines=True)
+        self.assertChunksToLines(['foobarbaz'], ['foobarbaz'],
+                                 alreadly_lines=True)
+        self.assertChunksToLines([], [''])
+
+    def test_mixed(self):
+        self.assertChunksToLines(['foo\n', 'bar\r\n', 'ba\rz'],
+                                 ['foo\n', 'bar\r\nba\r', 'z'])
+        self.assertChunksToLines(['foo\n', 'bar\r\n', 'ba\rz'],
+                                 ['foo\nb', 'a', 'r\r\nba\r', 'z'])
+        self.assertChunksToLines(['foo\n', 'bar\r\n', 'ba\rz'],
+                                 ['foo\nbar\r\nba', '\r', 'z'])
+
+        self.assertChunksToLines(['foo\n', 'bar\r\n', 'ba\rz'],
+                                 ['foo\n', '', 'bar\r\nba', '\r', 'z'])
+        self.assertChunksToLines(['foo\n', 'bar\r\n', 'ba\rz\n'],
+                                 ['foo\n', 'bar\r\n', 'ba\rz\n', ''])
+        self.assertChunksToLines(['foo\n', 'bar\r\n', 'ba\rz\n'],
+                                 ['foo\n', 'bar', '\r\n', 'ba\rz\n'])
+
+    def test_not_lines(self):
+        # We should raise a TypeError, not crash
+        self.assertRaises(TypeError, self.module.chunks_to_lines,
+                          object())
+        self.assertRaises(TypeError, self.module.chunks_to_lines,
+                          [object()])
+        self.assertRaises(TypeError, self.module.chunks_to_lines,
+                          ['foo', object()])
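
The tests above pin down the contract of chunks_to_lines() fairly tightly: input that is already a list of lines comes back as the very same object, anything else is joined and re-split on '\n' only (a bare '\r' does not end a line), and empty chunks are dropped. The following is a minimal sketch of that contract, not the contents of bzrlib/_chunks_to_lines_py.py, which are not shown in this part of the diff. It is written for text strings on a current Python, whereas the code in this revision works on 8-bit str; the helper name _split_on_newline is hypothetical.

```python
def _split_on_newline(text):
    """Split text after every '\\n', keeping the newline on each line."""
    pieces = text.split('\n')
    result = [piece + '\n' for piece in pieces[:-1]]
    if pieces[-1]:
        # Trailing text without a final newline is still a line.
        result.append(pieces[-1])
    return result


def chunks_to_lines(chunks):
    """Return chunks unchanged if they are already lines, else re-split them.

    Illustrative sketch only; assumes every chunk is a str.
    """
    last_no_newline = False
    for chunk in chunks:
        if not isinstance(chunk, str):
            raise TypeError('chunk is not a string: %r' % (chunk,))
        if last_no_newline:
            # An earlier chunk lacked a newline but more data follows it.
            break
        if not chunk:
            # An empty chunk is never a valid line.
            break
        pos = chunk.find('\n')
        if pos == -1:
            # Acceptable only if this turns out to be the final chunk.
            last_no_newline = True
        elif pos != len(chunk) - 1:
            # A newline before the end: this chunk holds several lines.
            break
    else:
        # No break: every chunk is exactly one line, so skip the copy.
        return chunks
    return _split_on_newline(''.join(chunks))
```

Returning the input object when nothing needs re-splitting is what the assertIs() checks in the tests rely on; it lets callers hand over a list of lines without paying for a join and split.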

=== modified file 'bzrlib/tests/test_osutils.py'
--- a/bzrlib/tests/test_osutils.py	2008-10-01 07:56:03 +0000
+++ b/bzrlib/tests/test_osutils.py	2008-12-11 03:08:03 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2005, 2006, 2007 Canonical Ltd
+# Copyright (C) 2005, 2006, 2007, 2008 Canonical Ltd
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -756,6 +756,23 @@
         self.assertEndsWith(osutils._mac_getcwd(), u'B\xe5gfors')
 
 
+class TestChunksToLines(TestCase):
+
+    def test_smoketest(self):
+        self.assertEqual(['foo\n', 'bar\n', 'baz\n'],
+                         osutils.chunks_to_lines(['foo\nbar', '\nbaz\n']))
+        self.assertEqual(['foo\n', 'bar\n', 'baz\n'],
+                         osutils.chunks_to_lines(['foo\n', 'bar\n', 'baz\n']))
+
+    def test_is_compiled(self):
+        from bzrlib.tests.test__chunks_to_lines import CompiledChunksToLinesFeature
+        if CompiledChunksToLinesFeature.available():
+            from bzrlib._chunks_to_lines_pyx import chunks_to_lines
+        else:
+            from bzrlib._chunks_to_lines_py import chunks_to_lines
+        self.assertIs(chunks_to_lines, osutils.chunks_to_lines)
+
+
 class TestSplitLines(TestCase):
 
     def test_split_unicode(self):

=== modified file 'bzrlib/tests/test_versionedfile.py'
--- a/bzrlib/tests/test_versionedfile.py	2008-12-03 21:05:01 +0000
+++ b/bzrlib/tests/test_versionedfile.py	2008-12-11 00:56:16 +0000
@@ -1558,8 +1558,9 @@
         """Assert that storage_kind is a valid storage_kind."""
         self.assertSubset([storage_kind],
             ['mpdiff', 'knit-annotated-ft', 'knit-annotated-delta',
-             'knit-ft', 'knit-delta', 'fulltext', 'knit-annotated-ft-gz',
-             'knit-annotated-delta-gz', 'knit-ft-gz', 'knit-delta-gz'])
+             'knit-ft', 'knit-delta', 'chunked', 'fulltext',
+             'knit-annotated-ft-gz', 'knit-annotated-delta-gz', 'knit-ft-gz',
+             'knit-delta-gz'])
 
     def capture_stream(self, f, entries, on_seen, parents):
         """Capture a stream for testing."""
@@ -1636,9 +1637,11 @@
                 [None, files.get_sha1s([factory.key])[factory.key]])
             self.assertEqual(parent_map[factory.key], factory.parents)
             # self.assertEqual(files.get_text(factory.key),
-            self.assertIsInstance(factory.get_bytes_as('fulltext'), str)
-            self.assertIsInstance(factory.get_bytes_as(factory.storage_kind),
-                str)
+            ft_bytes = factory.get_bytes_as('fulltext')
+            self.assertIsInstance(ft_bytes, str)
+            chunked_bytes = factory.get_bytes_as('chunked')
+            self.assertEqualDiff(ft_bytes, ''.join(chunked_bytes))
+
         self.assertStreamOrder(sort_order, seen, keys)
 
     def assertStreamOrder(self, sort_order, seen, keys):
@@ -2210,8 +2213,9 @@
         self._lines["A"] = ["FOO", "BAR"]
         it = self.texts.get_record_stream([("A",)], "unordered", True)
         record = it.next()
-        self.assertEquals("fulltext", record.storage_kind)
+        self.assertEquals("chunked", record.storage_kind)
         self.assertEquals("FOOBAR", record.get_bytes_as("fulltext"))
+        self.assertEquals(["FOO", "BAR"], record.get_bytes_as("chunked"))
 
     def test_get_record_stream_absent(self):
         it = self.texts.get_record_stream([("A",)], "unordered", True)

=== modified file 'bzrlib/transform.py'
--- a/bzrlib/transform.py	2008-10-28 10:31:32 +0000
+++ b/bzrlib/transform.py	2008-12-11 03:18:52 +0000
@@ -1177,7 +1177,7 @@
             if kind == 'file':
                 cur_file = open(self._limbo_name(trans_id), 'rb')
                 try:
-                    lines = osutils.split_lines(cur_file.read())
+                    lines = osutils.chunks_to_lines(cur_file.readlines())
                 finally:
                     cur_file.close()
                 parents = self._get_parents_lines(trans_id)

=== modified file 'bzrlib/versionedfile.py'
--- a/bzrlib/versionedfile.py	2008-12-03 21:05:01 +0000
+++ b/bzrlib/versionedfile.py	2008-12-11 03:18:52 +0000
@@ -59,6 +59,8 @@
     'bzrlib.knit', 'FTAnnotatedToUnannotated')
 adapter_registry.register_lazy(('knit-annotated-ft-gz', 'fulltext'),
     'bzrlib.knit', 'FTAnnotatedToFullText')
+# adapter_registry.register_lazy(('knit-annotated-ft-gz', 'chunked'),
+#     'bzrlib.knit', 'FTAnnotatedToChunked')
 
 
 class ContentFactory(object):
@@ -84,12 +86,46 @@
         self.parents = None
 
 
+class ChunkedContentFactory(ContentFactory):
+    """Static data content factory.
+
+    This takes a 'chunked' list of strings. The only requirement on 'chunked' is
+    that ''.join(chunks) becomes a valid fulltext. A tuple of a single string
+    satisfies this, as does a list of lines.
+
+    :ivar sha1: None, or the sha1 of the content fulltext.
+    :ivar storage_kind: The native storage kind of this factory. Always
+        'chunked'
+    :ivar key: The key of this content. Each key is a tuple with a single
+        string in it.
+    :ivar parents: A tuple of parent keys for self.key. If the object has
+        no parent information, None (as opposed to () for an empty list of
+        parents).
+    """
+
+    def __init__(self, key, parents, sha1, chunks):
+        """Create a ContentFactory."""
+        self.sha1 = sha1
+        self.storage_kind = 'chunked'
+        self.key = key
+        self.parents = parents
+        self._chunks = chunks
+
+    def get_bytes_as(self, storage_kind):
+        if storage_kind == 'chunked':
+            return self._chunks
+        elif storage_kind == 'fulltext':
+            return ''.join(self._chunks)
+        raise errors.UnavailableRepresentation(self.key, storage_kind,
+            self.storage_kind)
+
+
 class FulltextContentFactory(ContentFactory):
     """Static data content factory.
 
     This takes a fulltext when created and just returns that during
     get_bytes_as('fulltext').
-    
+
     :ivar sha1: None, or the sha1 of the content fulltext.
     :ivar storage_kind: The native storage kind of this factory. Always
         'fulltext'.
@@ -111,6 +147,8 @@
     def get_bytes_as(self, storage_kind):
         if storage_kind == self.storage_kind:
             return self._text
+        elif storage_kind == 'chunked':
+            return (self._text,)
         raise errors.UnavailableRepresentation(self.key, storage_kind,
             self.storage_kind)
 
@@ -804,12 +842,12 @@
                                   if not mpvf.has_version(p))
         # It seems likely that adding all the present parents as fulltexts can
         # easily exhaust memory.
-        split_lines = osutils.split_lines
+        chunks_to_lines = osutils.chunks_to_lines
         for record in self.get_record_stream(needed_parents, 'unordered',
             True):
             if record.storage_kind == 'absent':
                 continue
-            mpvf.add_version(split_lines(record.get_bytes_as('fulltext')),
+            mpvf.add_version(chunks_to_lines(record.get_bytes_as('chunked')),
                 record.key, [])
         for (key, parent_keys, expected_sha1, mpdiff), lines in\
             zip(records, mpvf.get_line_list(versions)):
@@ -940,9 +978,9 @@
         ghosts = maybe_ghosts - set(self.get_parent_map(maybe_ghosts))
         knit_keys.difference_update(ghosts)
         lines = {}
-        split_lines = osutils.split_lines
+        chunks_to_lines = osutils.chunks_to_lines
         for record in self.get_record_stream(knit_keys, 'topological', True):
-            lines[record.key] = split_lines(record.get_bytes_as('fulltext'))
+            lines[record.key] = chunks_to_lines(record.get_bytes_as('chunked'))
             # line_block_dict = {}
             # for parent, blocks in record.extract_line_blocks():
             #   line_blocks[parent] = blocks
@@ -1251,8 +1289,7 @@
                 lines = self._lines[key]
                 parents = self._parents[key]
                 pending.remove(key)
-                yield FulltextContentFactory(key, parents, None,
-                    ''.join(lines))
+                yield ChunkedContentFactory(key, parents, None, lines)
         for versionedfile in self.fallback_versionedfiles:
             for record in versionedfile.get_record_stream(
                 pending, 'unordered', True):
@@ -1422,9 +1459,9 @@
             if lines is not None:
                 if not isinstance(lines, list):
                     raise AssertionError
-                yield FulltextContentFactory((k,), None, 
+                yield ChunkedContentFactory((k,), None,
                         sha1=osutils.sha_strings(lines),
-                        text=''.join(lines))
+                        chunks=lines)
             else:
                 yield AbsentContentFactory((k,))
 

=== modified file 'bzrlib/weave.py'
--- a/bzrlib/weave.py	2008-10-01 05:40:45 +0000
+++ b/bzrlib/weave.py	2008-12-11 03:18:52 +0000
@@ -79,6 +79,8 @@
 from bzrlib import tsort
 """)
 from bzrlib import (
+    errors,
+    osutils,
     progress,
     )
 from bzrlib.errors import (WeaveError, WeaveFormatError, WeaveParentMismatch,
@@ -88,7 +90,6 @@
         WeaveRevisionAlreadyPresent,
         WeaveRevisionNotPresent,
         )
-import bzrlib.errors as errors
 from bzrlib.osutils import dirname, sha, sha_strings, split_lines
 import bzrlib.patiencediff
 from bzrlib.revision import NULL_REVISION
@@ -122,6 +123,8 @@
     def get_bytes_as(self, storage_kind):
         if storage_kind == 'fulltext':
             return self._weave.get_text(self.key[-1])
+        elif storage_kind == 'chunked':
+            return self._weave.get_lines(self.key[-1])
         else:
             raise UnavailableRepresentation(self.key, storage_kind, 'fulltext')
 
@@ -357,9 +360,10 @@
                 raise RevisionNotPresent([record.key[0]], self)
             # adapt to non-tuple interface
             parents = [parent[0] for parent in record.parents]
-            if record.storage_kind == 'fulltext':
+            if (record.storage_kind == 'fulltext'
+                or record.storage_kind == 'chunked'):
                 self.add_lines(record.key[0], parents,
-                    split_lines(record.get_bytes_as('fulltext')))
+                    osutils.chunks_to_lines(record.get_bytes_as('chunked')))
             else:
                 adapter_key = record.storage_kind, 'fulltext'
                 try:

=== modified file 'setup.py'
--- a/setup.py	2008-10-16 03:58:42 +0000
+++ b/setup.py	2008-12-11 02:18:59 +0000
@@ -258,6 +258,7 @@
 
 
 add_pyrex_extension('bzrlib._btree_serializer_c')
+add_pyrex_extension('bzrlib._chunks_to_lines_pyx')
 add_pyrex_extension('bzrlib._knit_load_data_c')
 if sys.platform == 'win32':
     add_pyrex_extension('bzrlib._dirstate_helpers_c',
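
The get_bytes_as('chunked') contract added in bzrlib/versionedfile.py can be illustrated in isolation. The sketch below is a simplified stand-in, not bzrlib code: the class names ChunkedRecord and FulltextRecord are hypothetical, the local UnavailableRepresentation exception only mimics bzrlib.errors.UnavailableRepresentation, and the sha1 and parents attributes are left out.

```python
class UnavailableRepresentation(Exception):
    """Local stand-in for bzrlib.errors.UnavailableRepresentation."""


class ChunkedRecord(object):
    """Holds a sequence of strings whose concatenation is the fulltext."""

    def __init__(self, key, chunks):
        self.key = key
        self.storage_kind = 'chunked'
        self._chunks = chunks

    def get_bytes_as(self, storage_kind):
        if storage_kind == 'chunked':
            # Native form: hand back the stored chunks without copying.
            return self._chunks
        elif storage_kind == 'fulltext':
            return ''.join(self._chunks)
        raise UnavailableRepresentation(self.key, storage_kind,
                                        self.storage_kind)


class FulltextRecord(object):
    """Holds one string; a 1-tuple of it is a valid 'chunked' form."""

    def __init__(self, key, text):
        self.key = key
        self.storage_kind = 'fulltext'
        self._text = text

    def get_bytes_as(self, storage_kind):
        if storage_kind == 'fulltext':
            return self._text
        elif storage_kind == 'chunked':
            return (self._text,)
        raise UnavailableRepresentation(self.key, storage_kind,
                                        self.storage_kind)


record = ChunkedRecord(('A',), ['FOO', 'BAR'])
fulltext = record.get_bytes_as('fulltext')
chunks = record.get_bytes_as('chunked')
```

The only invariant a consumer can rely on is that ''.join(record.get_bytes_as('chunked')) equals the fulltext; the chunks need not be lines, which is why the callers in this diff pass them through osutils.chunks_to_lines().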



