Rev 3475: (mbp) #234748 fix problems in final newline on Knit add_lines and in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Thu Jun 5 07:35:10 BST 2008


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3475
revision-id:pqm at pqm.ubuntu.com-20080605063459-2lk0v0sayzfqsbqw
parent: pqm at pqm.ubuntu.com-20080605040505-i9kqxg2fps2qjdi0
parent: mbp at sourcefrog.net-20080605055331-6s2tzzkh1m6bya3v
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Thu 2008-06-05 07:34:59 +0100
message:
  (mbp) #234748 fix problems in final newline on Knit add_lines and
  	get_lines
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/check.py                check.py-20050309040759-f3a679400c06bcc1
  bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
  bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
  bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
    ------------------------------------------------------------
    revno: 3468.2.7
    revision-id:mbp at sourcefrog.net-20080605055331-6s2tzzkh1m6bya3v
    parent: mbp at sourcefrog.net-20080604074717-r3quvpdo7d0d1zrh
    parent: pqm at pqm.ubuntu.com-20080605040505-i9kqxg2fps2qjdi0
    committer: Martin Pool <mbp at sourcefrog.net>
    branch nick: 234748-integrated
    timestamp: Thu 2008-06-05 15:53:31 +1000
    message:
      merge news
    added:
      bzrlib/tests/blackbox/test_alias.py test_alias.py-20080425112253-fbt0yz1c1834jriz-1
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/branch.py               branch.py-20050309040759-e4baf4e0d046576e
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
      bzrlib/config.py               config.py-20051011043216-070c74f4e9e338e8
      bzrlib/errors.py               errors.py-20050309040759-20512168c4e14fbd
      bzrlib/remote.py               remote.py-20060720103555-yeeg2x51vn0rbtdp-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/blackbox/__init__.py __init__.py-20051128053524-eba30d8255e08dc3
      bzrlib/tests/blackbox/test_uncommit.py test_uncommit.py-20051027212835-84944b63adae51be
      bzrlib/tests/http_server.py    httpserver.py-20061012142527-m1yxdj1xazsf8d7s-1
      bzrlib/tests/test_config.py    testconfig.py-20051011041908-742d0c15d8d8c8eb
      bzrlib/tests/test_transport_implementations.py test_transport_implementations.py-20051227111451-f97c5c7d5c49fce7
      bzrlib/tests/test_workingtree_4.py test_workingtree_4.p-20070223025758-531n3tznl3zacv2o-1
      bzrlib/tests/workingtree_implementations/test_parents.py test_set_parents.py-20060807231740-yicmnlci1mj8smu1-1
      bzrlib/workingtree.py          workingtree.py-20050511021032-29b6ec0a681e02e3
      bzrlib/workingtree_4.py        workingtree_4.py-20070208044105-5fgpc5j3ljlh5q6c-1
    ------------------------------------------------------------
    revno: 3468.2.6
    revision-id:mbp at sourcefrog.net-20080604074717-r3quvpdo7d0d1zrh
    parent: mbp at sourcefrog.net-20080604074337-th4z2id6fej914zs
    parent: robertc at robertcollins.net-20080530060052-31grufsbxx24vj0j
    committer: Martin Pool <mbp at sourcefrog.net>
    branch nick: 234748-integrated
    timestamp: Wed 2008-06-04 17:47:17 +1000
    message:
      merge fix from robert for Knit.add_lines with matching blocks
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
        ------------------------------------------------------------
        revno: 3460.2.1
        revision-id:robertc at robertcollins.net-20080530060052-31grufsbxx24vj0j
        parent: pqm at pqm.ubuntu.com-20080529210000-bycgfufmrqq63tki
        committer: Robert Collins <robertc at robertcollins.net>
        branch nick: mpdiff
        timestamp: Fri 2008-05-30 16:00:52 +1000
        message:
           * Inserting a bundle which changes the contents of a file with no trailing
             end of line, causing a knit snapshot in a 'knits' repository will no longer
             cause KnitCorrupt. (Robert Collins)
          
           * Knit record serialisation is now stricter on what it will accept, to
             guard against potential internal bugs, or broken input. (Robert Collins)
        modified:
          NEWS                           NEWS-20050323055033-4e00b5db738777ff
          bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
          bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
    ------------------------------------------------------------
    revno: 3468.2.5
    revision-id:mbp at sourcefrog.net-20080604074337-th4z2id6fej914zs
    parent: mbp at sourcefrog.net-20080604061534-irtkvhd1cq9s788m
    committer: Martin Pool <mbp at sourcefrog.net>
    branch nick: 234748-retry
    timestamp: Wed 2008-06-04 17:43:37 +1000
    message:
      Correct comment and remove overbroad except block
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
    ------------------------------------------------------------
    revno: 3468.2.4
    revision-id:mbp at sourcefrog.net-20080604061534-irtkvhd1cq9s788m
    parent: mbp at sourcefrog.net-20080604060838-yvgss0nv30glu8q8
    committer: Martin Pool <mbp at sourcefrog.net>
    branch nick: 234748-retry
    timestamp: Wed 2008-06-04 16:15:34 +1000
    message:
      Test and fix #234748 problems in trailing newline diffs
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
    ------------------------------------------------------------
    revno: 3468.2.3
    revision-id:mbp at sourcefrog.net-20080604060838-yvgss0nv30glu8q8
    parent: mbp at sourcefrog.net-20080604055916-ptaq7yt13iu7i8ft
    committer: Martin Pool <mbp at sourcefrog.net>
    branch nick: 234748-retry
    timestamp: Wed 2008-06-04 16:08:38 +1000
    message:
      doc
    modified:
      bzrlib/check.py                check.py-20050309040759-f3a679400c06bcc1
    ------------------------------------------------------------
    revno: 3468.2.2
    revision-id:mbp at sourcefrog.net-20080604055916-ptaq7yt13iu7i8ft
    parent: mbp at sourcefrog.net-20080604054429-irx74bpa0j2ofub0
    committer: Martin Pool <mbp at sourcefrog.net>
    branch nick: 234748-retry
    timestamp: Wed 2008-06-04 15:59:16 +1000
    message:
      Better message re final newlines from assertEqualDiff
    modified:
      bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
    ------------------------------------------------------------
    revno: 3468.2.1
    revision-id:mbp at sourcefrog.net-20080604054429-irx74bpa0j2ofub0
    parent: pqm at pqm.ubuntu.com-20080603072242-omtkkk586pc5k4d4
    committer: Martin Pool <mbp at sourcefrog.net>
    branch nick: 234748-retry
    timestamp: Wed 2008-06-04 15:44:29 +1000
    message:
      Add _KnitAccess repr and remove dead import
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
=== modified file 'NEWS'
--- a/NEWS	2008-06-05 03:39:46 +0000
+++ b/NEWS	2008-06-05 05:53:31 +0000
@@ -8,6 +8,17 @@
 IN DEVELOPMENT
 --------------
 
+  BUG FIXES:
+
+    * Avoid KnitCorrupt error extracting inventories from some repositories.
+      (The data is not corrupt; an internal check is detecting a problem
+      reading from the repository.)
+      (Martin Pool, Andrew Bennetts, Robert Collins, #234748)
+
+    * Inserting a bundle which changes the contents of a file with no trailing
+      end of line, causing a knit snapshot in a 'knits' repository will no longer
+      cause KnitCorrupt. (Robert Collins)
+
   IMPROVEMENTS:
 
     * Added the 'alias' command to set/unset and display aliases. (Tim Penhey)
@@ -30,6 +41,11 @@
       record the newest. (If you merge a descendent, it will replace its
       ancestor). (John Arbash Meinel, #235407)
 
+  INTERNALS:
+
+    * Knit record serialisation is now stricter on what it will accept, to
+      guard against potential internal bugs, or broken input. (Robert Collins)
+
 
 bzr 1.6beta1 2008-06-02
 -----------------------

=== modified file 'bzrlib/check.py'
--- a/bzrlib/check.py	2007-11-27 23:39:22 +0000
+++ b/bzrlib/check.py	2008-06-04 06:08:38 +0000
@@ -203,6 +203,10 @@
         files_in_revisions = {}
         revisions_of_files = {}
         weave_checker = self.repository._get_versioned_file_checker()
+        # XXX: for pack repositories, this iteration seems a bit redundant
+        # with iteration also done within the versioned file checker.  also
+        # note that the inventory/revision/signature data is not included in
+        # this enumeration -- mbp 20080528
         for i, weave_id in enumerate(weave_ids):
             self.progress.update('checking versionedfile', i, n_weaves)
             w = self.repository.weave_store.get_weave(weave_id,

=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py	2008-05-12 02:40:40 +0000
+++ b/bzrlib/knit.py	2008-06-04 07:47:17 +0000
@@ -60,7 +60,6 @@
 # record content length ?
                   
 
-from copy import copy
 from cStringIO import StringIO
 from itertools import izip, chain
 import operator
@@ -281,7 +280,12 @@
 
 
 class KnitContent(object):
-    """Content of a knit version to which deltas can be applied."""
+    """Content of a knit version to which deltas can be applied.
+    
+    This is always stored in memory as a list of lines with \n at the end,
+    plus a flag saying if the final ending is really there or not, because that 
+    corresponds to the on-disk knit representation.
+    """
 
     def __init__(self):
         self._should_strip_eol = False
@@ -290,12 +294,6 @@
         """Apply delta to this object to become new_version_id."""
         raise NotImplementedError(self.apply_delta)
 
-    def cleanup_eol(self, copy_on_mutate=True):
-        if self._should_strip_eol:
-            if copy_on_mutate:
-                self._lines = self._lines[:]
-            self.strip_last_line_newline()
-
     def line_delta_iter(self, new_lines):
         """Generate line-based delta from this content to new_lines."""
         new_texts = new_lines.text()
@@ -346,7 +344,11 @@
 
     def annotate(self):
         """Return a list of (origin, text) for each content line."""
-        return list(self._lines)
+        lines = self._lines[:]
+        if self._should_strip_eol:
+            origin, last_line = lines[-1]
+            lines[-1] = (origin, last_line.rstrip('\n'))
+        return lines
 
     def apply_delta(self, delta, new_version_id):
         """Apply delta to this object to become new_version_id."""
@@ -356,11 +358,6 @@
             lines[offset+start:offset+end] = delta_lines
             offset = offset + (start - end) + count
 
-    def strip_last_line_newline(self):
-        line = self._lines[-1][1].rstrip('\n')
-        self._lines[-1] = (self._lines[-1][0], line)
-        self._should_strip_eol = False
-
     def text(self):
         try:
             lines = [text for origin, text in self._lines]
@@ -371,7 +368,6 @@
             raise KnitCorrupt(self,
                 "line in annotated knit missing annotation information: %s"
                 % (e,))
-
         if self._should_strip_eol:
             lines[-1] = lines[-1].rstrip('\n')
         return lines
@@ -409,10 +405,6 @@
     def copy(self):
         return PlainKnitContent(self._lines[:], self._version_id)
 
-    def strip_last_line_newline(self):
-        self._lines[-1] = self._lines[-1].rstrip('\n')
-        self._should_strip_eol = False
-
     def text(self):
         lines = self._lines
         if self._should_strip_eol:
@@ -1193,8 +1185,8 @@
                 for i, j, n in seq.get_matching_blocks():
                     if n == 0:
                         continue
-                    # this appears to copy (origin, text) pairs across to the
-                    # new content for any line that matches the last-checked
+                    # this copies (origin, text) pairs across to the new
+                    # content for any line that matches the last-checked
                     # parent.
                     content._lines[j:j+n] = merge_content._lines[i:i+n]
         if delta:
@@ -1339,6 +1331,10 @@
             delta = self._check_should_delta(present_parents)
 
         content = self.factory.make(lines, version_id)
+        if 'no-eol' in options:
+            # Hint to the content object that its text() call should strip the
+            # EOL.
+            content._should_strip_eol = True
         if delta or (self.factory.annotated and len(present_parents) > 0):
             # Merge annotations from parent texts if needed.
             delta_hunks = self._merge_annotations(content, present_parents,
@@ -1473,7 +1469,6 @@
                     if multiple_versions:
                         content_map[component_id] = content
 
-            content.cleanup_eol(copy_on_mutate=multiple_versions)
             final_content[version_id] = content
 
             # digest here is the digest from the last applied component.
@@ -2278,6 +2273,10 @@
         self._need_to_create = _need_to_create
         self._create_parent_dir = _create_parent_dir
 
+    def __repr__(self):
+        return "%s(%r)" % (self.__class__.__name__,
+            self._transport.abspath(self._filename))
+
     def add_raw_records(self, sizes, raw_data):
         """Add raw knit bytes to a storage area.
 
@@ -2697,6 +2696,8 @@
                                      digest)],
             dense_lines or lines,
             ["end %s\n" % version_id]))
+        if lines and lines[-1][-1] != '\n':
+            raise ValueError('corrupt lines value %r' % lines)
         compressed_bytes = bytes_to_gzip(bytes)
         return len(compressed_bytes), compressed_bytes
 

=== modified file 'bzrlib/tests/__init__.py'
--- a/bzrlib/tests/__init__.py	2008-05-12 10:16:36 +0000
+++ b/bzrlib/tests/__init__.py	2008-06-04 05:59:16 +0000
@@ -861,6 +861,10 @@
             return
         if message is None:
             message = "texts not equal:\n"
+        if a == b + '\n':
+            message = 'first string is missing a final newline.\n'
+        if a + '\n' == b:
+            message = 'second string is missing a final newline.\n'
         raise AssertionError(message +
                              self._ndiff_strings(a, b))
         

=== modified file 'bzrlib/tests/test_versionedfile.py'
--- a/bzrlib/tests/test_versionedfile.py	2008-05-12 02:40:40 +0000
+++ b/bzrlib/tests/test_versionedfile.py	2008-06-04 07:47:17 +0000
@@ -530,6 +530,82 @@
         self.assertRaises(errors.ReservedId, vf.get_lines, 'b:')
         self.assertRaises(errors.ReservedId, vf.get_text, 'b:')
 
+    def test_add_unchanged_last_line_noeol_snapshot(self):
+        """Add a text with an unchanged last line with no eol should work."""
+        # Test adding this in a number of chain lengths; because the interface
+        # for VersionedFile does not allow forcing a specific chain length, we
+        # just use a small base to get the first snapshot, then a much longer
+        # first line for the next add (which will make the third add snapshot)
+        # and so on. 20 has been chosen as an aribtrary figure - knits use 200
+        # as a capped delta length, but ideally we would have some way of
+        # tuning the test to the store (e.g. keep going until a snapshot
+        # happens).
+        for length in range(20):
+            version_lines = {}
+            vf = self.get_file('case-%d' % length)
+            prefix = 'step-%d'
+            parents = []
+            for step in range(length):
+                version = prefix % step
+                lines = (['prelude \n'] * step) + ['line']
+                vf.add_lines(version, parents, lines)
+                version_lines[version] = lines
+                parents = [version]
+            vf.add_lines('no-eol', parents, ['line'])
+            vf.get_texts(version_lines.keys())
+            self.assertEqualDiff('line', vf.get_text('no-eol'))
+
+    def test_get_texts_eol_variation(self):
+        # similar to the failure in <http://bugs.launchpad.net/234748>
+        vf = self.get_file()
+        sample_text_nl = ["line\n"]
+        sample_text_no_nl = ["line"]
+        versions = []
+        version_lines = {}
+        parents = []
+        for i in range(4):
+            version = 'v%d' % i
+            if i % 2:
+                lines = sample_text_nl
+            else:
+                lines = sample_text_no_nl
+            # left_matching blocks is an internal api; it operates on the
+            # *internal* representation for a knit, which is with *all* lines
+            # being normalised to end with \n - even the final line in a no_nl
+            # file. Using it here ensures that a broken internal implementation
+            # (which is what this test tests) will generate a correct line
+            # delta (which is to say, an empty delta).
+            vf.add_lines(version, parents, lines,
+                left_matching_blocks=[(0, 0, 1)])
+            parents = [version]
+            versions.append(version)
+            version_lines[version] = lines
+        vf.check()
+        vf.get_texts(versions)
+        vf.get_texts(reversed(versions))
+
+    def test_add_lines_with_matching_blocks_noeol_last_line(self):
+        """Add a text with an unchanged last line with no eol should work."""
+        from bzrlib import multiparent
+        # Hand verified sha1 of the text we're adding.
+        sha1 = '6a1d115ec7b60afb664dc14890b5af5ce3c827a4'
+        # Create a mpdiff which adds a new line before the trailing line, and
+        # reuse the last line unaltered (which can cause annotation reuse).
+        # Test adding this in two situations:
+        # On top of a new insertion
+        vf = self.get_file('fulltext')
+        vf.add_lines('noeol', [], ['line'])
+        vf.add_lines('noeol2', ['noeol'], ['newline\n', 'line'],
+            left_matching_blocks=[(0, 1, 1)])
+        self.assertEqualDiff('newline\nline', vf.get_text('noeol2'))
+        # On top of a delta
+        vf = self.get_file('delta')
+        vf.add_lines('base', [], ['line'])
+        vf.add_lines('noeol', ['base'], ['prelude\n', 'line'])
+        vf.add_lines('noeol2', ['noeol'], ['newline\n', 'line'],
+            left_matching_blocks=[(1, 1, 1)])
+        self.assertEqualDiff('newline\nline', vf.get_text('noeol2'))
+
     def test_make_mpdiffs(self):
         from bzrlib import multiparent
         vf = self.get_file('foo')




More information about the bazaar-commits mailing list