Rev 2758: Merge knits improvements. in http://people.ubuntu.com/~robertc/baz2.0/repository

Robert Collins robertc at robertcollins.net
Mon Sep 10 08:11:17 BST 2007


At http://people.ubuntu.com/~robertc/baz2.0/repository

------------------------------------------------------------
revno: 2758
revision-id: robertc at robertcollins.net-20070910071105-2v9edutj05b4ww20
parent: robertc at robertcollins.net-20070910020404-k9kv52jz4rs58lpo
parent: robertc at robertcollins.net-20070910035304-iwf9nmqoxeshjaki
committer: Robert Collins <robertc at robertcollins.net>
branch nick: repository
timestamp: Mon 2007-09-10 17:11:05 +1000
message:
  Merge knits improvements.
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
  bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
  bzrlib/tests/repository_implementations/test_commit_builder.py test_commit_builder.py-20060606110838-76e3ra5slucqus81-1
  bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
  bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
  bzrlib/weave.py                knit.py-20050627021749-759c29984154256b
    ------------------------------------------------------------
    revno: 2592.1.25.2.7.1.28.1.6.1.3.1.9.2.1.3.74.1.31.3.18.1.9.1.6
    revision-id: robertc at robertcollins.net-20070910035304-iwf9nmqoxeshjaki
    parent: robertc at robertcollins.net-20070910033719-nqsmg72617f0tvyo
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: knits
    timestamp: Mon 2007-09-10 13:53:04 +1000
    message:
      Don't check for existing versions when adding texts with random revision ids.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
      bzrlib/weave.py                knit.py-20050627021749-759c29984154256b
    ------------------------------------------------------------
    revno: 2592.1.25.2.7.1.28.1.6.1.3.1.9.2.1.3.74.1.31.3.18.1.9.1.5
    revision-id: robertc at robertcollins.net-20070910033719-nqsmg72617f0tvyo
    parent: robertc at robertcollins.net-20070910025423-qeqnv361y8yukjzc
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: knits
    timestamp: Mon 2007-09-10 13:37:19 +1000
    message:
      * The ``VersionedFile`` interface no longer protects against misuse when
        lines that are not lines, or are not strings are supplied. This saves
        nearly 30% of the minimum cost to store a version of a file.
        (Robert Collins)
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
      bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
    ------------------------------------------------------------
    revno: 2592.1.25.2.7.1.28.1.6.1.3.1.9.2.1.3.74.1.31.3.18.1.9.1.4
    revision-id: robertc at robertcollins.net-20070910025423-qeqnv361y8yukjzc
    parent: robertc at robertcollins.net-20070910012710-bubnul8jp0mr5lhk
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: knits
    timestamp: Mon 2007-09-10 12:54:23 +1000
    message:
      General cleanup of KnitVersionedFile._add.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
    ------------------------------------------------------------
    revno: 2592.1.25.2.7.1.28.1.6.1.3.1.9.2.1.3.74.1.31.3.18.1.9.1.3
    revision-id: robertc at robertcollins.net-20070910012710-bubnul8jp0mr5lhk
    parent: pqm at pqm.ubuntu.com-20070907145828-hjh5941jv7y8d9z8
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: knits
    timestamp: Mon 2007-09-10 11:27:10 +1000
    message:
      Set random_revid on CommitBuilder when a commit generated its own revision id.
    modified:
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/repository_implementations/test_commit_builder.py test_commit_builder.py-20060606110838-76e3ra5slucqus81-1
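
For readers skimming the log, the effect of the random-id change can be sketched with a toy store. This is an illustration only, not bzrlib code: `ToyVersionedFile`, its `lookups` counter and the local `RevisionAlreadyPresent` class are hypothetical stand-ins, and it is written for Python 3.

```python
class RevisionAlreadyPresent(Exception):
    """Hypothetical stand-in for the bzrlib error of the same name."""


class ToyVersionedFile:
    """Toy in-memory store; not the bzrlib implementation."""

    def __init__(self):
        self._texts = {}
        self.lookups = 0  # counts duplicate-id checks actually performed

    def has_version(self, version_id):
        self.lookups += 1
        return version_id in self._texts

    def add_lines(self, version_id, lines, random_id=False):
        # Same shape as the gate in _check_add: when the caller says the
        # id is random, the existence lookup is skipped entirely.
        if not random_id and self.has_version(version_id):
            raise RevisionAlreadyPresent(version_id)
        self._texts[version_id] = list(lines)


vf = ToyVersionedFile()
vf.add_lines("rev-a", ["a\n"])                  # checked: one lookup
vf.add_lines("rev-b", ["b\n"], random_id=True)  # unchecked: no lookup
```

The trade-off is that a caller passing `random_id=True` with an id that is not actually unique silently overwrites in this toy; in the real backend the consequence depends on the storage format.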
=== modified file 'NEWS'
--- a/NEWS	2007-09-09 23:01:47 +0000
+++ b/NEWS	2007-09-10 07:11:05 +0000
@@ -192,6 +192,11 @@
      allows the avoidance of double-sha1 calculations during commit.
      (Robert Collins)
 
+   * The ``VersionedFile`` interface no longer protects against misuse when
+     lines that are not lines, or are not strings are supplied. This saves
+     nearly 30% of the minimum cost to store a version of a file.
+     (Robert Collins)
+
    * ``Transport.should_cache`` has been removed.  It was not called in the
      previous release.  (Martin Pool)
 

=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py	2007-09-09 23:01:47 +0000
+++ b/bzrlib/knit.py	2007-09-10 07:11:05 +0000
@@ -831,37 +831,31 @@
         self._index.check_versions_present(version_ids)
 
     def _add_lines_with_ghosts(self, version_id, parents, lines, parent_texts,
-        nostore_sha):
+        nostore_sha, random_id):
         """See VersionedFile.add_lines_with_ghosts()."""
-        self._check_add(version_id, lines)
-        return self._add(version_id, lines[:], parents, self.delta,
+        self._check_add(version_id, lines, random_id)
+        return self._add(version_id, lines, parents, self.delta,
             parent_texts, None, nostore_sha)
 
     def _add_lines(self, version_id, parents, lines, parent_texts,
-                   left_matching_blocks, nostore_sha):
+                   left_matching_blocks, nostore_sha, random_id):
         """See VersionedFile.add_lines."""
-        self._check_add(version_id, lines)
+        self._check_add(version_id, lines, random_id)
         self._check_versions_present(parents)
         return self._add(version_id, lines[:], parents, self.delta,
             parent_texts, left_matching_blocks, nostore_sha)
 
-    def _check_add(self, version_id, lines):
+    def _check_add(self, version_id, lines, random_id):
         """check that version_id and lines are safe to add."""
-        assert self.writable, "knit is not opened for write"
-        ### FIXME escape. RBC 20060228
         if contains_whitespace(version_id):
             raise InvalidRevisionId(version_id, self.filename)
         self.check_not_reserved_id(version_id)
-        # Technically this is a case of Look Before You Leap, but:
-        # - for knits this saves wasted space in the error case
-        # - for packs this avoids dead space in the pack
-        # - it also avoids undetected poisoning attacks.
-        # - its 1.5% of total commit time, so ignore it unless it becomes a
-        #   larger percentage.
-        if self.has_version(version_id):
+        # Technically this could be avoided if we are happy to allow duplicate
+        # id insertion when other things than bzr core insert texts, but it
+        # seems useful for folk using the knit api directly to have some safety
+        # blanket that we can disable.
+        if not random_id and self.has_version(version_id):
             raise RevisionAlreadyPresent(version_id, self.filename)
-        self._check_lines_not_unicode(lines)
-        self._check_lines_are_lines(lines)
 
     def _add(self, version_id, lines, parents, delta, parent_texts,
              left_matching_blocks, nostore_sha):
@@ -885,16 +879,16 @@
         # +61     0   1918.1800      5.2640   +bzrlib.knit:359(_merge_annotations)
 
         present_parents = []
-        ghosts = []
         if parent_texts is None:
             parent_texts = {}
         for parent in parents:
-            if not self.has_version(parent):
-                ghosts.append(parent)
-            else:
+            if self.has_version(parent):
                 present_parents.append(parent)
 
-        if delta and not len(present_parents):
+        # can only compress against the left most present parent.
+        if (delta and
+            (len(present_parents) == 0 or
+             present_parents[0] != parents[0])):
             delta = False
 
         digest = sha_strings(lines)
@@ -904,10 +898,12 @@
         options = []
         if lines:
             if lines[-1][-1] != '\n':
+                # copy the contents of lines.
+                lines = lines[:]
                 options.append('no-eol')
                 lines[-1] = lines[-1] + '\n'
 
-        if len(present_parents) and delta:
+        if delta:
             # To speed the extract of texts the delta chain is limited
             # to a fixed number of deltas.  This should minimize both
             # I/O and the time spend applying deltas.
@@ -916,7 +912,7 @@
         assert isinstance(version_id, str)
         content = self.factory.make(lines, version_id)
         if delta or (self.factory.annotated and len(present_parents) > 0):
-            # Merge annotations from parent texts if so is needed.
+            # Merge annotations from parent texts if needed.
             delta_hunks = self._merge_annotations(content, present_parents,
                 parent_texts, delta, self.factory.annotated,
                 left_matching_blocks)
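
The new delta condition in `_add` above can be read as a small predicate. The sketch below is an assumption-laden restatement, not the knit code: `can_delta` and the `present` set are hypothetical (the set stands in for calls to `has_version`), and the later delta-chain length limit is ignored.

```python
def can_delta(delta, parents, present):
    """Return True if a delta against the left-most parent is possible.

    `present` is the set of version ids already stored; ids in `parents`
    that are missing from it are ghosts.
    """
    if not delta:
        return False
    present_parents = [p for p in parents if p in present]
    # Only the left-most parent is a valid compression basis, so it must
    # itself be present rather than a ghost.
    return bool(present_parents) and present_parents[0] == parents[0]
```

For example, with parents `["ghost", "p2"]` where only `"p2"` is stored, the first present parent is not the left-most parent, so the text would be stored as a fulltext.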

=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py	2007-09-09 23:01:47 +0000
+++ b/bzrlib/repository.py	2007-09-10 07:11:05 +0000
@@ -2373,6 +2373,9 @@
         """
         if self._new_revision_id is None:
             self._new_revision_id = self._gen_revision_id()
+            self.random_revid = True
+        else:
+            self.random_revid = False
 
     def _check_root(self, ie, parent_invs, tree):
         """Helper for record_entry_contents.
@@ -2545,7 +2548,7 @@
         try:
             return versionedfile.add_lines_with_ghosts(
                 self._new_revision_id, parents, new_lines,
-                nostore_sha=nostore_sha)[0:2]
+                nostore_sha=nostore_sha, random_id=self.random_revid)[0:2]
         finally:
             versionedfile.clear_cache()
 

=== modified file 'bzrlib/tests/repository_implementations/test_commit_builder.py'
--- a/bzrlib/tests/repository_implementations/test_commit_builder.py	2007-09-06 01:09:06 +0000
+++ b/bzrlib/tests/repository_implementations/test_commit_builder.py	2007-09-10 07:11:05 +0000
@@ -34,6 +34,7 @@
         builder = branch.repository.get_commit_builder(
             branch, [], branch.get_config())
         self.assertIsInstance(builder, CommitBuilder)
+        self.assertTrue(builder.random_revid)
         branch.repository.commit_write_group()
         branch.repository.unlock()
 
@@ -102,6 +103,7 @@
             except CannotSetRevisionId:
                 # This format doesn't support supplied revision ids
                 return
+            self.assertFalse(builder.random_revid)
             self.record_root(builder, tree)
             builder.finish_inventory()
             self.assertEqual(revision_id, builder.commit('foo bar'))

=== modified file 'bzrlib/tests/test_versionedfile.py'
--- a/bzrlib/tests/test_versionedfile.py	2007-09-09 23:01:47 +0000
+++ b/bzrlib/tests/test_versionedfile.py	2007-09-10 07:11:05 +0000
@@ -115,16 +115,6 @@
         f = self.reopen_file()
         verify_file(f)
 
-    def test_add_unicode_content(self):
-        # unicode content is not permitted in versioned files. 
-        # versioned files version sequences of bytes only.
-        vf = self.get_file()
-        self.assertRaises(errors.BzrBadParameterUnicode,
-            vf.add_lines, 'a', [], ['a\n', u'b\n', 'c\n'])
-        self.assertRaises(
-            (errors.BzrBadParameterUnicode, NotImplementedError),
-            vf.add_lines_with_ghosts, 'a', [], ['a\n', u'b\n', 'c\n'])
-
     def test_add_follows_left_matching_blocks(self):
         """If we change left_matching_blocks, delta changes
 
@@ -142,21 +132,6 @@
                      left_matching_blocks=[(0, 2, 1), (1, 3, 0)])
         self.assertEqual(['a\n', 'a\n', 'a\n'], vf.get_lines('3'))
 
-    def test_inline_newline_throws(self):
-        # \r characters are not permitted in lines being added
-        vf = self.get_file()
-        self.assertRaises(errors.BzrBadParameterContainsNewline, 
-            vf.add_lines, 'a', [], ['a\n\n'])
-        self.assertRaises(
-            (errors.BzrBadParameterContainsNewline, NotImplementedError),
-            vf.add_lines_with_ghosts, 'a', [], ['a\n\n'])
-        # but inline CR's are allowed
-        vf.add_lines('a', [], ['a\r\n'])
-        try:
-            vf.add_lines_with_ghosts('b', [], ['a\r\n'])
-        except NotImplementedError:
-            pass
-
     def test_add_reserved(self):
         vf = self.get_file()
         self.assertRaises(errors.ReservedId,

=== modified file 'bzrlib/versionedfile.py'
--- a/bzrlib/versionedfile.py	2007-09-09 22:28:46 +0000
+++ b/bzrlib/versionedfile.py	2007-09-10 07:11:05 +0000
@@ -78,7 +78,7 @@
         raise NotImplementedError(self.has_version)
 
     def add_lines(self, version_id, parents, lines, parent_texts=None,
-                  left_matching_blocks=None, nostore_sha=None):
+        left_matching_blocks=None, nostore_sha=None, random_id=False):
         """Add a single text on top of the versioned file.
 
         Must raise RevisionAlreadyPresent if the new version is
@@ -86,16 +86,30 @@
 
         Must raise RevisionNotPresent if any of the given parents are
         not present in file history.
+
+        :param lines: A list of lines. Each line must be a bytestring. And all
+            of them except the last must be terminated with \n and contain no
+            other \n's. The last line may either contain no \n's or a single
+            terminating \n. If the lines list does not meet this constraint the
+            add routine may error or may succeed - but you will be unable to
+            read the data back accurately. (Checking the lines have been split
+            correctly is expensive and extremely unlikely to catch bugs so it
+            is not done at runtime.)
         :param parent_texts: An optional dictionary containing the opaque 
-             representations of some or all of the parents of 
-             version_id to allow delta optimisations. 
-             VERY IMPORTANT: the texts must be those returned
-             by add_lines or data corruption can be caused.
+            representations of some or all of the parents of version_id to
+            allow delta optimisations.  VERY IMPORTANT: the texts must be those
+            returned by add_lines or data corruption can be caused.
         :param left_matching_blocks: a hint about which areas are common
             between the text and its left-hand-parent.  The format is
             the SequenceMatcher.get_matching_blocks format.
         :param nostore_sha: Raise ExistingContent and do not add the lines to
             the versioned file if the digest of the lines matches this.
+        :param random_id: If True a random id has been selected rather than
+            an id determined by some deterministic process such as a converter
+            from a foreign VCS. When True the backend may choose not to check
+            for uniqueness of the resulting key within the versioned file, so
+            this should only be done when the result is expected to be unique
+            anyway.
         :return: The text sha1, the number of bytes in the text, and an opaque
                  representation of the inserted version which can be provided
                  back to future add_lines calls in the parent_texts dictionary.
@@ -104,15 +118,15 @@
         parents = [osutils.safe_revision_id(v) for v in parents]
         self._check_write_ok()
         return self._add_lines(version_id, parents, lines, parent_texts,
-            left_matching_blocks, nostore_sha)
+            left_matching_blocks, nostore_sha, random_id)
 
     def _add_lines(self, version_id, parents, lines, parent_texts,
-        left_matching_blocks, nostore_sha):
+        left_matching_blocks, nostore_sha, random_id):
         """Helper to do the class specific add_lines."""
         raise NotImplementedError(self.add_lines)
 
     def add_lines_with_ghosts(self, version_id, parents, lines,
-                              parent_texts=None, nostore_sha=None):
+        parent_texts=None, nostore_sha=None, random_id=False):
         """Add lines to the versioned file, allowing ghosts to be present.
         
         This takes the same parameters as add_lines and returns the same.
@@ -121,10 +135,10 @@
         parents = [osutils.safe_revision_id(v) for v in parents]
         self._check_write_ok()
         return self._add_lines_with_ghosts(version_id, parents, lines,
-            parent_texts, nostore_sha)
+            parent_texts, nostore_sha, random_id)
 
     def _add_lines_with_ghosts(self, version_id, parents, lines, parent_texts,
-        nostore_sha):
+        nostore_sha, random_id):
         """Helper to do class specific add_lines_with_ghosts."""
         raise NotImplementedError(self.add_lines_with_ghosts)
 

=== modified file 'bzrlib/weave.py'
--- a/bzrlib/weave.py	2007-09-05 22:25:01 +0000
+++ b/bzrlib/weave.py	2007-09-10 03:53:04 +0000
@@ -870,7 +870,7 @@
             self._save()
 
     def _add_lines(self, version_id, parents, lines, parent_texts,
-        left_matching_blocks, nostore_sha):
+        left_matching_blocks, nostore_sha, random_id):
         """Add a version and save the weave."""
         self.check_not_reserved_id(version_id)
         result = super(WeaveFile, self)._add_lines(version_id, parents, lines,
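
Since the runtime checks on `lines` are removed by this merge, callers who want the old safety net would have to validate input themselves. The following is a minimal sketch of the constraint as stated in the `add_lines` docstring, not a bzrlib API: `lines_are_well_formed` is a hypothetical helper, and it assumes Python 3, where a bytestring is `bytes`.

```python
def lines_are_well_formed(lines):
    """Check the constraint described in the add_lines docstring."""
    for index, line in enumerate(lines):
        if not isinstance(line, bytes):
            return False  # unicode/text strings are not allowed
        is_last = index == len(lines) - 1
        newlines = line.count(b"\n")
        if newlines == 1 and line.endswith(b"\n"):
            continue  # exactly one newline, at the end
        if is_last and newlines == 0:
            continue  # the final line may lack a trailing newline
        return False
    return True
```

A check like this walks every line of every text, which is the cost the merge removes from the store path; it is probably best kept to tests or debugging code rather than the commit path.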


