[MERGE] knit text adding reductions

Ian Clatworthy ian.clatworthy at internode.on.net
Wed Sep 12 01:13:33 BST 2007


Robert Collins wrote:

> Knits no longer assert that content being inserted is correctly split
> and bytestrings. The overhead of doing that was nearly 1/3 the time it
> takes to gzip the content - a very high overhead, and one that is
> unlikely to find bugs in bzr's core code. I've added docstring changes
> to make it clear that this is not enforced by the API.

bb:tweak

As Aaron has expressed as well, safety vs performance in API design
leaves my torn. I really want both.

I think what you've done here with random_id is excellent: made the
default safe but provided a way for the client code to say "skip the
check - I know what I'm doing".

I'd like to suggest you do the same for the content checks. Add a
parameter called (say) be_paranoid, default it to True and use that
parameter to run the checks if set. In the call to add_lines_with_ghosts
in repository.py, we can set be_paranoid=False and get 99% of the speed
gain.

Some other questions/comments below.

> ------------------------------------------------------------------------
> 
> # Bazaar merge directive format 2 (Bazaar 0.90)
> # revision_id: robertc at robertcollins.net-20070910035304-\
> #   iwf9nmqoxeshjaki
> # target_branch: http://bazaar-vcs.org/bzr/bzr.dev
> # testament_sha1: 4830a4736b68330b212ffba5511d2f144f284ea9
> # timestamp: 2007-09-10 13:53:25 +1000
> # source_branch: http://people.ubuntu.com/~robertc/baz2.0/knits
> # base_revision_id: pqm at pqm.ubuntu.com-20070907145828-hjh5941jv7y8d9z8
> # 
> # Begin patch
> === modified file 'NEWS'
> --- NEWS	2007-09-07 13:33:03 +0000
> +++ NEWS	2007-09-10 03:37:19 +0000
> @@ -186,6 +186,11 @@
>       allows the avoidance of double-sha1 calculations during commit.
>       (Robert Collins)
>  
> +   * The ``VersionedFile`` interface no longer protects against misuse when
> +     lines that are not lines, or are not strings are supplied. This saves
> +     nearly 30% of the minimum cost to store a version of a file.
> +     (Robert Collins)
> +
>     * ``Transport.should_cache`` has been removed.  It was not called in the
>       previous release.  (Martin Pool)

Rather than 'no longer protects', reword as 'checks can be bypassed'
(assuming you agree with the be_paranoid approach).


> @@ -904,10 +898,12 @@
>          options = []
>          if lines:
>              if lines[-1][-1] != '\n':
> +                # copy the contents of lines.
> +                lines = lines[:]
>                  options.append('no-eol')
>                  lines[-1] = lines[-1] + '\n'

Is this fixing a bug? It wasn't obvious to me why you needed to make a
copy of the lines (given the other changes I could see in this patch).

> === modified file 'bzrlib/tests/repository_implementations/test_commit_builder.py'
> --- bzrlib/tests/repository_implementations/test_commit_builder.py	2007-09-04 03:53:07 +0000
> +++ bzrlib/tests/repository_implementations/test_commit_builder.py	2007-09-10 01:27:10 +0000
> @@ -34,6 +34,7 @@
>          builder = branch.repository.get_commit_builder(
>              branch, [], branch.get_config())
>          self.assertIsInstance(builder, CommitBuilder)
> +        self.assertTrue(builder.random_revid)
>          branch.repository.commit_write_group()
>          branch.repository.unlock()
>  
> @@ -101,6 +102,7 @@
>              except CannotSetRevisionId:
>                  # This format doesn't support supplied revision ids
>                  return
> +            self.assertFalse(builder.random_revid)
>              self.record_root(builder, tree)
>              builder.finish_inventory()
>              self.assertEqual(revision_id, builder.commit('foo bar'))

I'm fine with this. I'm curious though why you felt it necessary to make
random_revid a "public" attribute of a CommitBuilder. Being internal and
 not testing its value in unit tests would have been ok IMO. By making
it public, I gather you want each and every CommitBuilder from now on to
expose this attribute?

> === modified file 'bzrlib/tests/test_versionedfile.py'
> --- bzrlib/tests/test_versionedfile.py	2007-09-06 05:58:13 +0000
> +++ bzrlib/tests/test_versionedfile.py	2007-09-10 03:37:19 +0000
> @@ -115,16 +115,6 @@
>          f = self.reopen_file()
>          verify_file(f)
>  
> -    def test_add_unicode_content(self):
> -        # unicode content is not permitted in versioned files. 
> -        # versioned files version sequences of bytes only.
> -        vf = self.get_file()
> -        self.assertRaises(errors.BzrBadParameterUnicode,
> -            vf.add_lines, 'a', [], ['a\n', u'b\n', 'c\n'])
> -        self.assertRaises(
> -            (errors.BzrBadParameterUnicode, NotImplementedError),
> -            vf.add_lines_with_ghosts, 'a', [], ['a\n', u'b\n', 'c\n'])
> -
>      def test_add_follows_left_matching_blocks(self):
>          """If we change left_matching_blocks, delta changes
>  
> @@ -142,21 +132,6 @@
>                       left_matching_blocks=[(0, 2, 1), (1, 3, 0)])
>          self.assertEqual(['a\n', 'a\n', 'a\n'], vf.get_lines('3'))
>  
> -    def test_inline_newline_throws(self):
> -        # \r characters are not permitted in lines being added
> -        vf = self.get_file()
> -        self.assertRaises(errors.BzrBadParameterContainsNewline, 
> -            vf.add_lines, 'a', [], ['a\n\n'])
> -        self.assertRaises(
> -            (errors.BzrBadParameterContainsNewline, NotImplementedError),
> -            vf.add_lines_with_ghosts, 'a', [], ['a\n\n'])
> -        # but inline CR's are allowed
> -        vf.add_lines('a', [], ['a\r\n'])
> -        try:
> -            vf.add_lines_with_ghosts('b', [], ['a\r\n'])
> -        except NotImplementedError:
> -            pass
> -
>      def test_add_reserved(self):
>          vf = self.get_file()
>          self.assertRaises(errors.ReservedId,

I'd like to leave these tests in given they are still meaningful in the
be_paranoid=True case.

Ian C.



More information about the bazaar mailing list