[MERGE] knit text adding reductions

Wed Sep 12 01:28:02 BST 2007

On Wed, 2007-09-12 at 10:13 +1000, Ian Clatworthy wrote:
> Robert Collins wrote:
> 
> > Knits no longer assert that content being inserted is correctly split
> > and bytestrings. The overhead of doing that was nearly 1/3 the time it
> > takes to gzip the content - a very high overhead, and one that is
> > unlikely to find bugs in bzr's core code. I've added docstring changes
> > to make it clear that this is not enforced by the API.
> 
> bb:tweak
> 
> As Aaron has expressed as well, safety vs performance in API design
> leaves my torn. I really want both.
> 
> I think what you've done here with random_id is excellent: made the
> default safe but provided a way for the client code to say "skip the
> check - I know what I'm doing".
> 
> I'd like to suggest you do the same for the content checks. Add a
> parameter called (say) be_paranoid, default it to True and use that
> parameter to run the checks if set. In the call to add_lines_with_ghosts
> in repository.py, we can set be_paranoid=False and get 99% of the speed
> gain.

ok, I'll do that.

> Some other questions/comments below.

You might trim more of the context below... 

> > +   * The ``VersionedFile`` interface no longer protects against misuse when
> > +     lines that are not lines, or are not strings are supplied. This saves
> > +     nearly 30% of the minimum cost to store a version of a file.
> > +     (Robert Collins)
> > +
> >     * ``Transport.should_cache`` has been removed.  It was not called in the
> >       previous release.  (Martin Pool)
> 
> Rather than 'no longer protects', reword as 'checks can be bypassed'
> (assuming you agree with the be_paranoid approach).

Sure.

> 
> > @@ -904,10 +898,12 @@
> >          options = []
> >          if lines:
> >              if lines[-1][-1] != '\n':
> > +                # copy the contents of lines.
> > +                lines = lines[:]
> >                  options.append('no-eol')
> >                  lines[-1] = lines[-1] + '\n'
> 
> Is this fixing a bug? It wasn't obvious to me why you needed to make a
> copy of the lines (given the other changes I could see in this patch).

No, its a matching change for the change from
self._add(..lines[:],...) to
self._add(..lines,...) elsewhere in the diff.

> > @@ -101,6 +102,7 @@
> >              except CannotSetRevisionId:
> >                  # This format doesn't support supplied revision ids
> >                  return
> > +            self.assertFalse(builder.random_revid)
> >              self.record_root(builder, tree)
> >              builder.finish_inventory()
> >              self.assertEqual(revision_id, builder.commit('foo bar'))
> 
> I'm fine with this. I'm curious though why you felt it necessary to make
> random_revid a "public" attribute of a CommitBuilder. Being internal and
>  not testing its value in unit tests would have been ok IMO. By making
> it public, I gather you want each and every CommitBuilder from now on to
> expose this attribute?

It was the most convenient place to assert that the parameter is being
generated correctly, and that does need to be tested. Until we have an
implementation that does not generate this having it as an
implementation test seems fine to me.

> >      def test_add_reserved(self):
> >          vf = self.get_file()
> >          self.assertRaises(errors.ReservedId,
> 
> I'd like to leave these tests in given they are still meaningful in the
> be_paranoid=True case.

Sure, I'll back that out.

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070912/5ef74aa0/attachment.pgp