[MERGE] Treat assert statements in our code as a hard error

Tue Apr 29 08:49:15 BST 2008

Martin Pool wrote:
> Per the previous rfc, here is a long patch that removes all assert statements
> from bzrlib.
> 
> It's not quite mechanical because I tried to make a reasonable judgement about
> what to do with each of them.  Some should always be done with an if/raise, some 
> should be done as unittest assertions, and some are look-before-you-leap or
> unlikely to ever fail.

bb:tweak

This patch looks good to me.  I have a couple of comments on specific parts, but
nothing serious.  I ran all the changed files through pyflakes and didn't find
any typos in the new code (I found a few old ones, though...).

[...]
> The test for not using assert uses the python parser to read all our files.  It
> is a bit slow, taking 12s on my machine to load and search them.  We could use
> a less precise but faster regexp, but I think the cost is reasonable to avoid
> false negatives or positives.

I think that 12s is acceptable for that.

> === modified file 'bzrlib/_dirstate_helpers_py.py'
> --- bzrlib/_dirstate_helpers_py.py	2007-10-09 03:35:33 +0000
> +++ bzrlib/_dirstate_helpers_py.py	2008-04-24 07:22:53 +0000
> @@ -200,7 +200,9 @@
>      fields = text.split('\0')
>      # Remove the last blank entry
>      trailing = fields.pop()
> -    assert trailing == ''
> +    if trailing != '':
> +        raise AssertionError("dirtstate line %r has trailing garbage: %r" 
> +            % (trailing,))

As Matt Nordhoff pointed out, you have a typo here: “dirtstate” :)

> === modified file 'bzrlib/_patiencediff_py.py'
> --- bzrlib/_patiencediff_py.py	2007-09-04 09:10:35 +0000
> +++ bzrlib/_patiencediff_py.py	2008-04-24 07:22:53 +0000
> @@ -197,8 +197,10 @@
>      next_a = -1
>      next_b = -1
>      for a,b,match_len in answer:
> -        assert a >= next_a, 'Non increasing matches for a'
> -        assert b >= next_b, 'Not increasing matches for b'
> +        if a < next_a:
> +            raise ValueError('Non increasing matches for a')
> +        if b < next_b:
> +            raise ValueError('Not increasing matches for b')

Whereas here you kept an existing typo: “Non” vs. “Not”.

If you feel like doing an extra drive-by cleanup, PEP 8 says there should be
more whitespace in “for a,b,match_len”.

> === modified file 'bzrlib/dirstate.py'

This wasn't changed by your patch, but pyflakes shows that some other
AssertionErrors in this file have a typo in them:

bzrlib/dirstate.py:2583: undefined name 'dirblocks'
bzrlib/dirstate.py:2588: undefined name 'dirblocks'

The relevant code is:

        if len(self._dirblocks) > 0:
            if not self._dirblocks[0][0] == '':
                raise AssertionError(
                    "dirblocks don't start with root block:\n" + \
                    pformat(dirblocks))
        if len(self._dirblocks) > 1:
            if not self._dirblocks[1][0] == '':
                raise AssertionError(
                    "dirblocks missing root directory:\n" + \
                    pformat(dirblocks))

It'd be good to do the trivial fix (s/dirblocks/self._dirblocks/) on the
offending lines as part of this patch.

> --- bzrlib/dirstate.py	2008-03-06 18:21:43 +0000
> +++ bzrlib/dirstate.py	2008-04-24 07:22:53 +0000
> @@ -393,8 +393,9 @@
>          # faster than three separate encodes.
>          utf8path = (dirname + '/' + basename).strip('/').encode('utf8')
>          dirname, basename = osutils.split(utf8path)
> -        assert file_id.__class__ == str, \
> -            "must be a utf8 file_id not %s" % (type(file_id))
> +        if file_id.__class__ != str:

“if type(file_id) is not str:” is the usual idiom, and avoids the nasty
double-underscores.  “is not” is faster than “!=” too.

Hmm, maybe this code is using __class__ for speed?  Robert might remember.  If
so, a comment to say so would be good.

[...]
> -            assert path_utf8.__class__ == str, ('path_utf8 is not a str: %s %s'
> -                % (type(path_utf8), path_utf8))
> +            if not isinstance(path_utf8, str):

Again, “type(path_utf8) is not str” is preferable, I think.

> === modified file 'bzrlib/inventory.py'
> --- bzrlib/inventory.py	2008-04-08 03:39:43 +0000
> +++ bzrlib/inventory.py	2008-04-24 07:22:53 +0000
> @@ -112,7 +112,6 @@
>      InventoryFile('2326', 'wibble.c', parent_id='2325', sha1=None, len=None)
>      >>> for path, entry in i.iter_entries():
>      ...     print path
> -    ...     assert i.path2id(path)

This is effectively changing a test... I guess you've already carefully
considered the impact of this?  The obvious alternative would be something like
“if not i.path2id(path): raise AssertionError()”.

> -    @deprecated_method(symbol_versioning.one_zero)
> -    def diff(self, text_diff, from_label, tree, to_label, to_entry, to_tree,
> -             output_to, reverse=False):

Hooray for removing deprecated code! :)

> === modified file 'bzrlib/lazy_import.py'
[...]
> -        assert import_str.startswith('import ')
> +        if not import_str.startswith('import '):
> +            raise ValueError('bad import string %r'
> +                % (import_str,))

It's weird to wrap such a short line.

> === modified file 'bzrlib/merge.py'
[...]
> @@ -1236,7 +1236,6 @@
>          a_cur = ai + l
>          b_cur = bi + l
>          for text_a, text_b in zip(plain_a[ai:a_cur], plain_b[bi:b_cur]):
> -            assert text_a == text_b
>              yield "unchanged", text_a

I feel slightly sad to lose some of these assertions, like this one.  Even
though I don't expect it to ever fail, it helps the readability of the function
a little bit by reminding the reader of an invariant or precondition in the
code, especially algorithmic code.  Not so much that it out-weighs the
drawbacks...

> === modified file 'bzrlib/merge3.py'
[...]
>          for zmatch, zend, amatch, aend, bmatch, bend in self.find_sync_regions():
> -            #print 'match base [%d:%d]' % (zmatch, zend)
> -            
>              matchlen = zend - zmatch
> -            assert matchlen >= 0
> -            assert matchlen == (aend - amatch)
> -            assert matchlen == (bend - bmatch)

Ditto here...

Perhaps for very algorithmic code like these we should just have a comment like:

            # Loop invariants:
            #   - matchlen >= 0
            #   - matchlen == (aend - amatch)
            #   - matchlen == (bend - amatch)

I haven't looked at this particular method outside of this diff, so I'm not sure
how helpful this would actually be.  Just a thought.

> === modified file 'bzrlib/osutils.py'
[...]
>  def sha_file(f):
> -    if getattr(f, 'tell', None) is not None:
> -        assert f.tell() == 0
> +    """Calculate the hexdigest of an open file.
> +
> +    The file cursor should be already at the start.
> +    """

Thanks for adding a docstring!

> === modified file 'bzrlib/remote.py'
[...]
> -            assert type(key) is str
> +            if not isinstance(key, str):
> +                raise ValueError(key)

Hmm, why change to isinstance here?  Having a good reason to use a subclass of
str is a really unlikely...  The extra flexibility strikes me as a YAGNI (and
changing type to isinstance later is cheap if I turn out to be wrong).

> === modified file 'bzrlib/tests/test_patches.py'
[...]
> -import unittest
[...]
>  def test():
>      patchesTestSuite = unittest.makeSuite(PatchesTester,'test')

There's still a few references to the “unittest” global in this module.  Looks
like it's probably dead code, so just delete it I guess.

> === modified file 'bzrlib/status.py'
[...]
> -            assert inner_merges[0] is None
> -            inner_merges.pop(0)
> +            if inner_merges.pop(0) != None:
> +                raise AssertionError()

It's better style to compare None using “is”/“is not” than ==/!=.  The is
operator can't be fooled by overriding a magic method, and it's quicker.

> === modified file 'bzrlib/repofmt/pack_repo.py'

Again, not the fault of your patch, but there's some pyflakes warnings in this
file that are coincidentally also in rarely-hit code paths (like a raising an
exception):

bzrlib/repofmt/pack_repo.py:1099: undefined name 'keys'
bzrlib/repofmt/pack_repo.py:1936: undefined name 'mutter_callsite'
bzrlib/repofmt/pack_repo.py:2175: undefined name 'RepositoryFormat'

-Andrew.