[rfc] avoid assert statements?

Martin Pool mbp at sourcefrog.net
Wed Sep 19 02:19:54 BST 2007


The question of when it's ok to use 'assert' has come up in a few
reviews recently; Robert suggested we should set a standard for it.  I
wouldn't say it's hugely urgent but thought I'd write this down rather
than forget it.

As you probably know, assert is equivalent to

  if __debug__:
    if not CONDITION:
      raise AssertionError(MESSAGE)

where __debug__ is controlled by the Python -O flag, which is normally absent.
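To make that concrete, here is a toy sketch (check_positive is a hypothetical function, not bzrlib code) showing the assert and its explicit expansion side by side:

```python
def check_positive(n):
    assert n > 0, "n must be positive"
    return n

def check_positive_explicit(n):
    # What the assert above expands to; under 'python -O'
    # __debug__ is False and the whole check is skipped.
    if __debug__:
        if not n > 0:
            raise AssertionError("n must be positive")
    return n
```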

At present there are 551 uses of it in bzrlib.  For every case where
assert is used or might be used, we have a choice between using
assert, using an explicit if/raise block, or having no check at all.
Most of the existing uses check parameters, and most of them are
never hit, so in a sense they are wasted effort; but most are also
just 'x is not None' checks and so reasonably cheap.

In many cases the assertion could be done implicitly, for example by
just trying to use an object which might be None, or by adding an
'else raise' branch to an if block (if the exception is needed).
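For instance, a hypothetical sketch of the 'else raise' style, where the if chain covers the valid cases and anything else raises:

```python
def describe_kind(kind):
    # Instead of 'assert kind in (...)' up front, let the branches
    # handle the valid values and raise on anything unexpected.
    if kind == 'file':
        return 'an ordinary file'
    elif kind == 'directory':
        return 'a directory'
    else:
        raise ValueError('unexpected kind %r' % (kind,))
```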

In some C projects asserts are normally off, or off on production
builds, which gives people the idea that you can do arbitrary
expensive code there.  We can't rely on __debug__ being off in Python
so it's not a good idea to do expensive things.
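As an illustration (a hypothetical function, not real bzrlib code), a check like this touches every line on every call, and since bzr is not normally run with -O, that cost is always paid:

```python
def join_texts(lines):
    # Expensive assertion: scans the whole input on every call.
    # __debug__ is normally True, so this never gets compiled away.
    assert all(line.endswith('\n') for line in lines), \
        "all lines must be newline-terminated"
    return ''.join(lines)
```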

Assertions should not have side effects, because they are not always
run; we have had a small number of bugs in the past where this was
not true.  We now run the test suite with -O on pqm to try to catch
this.
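A sketch of the hazard (TokenStream is a toy class, not from bzrlib): under 'python -O' the whole assert statement is skipped, so any work done inside it is skipped too.

```python
class TokenStream:
    # Toy stream, used only to illustrate the side-effect hazard.
    def __init__(self, tokens):
        self.tokens = list(tokens)

    def read_token(self):
        return self.tokens.pop(0)

# Risky: the read happens inside the assert, so under -O the
# 'BEGIN' token is never consumed and later parsing goes wrong.
stream = TokenStream(['BEGIN', 'data'])
assert stream.read_token() == 'BEGIN'

# Safe: do the work unconditionally, then assert on the result.
stream = TokenStream(['BEGIN', 'data'])
token = stream.read_token()
assert token == 'BEGIN'
```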

There are a few different cases where someone might want to use it:

* Checking data coming in from the user or environment.  If the check
is worth doing, it's worth doing always, and worth giving a
reasonable message rather than a traceback.
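A hypothetical sketch of the explicit style for user input:

```python
def set_verbosity(level):
    # 'level' comes from the command line, so check it always and
    # give a readable message rather than an assert traceback that
    # vanishes under -O.
    if level not in ('quiet', 'normal', 'verbose'):
        raise ValueError("unknown verbosity level: %r" % (level,))
    return level
```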

* Checking our own work in 'crucial' situations, where making a
mistake might have bad consequences.  Again, if the check is worth
doing at all, we should probably do it always.

* Checking arguments to internal methods.  These probably normally
fall under the prohibition against 'look before you leap',
particularly when the check is expensive, like checking every line of
a file parameter.  It's true that sometimes the explicit exception
will give a more obvious error than just allowing things to fail, but
it's a shame to always execute the check when it will only help in
the very few cases where someone is debugging.  Many calls are of
this type.  Perhaps as a rule of thumb we can say that if passing a
wrong parameter will cause a different error 'reasonably soon'
(rather than messing up a long-lived object) then the check should be
left out.
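Under that rule of thumb, a check like this can simply be left out (a hypothetical example):

```python
def count_lines(a_file):
    # No 'assert a_file is not None' here: passing None fails
    # immediately on the .readlines() call with a clear
    # AttributeError, so an explicit check would be wasted effort
    # on every correct call.
    return len(a_file.readlines())
```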

* Checks that perform the validation of a test case: we're trying to
provoke some code into failing by passing the wrong data to another
object, which will detect it and raise an assertion error.  Kind of a
special case of the previous one.  Probably better to do the check
only when it's needed by intercepting the call to the other class.
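One hypothetical way to do that interception in the test itself, rather than leaving a permanent assert in the class being called:

```python
class CallRecorder:
    """Test-only wrapper (hypothetical helper): records every call
    made to the wrapped object so the test can check the arguments,
    instead of the callee carrying a permanent assert."""

    def __init__(self, target):
        self._target = target
        self.calls = []

    def __getattr__(self, name):
        real = getattr(self._target, name)
        def wrapper(*args, **kwargs):
            self.calls.append((name, args, kwargs))
            return real(*args, **kwargs)
        return wrapper

# In the test: wrap the collaborator and verify what was passed.
recorder = CallRecorder([])
recorder.append('revision-1')
```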

* Checks where we're not really sure what could go wrong but it looks
bad. :-)  I guess people put these in and then wait to see if anything
hits it.  Not sure what to do in general, but maybe think harder about
how or why this would go wrong.

* Checks added while developing a module as scaffolding.  Maybe they
should be removed/commented out when you're done?

To sum up: if we think it's the input data that's at fault, we should
do the check always and explicitly.  If we're looking for a code bug,
we should eliminate it in the test suite instead.  I think we should
be a bit suspicious of seeing asserts in merge requests.

-- 
Martin
