selftest performance: landing our code faster and developing quicker.

Vincent Ladeuil v.ladeuil+lp at free.fr
Fri Aug 28 10:17:01 BST 2009


>>>>> "martin" == Martin Pool <mbp at canonical.com> writes:

    martin> Thanks for pushing this down; the speed of the tests
    martin> makes a big difference to the degree we use and
    martin> benefit from them.

Amen. 

What should also be on everybody's mind, IMHO, is that the test
suite is one of our strongest assets.

We all enjoy it, and go "wow!" when it catches our mistakes early.

That's because we invest a lot of energy into making it such a
valuable tool.

But this is an all-day job that should be carried out by everybody:
we can't just enjoy it, we must improve it continuously or its
usefulness will degrade quickly.

<snip/>

    martin> We should probably make sure if any compiled
    martin> extensions are missing that gives a clear warning in
    martin> the test run; it's ok to test without them but you
    martin> should know.

+1
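
To make that concrete, here is a minimal sketch of the kind of check
that could run once at the start of a test run and print a single
concise line when compiled extensions are missing. The module names
are illustrative, not the actual list bzrlib builds:

import sys

# Illustrative names only, not the real bzrlib extension list.
COMPILED_EXTENSIONS = [
    'bzrlib._dirstate_helpers_c',
    'bzrlib._knit_load_data_c',
]

def missing_extensions(names=COMPILED_EXTENSIONS):
    """Return the compiled extension modules that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing

def report_missing_extensions(stream=sys.stderr):
    """One short line: blank for most developers, noticeable otherwise."""
    missing = missing_extensions()
    if missing:
        stream.write('missing compiled extensions: %s\n' % ', '.join(missing))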

    martin> I think the test run overall if giving a bit too much
    martin> noise: it should be basically red/green, and then
    martin> stuff about missing dependencies should be so concise
    martin> that it's blank for most developers most of the time
    martin> and therefore noticeable when it's not.

That's very subjective. On the other hand, if I understand where
Robert is going with his latest changes, most of the test output
should become easier to customize, so I won't mind having part of it
off by default as long as some option can be set in a configuration
file (maybe -E can be used here?).

    martin> I guess it comes down to a question of what tests do
    martin> you want to have deferred or removed and when.
    martin> test_build_and_install may catch some problems.

We need more test suites and ways to select them (creating
sub-directories under bzrlib/tests sounds like a cheap way to
achieve that while documenting them via the directory names).
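
For illustration only, a rough sketch of what selecting such a
sub-suite could look like; the directory layout and the use of stock
unittest discovery are assumptions, not how selftest actually loads
tests today:

import unittest

def load_sub_suite(subdir, top='bzrlib/tests'):
    """Load only the tests living under <top>/<subdir>/."""
    loader = unittest.TestLoader()
    return loader.discover('%s/%s' % (top, subdir), pattern='test_*.py')

if __name__ == '__main__':
    # e.g. run only the tests under bzrlib/tests/blackbox/
    unittest.TextTestRunner(verbosity=1).run(load_sub_suite('blackbox'))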

    martin> It would be interesting (maybe impractical) to
    martin> accumulate data on all bzr test runs in the world and
    martin> see which tests actually ever fail.

Oooooh yes....

I'm especially interested in tests that fail against uncommitted
code....
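
As a strawman for the collection side, a rough sketch of a result
class that logs every failing test id so the logs could later be
aggregated across runs (and, with more plumbing, across developers).
The log path and format are made up for the example:

import time
import unittest

class RecordingResult(unittest.TextTestResult):
    """Text result that also logs failing test ids for later aggregation."""

    log_path = 'failed-tests.log'  # hypothetical location

    def _record(self, test, kind):
        with open(self.log_path, 'a') as log:
            log.write('%s %s %s\n' % (time.strftime('%Y-%m-%dT%H:%M:%S'),
                                      kind, test.id()))

    def addFailure(self, test, err):
        unittest.TextTestResult.addFailure(self, test, err)
        self._record(test, 'FAIL')

    def addError(self, test, err):
        unittest.TextTestResult.addError(self, test, err)
        self._record(test, 'ERROR')

# Used as: unittest.TextTestRunner(resultclass=RecordingResult).run(suite)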

<snip/>

    martin> If we could, for example, let people write blackbox
    martin> tests in something that looks like shell doctest, but
    martin> that's actually abstracted to be much faster than
    martin> running from the whole command line down, that would
    martin> be very cool.

+1

But I consider such a tool to be targeted at people who don't have
the time to learn more about our test infrastructure, or as a way to
introduce them to more focused tests.
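
To make the idea concrete, a minimal sketch of such a shell-like
layer: '$ bzr ...' lines are parsed and handed to an in-process
dispatcher instead of spawning a full command line per step. The
run_command hook and the toy dispatcher below are assumptions
standing in for whatever entry point the real test infrastructure
would expose:

import shlex

def parse_script(text):
    """Split a shell-like script into (argv, expected_output) pairs."""
    steps = []
    for line in text.splitlines():
        line = line.rstrip()
        if line.startswith('$ '):
            steps.append((shlex.split(line[2:]), []))
        elif line and steps:
            steps[-1][1].append(line)
    return [(argv, '\n'.join(out)) for argv, out in steps]

def run_script(text, run_command):
    """Run each command in-process and check its expected output."""
    for argv, expected in parse_script(text):
        code, output = run_command(argv)
        assert code == 0, 'command failed: %r' % (argv,)
        if expected:
            assert output.rstrip() == expected, (argv, output, expected)

if __name__ == '__main__':
    # Toy dispatcher standing in for the real in-process bzr entry point.
    def fake_bzr(argv):
        if argv[:2] == ['bzr', 'status']:
            return 0, 'added:\n  hello.txt'
        return 0, ''

    run_script("""
$ bzr init
$ bzr add hello.txt
$ bzr status
added:
  hello.txt
""", fake_bzr)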

<snip/>

    martin> Python gives fairly weak assurance that interfaces
    martin> actually match up, so I think it's relatively more
    martin> important that we do test things integrated together
    martin> rather than in isolation.

Can we stop that war before it even starts, please?

It's not one against the other; both are valuable and needed. If
we can't write both, then, well, too bad, but don't let that be an
excuse for writing fewer tests.

One of the most important properties of a test is: Defect Localization.

Do as you feel, but keep that one in mind. We don't want hundreds
of failures when a bug is introduced; we want either a single
failure telling us "sorry, you broke this assumption, go back to
the drawing board", or several failures telling us that this change
broke these specific cases.

We don't want dozens or hundreds of tests all failing for the
same reason, and we don't want tests failing repeatedly because they
are too eager and need several fixes to pass (I hate those ones).

Defect Localization: the idea is not mine, it comes from "xUnit
Test Patterns", but it is certainly the one that has helped me the
most when deciding how many tests I will write and how I will
organize them into classes.

Funnily enough, after writing the above I tried:
http://www.google.com/search?q=defect+localization

and not only is the first result from xunitpatterns, it's on a
page presenting the goals of test automation :)

,----
| Goal: Defect Localization
| 
| Mistakes happen! Some mistakes are much more expensive to prevent
| than to fix. Suppose a bug does slip through somehow and it shows
| up in the Integration Build[SCM]. If we have made our unit tests
| fairly small by testing only a single behavior in each, we should
| be able to pinpoint the bug pretty quickly based on which test is
| failing. This is one of the big advantages of unit tests over
| customer tests. The customer tests will tell us that some
| behavior expected by the customer isn't working. The unit test
| will tell us why. We call this phenomena Defect Localization. If
| we have a failing customer test with no unit tests failing, that
| is an indication of a Missing Unit Test (see Production Bugs on
| page X).
| 
| All these benefits are wonderful but we cannot achieve them if we
| don't write tests to cover off all possible scenarios each unit
| of software needs to cover. Nor will we get the benefit if the
| tests themselves have bugs in them. Therefore it is crucial to
| keep the tests as simple as possible so that they can be easily
| seen to be correct. Writing unit tests for our unit tests is not
| a practical solution but we can and should write unit tests for
| any Test Utility Method (page X) to which we delegate complex
| algorithms needed by the test methods.
`----
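
As a tiny illustration of what that buys you (the Stack class and
its tests are invented for the example): when each test checks a
single assumption, a regression names itself in the failure list
instead of hiding inside one long scenario.

import unittest

class Stack(object):
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError('pop from empty stack')
        return self._items.pop()

class TestStack(unittest.TestCase):
    # One behaviour per test: a failure points at the broken assumption.

    def test_pop_returns_last_pushed(self):
        s = Stack()
        s.push(1)
        s.push(2)
        self.assertEqual(2, s.pop())

    def test_pop_on_empty_raises(self):
        s = Stack()
        self.assertRaises(IndexError, s.pop)

if __name__ == '__main__':
    unittest.main()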

<snip/>

    martin> Run on a tmpfs - last time I measured it was over a
    martin> third faster.

Confirmed, but as with '--starting-with' and '--parallel=fork',
that's attacking the problem with brute-force solutions :)

They are good solutions, but they will still be valid if we
address the root causes :)

        Vincent



