Wrote a small black-box test suite for bazaar

Thu Aug 28 13:33:18 BST 2008

Hi John,

John Arbash Meinel <john <at> arbash-meinel.com> writes:

> Try instead:
> 
> python -c 'from bzrlib import branch, repository; import time; print time.asctime()'
> 
> The overhead for spawning bzr is generally "import bzrlib" time, because
> python is actually quite slow at re-importing all of the data structures into
> memory.
> 
> "time bzr rocks" which is a mostly "no-op" command is about 100ms on this
> machine. Which is about 10 per second.
> 

I see. That does seem like an amazingly slow import: even "import gtk" is about
twice as fast on my machine, and I thought that was the slowest import around :)

See my reply to Robert for further discussion of this.
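For anyone who wants to compare import costs like this on their own machine,
here is a rough sketch that times starting a fresh interpreter and importing a
given module, which approximates the fixed cost of each spawn. It is my own
quick harness, not anything from bzrlib. It is written to run under either
Python 2 or 3, so run it with whichever interpreter has bzrlib (or gtk)
installed, and treat the numbers as wall-clock averages that include
interpreter start-up.

```python
import subprocess
import sys
import time


def time_fresh_import(module, repeats=5):
    """Return the mean wall-clock seconds to start this interpreter and import `module`.

    Every run starts a brand-new process, so the figure includes interpreter
    start-up as well as the import itself, i.e. the fixed price paid each time
    a Python command-line tool is spawned.
    """
    cmd = [sys.executable, "-c", "import " + module]
    total = 0.0
    for _ in range(repeats):
        start = time.time()
        subprocess.check_call(cmd)  # raises CalledProcessError if the import fails
        total += time.time() - start
    return total / repeats


if __name__ == "__main__":
    # Module names come from the command line, e.g. "bzrlib.branch gtk";
    # "os" is just a cheap baseline when none are given.
    for name in sys.argv[1:] or ["os"]:
        print("%s: %.1f ms" % (name, time_fresh_import(name) * 1000))
```

Dotted names such as bzrlib.branch work too, since the argument is simply
pasted after "import". Comparing against the "os" baseline separates the
import cost from bare interpreter start-up.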

> ...
> > 
> >  <pasted from original mail - did you not see this or not understand it?>
> > 
> > The main points:
> > 1) You don't need to know the code to write tests (I've never looked at the
> > code)
> 
> Well you did at least enough to compare to the blackbox tests.

Well yes, if you're going to be pedantic :) I located the tests, verified that
there were some blackbox ones, and saw that many of them seemed to follow the
same format. That is very different from looking at the code closely enough to
write tests through the current test harness.

> 
> > 2) Tests don't depend on the structure of the code and hence don't need
> > changing when the code is refactored.
> 
> With backwards compatibility, this is actually quite rare.

Do you think it's possible that backwards compatibility with the current tests
is playing a part here? The flip side of (2) is that tests which depend heavily
on the code structure tend to make refactoring well-nigh impossible after a
while, because the cost of changing them all simply becomes too high. I can't
speak for Bazaar as I have no insight into the code, but I've certainly ended up
in this situation a couple of times, and it's made me very skeptical of
large-scale white-box testing.

> 
> > 3) There are already quite a few blackbox tests that look like
> > 
> > def test_update_standalone_trivial(self):
> >         self.make_branch_and_tree('.')
> >         out, err = self.run_bzr('update')
> >         self.assertEqual('Tree is up to date at revision 0.\n', err)
> >         self.assertEqual('', out)
> > 
> > This is basically a way to write tests of that form without writing any
> > code.
> > 
> > Regards,
> > Geoff Bache
> > 
> 
> You are welcome to do this, and if you find it useful and worthwhile, we'll
> probably take a look at it. I would expect it to end up being quite slow in
> the long run.

Yes, I would expect it to be slower than the current setup, though I suspect
more like 10% slower than 100%. (See my reply to Robert for further discussion
of parallelism.)

I don't have any plans to become a Bazaar developer myself, so it's not really a
question of whether I find it useful and worthwhile :) What would you need it to
do that it doesn't do today before you'd be prepared to try it out? I'm happy to
spend an hour writing seven tests that nobody ever runs, but I don't really want
to write 800 tests that nobody ever runs :)

> 
> bzr selftest -s bzrlib.tests.blackbox --list
> 
> shows 844 blackbox tests. If each one of those has a single spawn, you are
> adding at least 8s to the test in just spawn overhead. But as you are also
> planning on creating new branches and working trees, populating them,
> committing the changes, etc. I wouldn't be surprised if there was 10 spawns
> per test. Or around 80s added. Right now running all blackbox tests takes
> about 80s. So your proposal is likely to at least double the time it takes.

Each test would be a single bzr operation on a predefined, isolated setup, so
there would be one spawn per test (unless bzr spawns processes internally, but
then surely your current tests would have this problem too?). I don't believe in
tests that are long sequences of interdependent operations: they're too
difficult to understand and debug (as you hint below).
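To make "one operation on a predefined isolated setup" concrete, here is a
minimal sketch of what running one such case amounts to: copy a pre-built
fixture directory into a scratch area, run a single command there, and compare
stdout and stderr against stored expected text. The function name and the
fixture layout are hypothetical illustrations, not the actual tool's interface.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_blackbox_case(argv, expected_out, expected_err="", fixture_dir=None):
    """Run one command in a scratch copy of a fixture and compare its output.

    fixture_dir (hypothetical) is a directory prepared once in advance, e.g. a
    tree made with "bzr init". It is copied so every case starts from the same
    isolated state and can never affect another case.
    Returns a list of mismatch descriptions; an empty list means the case passed.
    """
    scratch = tempfile.mkdtemp()
    try:
        workdir = os.path.join(scratch, "work")
        if fixture_dir is not None:
            shutil.copytree(fixture_dir, workdir)  # the fixture itself is untouched
        else:
            os.mkdir(workdir)
        proc = subprocess.Popen(argv, cwd=workdir,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True)
        out, err = proc.communicate()
    finally:
        shutil.rmtree(scratch)  # discard the scratch copy whatever happened
    problems = []
    if out != expected_out:
        problems.append("stdout: expected %r, got %r" % (expected_out, out))
    if err != expected_err:
        problems.append("stderr: expected %r, got %r" % (expected_err, err))
    return problems
```

For example, the update test quoted earlier would become roughly
run_blackbox_case(["bzr", "update"], "", "Tree is up to date at revision 0.\n",
fixture_dir="fixtures/empty-tree"), where fixtures/empty-tree is a hypothetical
directory holding a tree created once with bzr init. Because each case works on
its own copy, no case depends on another, which is also what makes running them
in parallel straightforward.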

> In the end, we started off with a test suite which spawned bzr for everything.
> Init/add/commit/etc. And getting rid of that saved a *lot* of time. It also
> simplifies the tests, so that they can focus on one aspect, rather than having
> lots of combined effects. Which is a good sign for unit tests.

I'd say we're discussing system/acceptance tests here rather than unit tests,
and for those the natural granularity is that of the interface, i.e. the
individual commands. I'd also say that having no test code at all simplifies
things a great deal, and you can still focus on one aspect at a time unless
you're explicitly testing combinations.

Regards,
Geoff