test suite debugging

John Arbash Meinel john at arbash-meinel.com
Fri Sep 29 15:01:37 BST 2006


Robert Collins wrote:
> On Thu, 2006-09-28 at 20:24 -0400, Aaron Bentley wrote:
> 
> 
>>> Ideally most tests will test one and only one thing. Support
>>> infrastructure to test that thing should be unobtrusive and fast. For
>>> instance, a test that WorkingTree.commit() uses commitbuilder in the
>>> expected way does not need disk resources - so we can use a MemoryTree. 
>> It comforts me to know that tests are testing more than one thing,
>> because I know how hard it is to cover all cases.  Frequently when I
>> make a mistake, the test case that catches it is one that was never
>> intended to detect that problem.
>>
>> Anyhow, that's why my gut reaction is discomfort.  I wouldn't call it my
>> considered opinion.
>>
>> And I find that tests which test only one thing lead to repeated set-up
>> and tear-down, which also makes the test suite slower.
> 
> There's a couple of mis-patterns that can occur. Tests that need a
> lot of setup and teardown are usually testing something high up the
> dependency chart, which makes the setup and teardown expensive. There
> are a couple of routes to address that: managed resources, such as
> 'testresources' offers, are one route, and stub implementations that
> provide the needed context for the test are another. Globbing the
> tests together leads to tests where the thing you want to test is
> stuck deep in a long series of events, so a bug that affects a couple
> of areas will often prevent the thing you want to test from being
> tested -- the test fails early. This makes it harder to debug, as you
> need to wade through less helpful layers to figure out what's wrong.
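
As I understand it, the managed-resource idea is roughly this (a
sketch only, not the real testresources API; expensive_setup() stands
in for whatever is costly to build):

def expensive_setup():
    # stand-in for the costly part: building a repository, a working
    # tree, etc.
    return object()

class SharedResource(object):
    # built once, handed to every test that declares it needs it,
    # torn down once at the end of the run
    _instance = None

    @classmethod
    def get(cls):
        if cls._instance is None:
            cls._instance = expensive_setup()
        return cls._instance

    @classmethod
    def finished(cls):
        cls._instance = None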

Things get a little weird when you start filtering tests, though. If you
have 1 test that does 3 things:

def test_me(self):
    setup()
    test_one()
    test_two()
    test_three()

Often 'test_one' will cause some sort of modification, which test_two
needs to expect.

Or is testresources designed to make tests themselves dependent on
each other?

def test_one(self):
    # depends on setup()
    pass

def test_two(self):
    # depends on state after test_one()
    pass

In this latter case, you still have the same problem: test_one must
succeed before you can get to test_two.

I'm not saying this is unavoidable. I'm sure one could write tests
that refactor everything down to the small bit of state each test
actually needs.
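
For instance, each test could build just the state it wants through a
small helper (a sketch: make_modified_tree is a made-up name, though
make_branch_and_tree() and build_tree() are the usual test-case
helpers):

def make_modified_tree(self):
    # made-up helper: build the needed state directly, rather than
    # inheriting it from a previous test having run
    tree = self.make_branch_and_tree('.')
    self.build_tree(['a'])
    tree.add(['a'])
    return tree

def test_two(self):
    tree = self.make_modified_tree()
    # ...assert against the tree's state directly; no ordering
    # dependence on test_one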

My issue is that tests should exercise all code paths, right? Not
necessarily all permutations, but all branches. So if the code is:

def foo(x, y, z):
    if x == 1:        # placeholder values; the original conditions
        do_one()      # were left unspecified
    elif y == 2:
        do_two()

    if z == 3:
        do_three()
    else:
        raise ValueError(z)

You should have 4 tests on that function, right? And what happens if
that is a cmd_foo.run() function?

If you don't write the 4 tests, then it is possible you have a bug in
how you handle command line arguments.
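
That is, something like (a sketch, using the placeholder values from
the snippet above; real tests would also assert on the effect of each
branch):

import unittest

class TestFoo(unittest.TestCase):
    def test_x_branch(self):
        foo(1, 0, 3)    # exercises 'do 1'

    def test_y_branch(self):
        foo(0, 2, 3)    # exercises 'do 2'

    def test_z_branch(self):
        foo(0, 0, 3)    # exercises 'do 3' alone

    def test_bad_z_raises(self):
        self.assertRaises(ValueError, foo, 0, 0, 0)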

So we could do it by refactoring this function, stubbing out the
layer underneath it, and so on. But then you aren't fully testing the
system.
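
(Where 'stubbing out underneath' means roughly this -- a sketch, with
made-up names for the command and the layer below it:)

def test_run_records_do_one(self):
    calls = []
    # swap the real worker out for a recorder; 'somemodule' and
    # 'do_one' are made-up stand-ins for the layer below cmd_foo
    real_do_one = somemodule.do_one
    somemodule.do_one = lambda: calls.append('do_one')
    try:
        cmd_foo().run('1')
        self.assertEqual(['do_one'], calls)
    finally:
        somemodule.do_one = real_do_one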

Anyway, I do think there are some things that can be done to get good
test coverage, without going over the same things over and over again.
And I welcome some refactoring to make that possible.

I'm a little concerned that the tests you are cleaning up are testing
more than you think they are, and when they get broken up, things will
fall through the cracks.

> 
>>> John - dumping to disk - I don't plan on that, we have a very good
>>> introspection tool in pdb. I'm happy to defer converting tests where it
>>> seems like that is important until there is a 'dump to disk after
>>> failure' option, but honestly, in the last year I've used --keep-output
>>> once.
>> I use pdb a lot, but I also use keep-output frequently.  Maybe a tenth
>> as often as pdb, but it still adds up to plenty.
> 
> Interesting. What do you use keep-output for? (What precisely do you
> look at when you are doing that?)
> 
> -Rob

I'm not super comfortable with pdb. I tend to use either --keep-output
or just tracebacks and introspection of the code. Probably I just need
more practice with it...

For me, I generally use --keep-output when a test is failing, to
figure out what the test actually thinks the state of the tree is (is
there a file there that it isn't finding, etc.). I probably only use
it about once a month.

If you really want pdb to be useful, perhaps we should add a
'--spawn-pdb-on-error' option. Though by that time all the objects
are probably either dead or in a hard-to-get-to frame.
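
Something along these lines, presumably (a sketch; pdb.post_mortem()
is real, the option is hypothetical):

import pdb
import sys

def run_with_pdb_on_error(test_callable):
    # hypothetical --spawn-pdb-on-error: drop into the debugger at
    # the frame where the test blew up, before it is cleaned away
    try:
        test_callable()
    except Exception:
        pdb.post_mortem(sys.exc_info()[2])
        raise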

John
=:->
