[MERGE] Speedup selftest by loading a previously saved test list

Thu Jan 17 16:59:51 GMT 2008

>>>>> "robert" == Robert Collins <robertc at robertcollins.net> writes:

    robert> On Wed, 2008-01-16 at 15:55 +0100, Vincent Ladeuil wrote:
    >> >>>>> "robert" == Robert Collins <robertc at robertcollins.net> writes:
    >> 
    robert> On Fri, 2008-01-11 at 00:16 +0100, Vincent Ladeuil wrote:
    >> >> This patch add two options to the selftest command:
    >> >> 
    >> >> --save-list <file> 
    >> >> --load-list <file>
    >> >> 
    >> >> You use the first one to define a set of tests you want to run
    >> >> repeatedly and use the second one to avoid loading the whole test
    >> >> suite:
    robert> ...
    >> >> - test inheriting from a class defined inside a function (or
    >> >> using a lambda as an attribute) can't be pickled either (but
    >> >> see the patch for some obvious fixes in the bzr core tests),
    >> >> 
    >> >> But overall I think more than 99% of tests can be used with this
    >> >> mechanism.
    >> 
    robert> I feel pretty uncomfortable about this.
    >> 
    >> What I've tested so far: 99.76% of our core test suite are
    >> serialized reliably (10530 out of 10561 tests) see below.
    >> 
    robert> You have to assume a great deal to be sure that tests
    robert> will execute the correct test when you are pickling
    robert> their state.
    >> 
    >> Serialized before setUp is run. I assume that each test is
    >> initialized by setUp and setUp only regarding its state.

    robert> Depending on what you mean by initialized this can be
    robert> very false

AIUI, you and Andrew consider that pickle is not the right way to
address the problem.

You know what ? I agree :)

I chose pickle because it allowed *a* solution to the problem. I
don't consider that approach to be perfect. I never have. I had
even some big doubts about it at first, but I gave it a try.

It first failed. But with a few tweaks it worked for nearly the
whole test suite and only some tests cannot be handled but with a
decent way to detect them.

And all the tweaks raised interesting points (the lambda were
useless, the DocTest showed that pickle was a dead end for them,
the BzrDirFormat (well, see below), etc).

    robert> in some tests using the pyunit test protocol, which
    robert> we try to adhere to for interoperability with other
    robert> python libraries (so that bzrlib tests are friendly
    robert> to other runners (when bzrlib is embedded), and vice
    robert> verca (so that plugins that use features from other
    robert> projects can mix in their test classes safely)).

Alright, alright, that still doesn't show me a failing test, but
I can easily imagine that there are some outside of the bzr test
suite.

<BzrDirFormat rough (rude ?) patch snipped/>

I'd rather not focus on that since I take inspiration from
RemoteBzrDirFormat and BzrDirMetaFormat1 to define it. The point
was to make the tests pass instead of showing a selftest
execution ending with a big 'FAILED' :) There are other ways to
address the problem, this one was the simplest (but I don't
pretend it is correct).

So I let someone more knowledgeable than me in that area
understand why some formats define an __eq__ while some others
don't.

For those interested, the failing test is
TestBzrDir.test_format_initialize_find_open in
tests/bzrdir_implementations/test_bzrdir.py

This test is parametrized with (among others) a bzrdir_format
*object* into 5 instances 2 of them failing when reloaded
(roughly because we use a registry which creates objects at
import time, said objects when deserialized becoming different
ones obviously).

- BzrDirFormat4 skips the test by returning early (TestNotApplicable ?),
- BzrDirFormat5: fails
- BzrDirFormat6: fails
- BzrDirMetaFormat1: succeeds because it defines __eq__ 
- RemoteBzrDirFormat: succeeds because it defines __eq__ 

<snip/>

    robert> Its undeniably faster. But there is a raft of
    robert> complications not address at all by your patch, and
    robert> which addressing will add complexity that is at best
    robert> confusing to the test environment.

Ok. Let's build on what we agree upon then. I tried to isolate
pickle related problems at save time, effective but far from
sufficient. Ok.

    robert> I'll change my vote on a discussion about
    robert> /correctness and simplicity for developers/ here, not
    robert> about speed. 

I get that. The main point of that patch was to evaluate if the
speed gains were worth it. If we agree on that, let's find
another way to address the parameter handling (parameter-less
tests should be easy enough to create from name only).

But let's keep in mind that a saved test list can become obsolete
pretty quickly and is not aimed at replacing using selftest the
"normal" way. So I don't want to spend too much effort to be 100%
correct and robust.

    robert> pickling and state skew is an insidious problem when
    robert> you are dealing with carefully managed data, let
    robert> alone tests.

Full agreement here too, it just happened that the tests were
easier to pickle than I expected so I wanted to share the fun ;-)

Let's go back to the problem.

TestCase doc strings says:

,----
|     If it is necessary to override the __init__ method, the
|     base class __init__ method must always be called. It is
|     important that subclasses should not change the signature
|     of their __init__ method, since instances of the classes
|     are instantiated automatically by parts of the framework in
|     order to be run.
`----

That means we can't use __init__ to provide parameters.

,----
|     Test authors should subclass TestCase for their own
|     tests. Construction and deconstruction of the test's
|     environment ('fixture') can be implemented by overriding
|     the 'setUp' and 'tearDown' methods respectively.
`----

For test parametrization, what we do is inject the parameters as
TestCase attributes after __init__ but before setUp.

I respected that when reloading the tests and I think the
approach is sound. I'd felt uncomfortable modifying setUp with
additional parameters.

I also feel uncomfortable in adding parameters to setUp itself.

I think the aim is to define a specific loading so we should keep
loading and running the test suite separate. setUp is part of
running the test suite so it should not be modified.

>>>>> "Andrew" == Andrew Bennetts <andrew at canonical.com> writes:

<snip/>

    Andrew> If there is a benefit to persisting the results of
    Andrew> test discovery, I think it'd be better to be
    Andrew> persisting the names rather than the instances.

Sure. Let's find a better than pickle way to do that then.

    Andrew> If we can get to the situation where given a test
    Andrew> name, we can reconstruct it just as cheaply as
    Andrew> constructing the TestCase class directly, that would
    Andrew> be ideal.  And at least as fast as unpickling, if not
    Andrew> faster, and definitely much more robust.

No problem as long as no parameter are involved.

    Andrew> Perhaps what we should do is change
    Andrew> bzrlib.tests.TestCase's id to be easily parseable
    Andrew> into four parts: test module name, test case name,
    Andrew> test method name, parameterisation pieces.

Yup. 'id' is a good starting point.

unitests.TestCase:
    def id(self):
        return "%s.%s" % (_strclass(self.__class__), self._testMethodName)

def _strclass(cls):
    return "%s.%s" % (cls.__module__, cls.__name__)

    Andrew> I suppose we could already guess this from
    Andrew> “bzrlib.tests.test_foo.FooTests.test_bar(QuxFormat)”
    Andrew> pretty reliably if we wanted.  We could then write a
    Andrew> loader that takes a list of such names and constructs
    Andrew> just those tests, without loading unnecessary modules
    Andrew> and without constructing many unnecessary instances.

Yes a simple:

def load_test(name):
    mod, klass, meth = smartsplit(name)
    if not loaded(mod):
       __import__ mod

    return klass(meth)

    Andrew> (I'd expect the parameterisation + module-level
    Andrew> test_suite methods to be the biggest problem here,

I avoided that, and I think we still can avoid it by ignoring the
test suite 'shape', simplifying it as a test list and handle the
parameters for each test. At least nothing in my experiments went
against that choice.

    Andrew> but I think for a first cut simply building the
    Andrew> entire test_suite then filtering would be ok.

You mean the module test suite here I assume.

    Andrew> It only takes 500ms on my laptop to construct the
    Andrew> entire repository_implementations test_suite(), which
    Andrew> has 2090 TestCases.)

I have some doubts about that approach because I suspect it will
be more intrusive that adding capabilities (or visibility on
parameters) directly to the TestCase classes (or the associated
TestAdapter).

The problem is to find the right balance between keeping the
parametrization sufficiently opened while still being able to
build the TestCase from the parameters. pickle allowed me to
postpone these choices while implementing the rest.

I think a promising approach is to save the test name, its
parameters and the associated adapter class name.

Then the adapter should be modified to always create the tests
from the parameter textual representations (as used in the
id).

The impact on test writers should be kept minimal that way and
give a better tracking on how one should parametrize tests and
how to find back how a failing test have been parametrized.

The needed data furiously looks like those available in the
self.scenarios attribute of the tests.TestScenarioApplier object
with the additional constraint that the first element of the
tuple should be unique, but I suspect it's already the case.

I will give that a try.

              Vincent