adding unit tests that take a long time
Nicholas Skaggs
nicholas.skaggs at canonical.com
Thu Apr 28 16:39:26 UTC 2016
On 04/28/2016 10:12 AM, Katherine Cox-Buday wrote:
> On 04/27/2016 09:51 PM, Nate Finch wrote:
>> So, this is exactly why I didn't want to mention the nature of the
>> test, because we'd get sidetracked. I'll make another thread to talk
>> about that specific test.
Sorry I forced you into it, but it was important to this discussion. I
wanted to understand your feelings about a test you should be running
regularly as you develop (i.e., a unit test) that takes more than a
trivial amount of time to execute.
>>
>> I do still want to talk about what we can do for unit tests that take
>> a long time. I think giving developers the option to skip long tests
>> is handy - getting a reasonable amount of coverage when you're in the
>> middle of the develop/test/fix cycle. It would be really useful for
>> when you're making changes that affect a lot of packages and so you
>> end up having to run full tests over and over. Of course, running
>> just the short tests would not give you 100% confidence, but once
>> you've fixed everything so the short tests pass, *then* you could do
>> a long run for thorough coverage.
>
> I believe Cheryl has something like this in the works and will be
> sending a note out on it soon.
>
Yes. It is imperative that developers can quickly (and I mean quickly,
or it won't happen!) run unit tests. We absolutely want test runs to be
part of the code, build, run iteration loop.
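For concreteness, here is a minimal sketch of the standard pattern from
Go's testing package for opting a slow test out of short runs (the test
name and the sleep are placeholders, not a real juju test):

    package foo_test

    import (
        "testing"
        "time"
    )

    // TestEndToEnd stands in for an expensive test; it skips itself
    // when go test is invoked with the -short flag.
    func TestEndToEnd(t *testing.T) {
        if testing.Short() {
            t.Skip("skipping slow end-to-end test in -short mode")
        }
        time.Sleep(17 * time.Second) // placeholder for the real work
    }

Developers would then run 'go test -short ./...' in the inner loop, and
the full 'go test ./...' before proposing a branch.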
>> This is a very low friction way to increase developer productivity,
>> and something we can implement incrementally. It can also lead to
>> better test coverage overall. If you write 10 unit tests that
>> complete in milliseconds, but were thinking about writing a couple
>> longer-running unit tests that make sure things are working
>> end-to-end, you don't have the disincentive of "well, this will make
>> everyone's full test runs 30 seconds longer", since you can always
>> skip them with -short.
>>
>> The only real negative I see is that it makes it less painful to
>> write long tests for no reason, which would still affect landing
>> times... but hopefully everyone is still aware of the impact of
>> long-running tests, and will avoid them whenever possible.
>
> I will gently point out that we were prepared to land a test that
> takes ~17s to run without discussion. The motivations are honest and
> good, but how many others think the same? This is how our test suite
> grows to be unmanageable.
>
> I also agree with Andrew that the nature of the test should be the
> delineating factor. Right now we tend to view everything through the
> lens of the Go testing suite; it's a hammer, and everything is a nail.
> Moving forward, I think we should try much harder to delineate between
> the different types of tests in the so-called test pyramid
> <http://martinfowler.com/bliki/TestPyramid.html>, place like tests
> with like, and then run classes of tests when and where they're most
> appropriate.
I advocate for slotting things into the pyramid, and making sure we are
right-sized in our testing. What sort of test counts would we come up
with for tests at each level? Would the base of the pyramid contain the
bulk of the tests? I suspect many of the juju unit tests are really
integration tests, and that this is part of the problem that exists now
with running the unit test suite. The other thing to note is that the
higher you go in the pyramid, several things work against making it
easy for developers. The higher a test sits on the pyramid, the more
fragile it is (more prone to intermittent failures and to breaking when
code changes), the harder it is to write, and the longer it takes to
run. The tests at the top of the pyramid will absolutely require the
most investment and maintenance. This is why it's important for our
test suites to be right-sized, and for us to think carefully about what
we need to test and where / how we test it.
To help with semantics, you can simply designate tests as small, medium,
and large based upon how long they take to run, small being the bottom
of the pyramid and large being the top. There's no need to argue about
scope, which can get tricky. So Nate, assuming your test in this case
wasn't static analysis or code checking (which, by the way, I would
recommend be 'enforced' at the build bot level) but did require 17
seconds to run, I would be hard pressed to place it in the small
category. For a codebase the size of juju, having even a small
percentage of "unit" tests run that long would quickly spiral into long
overall runtimes. For example, if just 5% of, say, 500 tests each ran
for 10 seconds, those 25 tests alone would push a full test run past 4
minutes (25 x 10 = 250 seconds).
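To make that concrete, here is a sketch of how such a size gate might
look in plain Go; the TEST_SIZE environment variable, the helper name,
and the example test are purely illustrative, not an existing juju
convention:

    package foo_test

    import (
        "os"
        "testing"
    )

    // sizeRank orders the hypothetical sizes, fastest to slowest.
    var sizeRank = map[string]int{"small": 0, "medium": 1, "large": 2}

    // skipUnlessSize skips the calling test when its declared size is
    // larger than what the (illustrative) TEST_SIZE variable asks for.
    func skipUnlessSize(t *testing.T, size string) {
        want := os.Getenv("TEST_SIZE")
        if want == "" {
            want = "small" // default to the fast bottom of the pyramid
        }
        if sizeRank[size] > sizeRank[want] {
            t.Skipf("skipping %s test; TEST_SIZE=%s", size, want)
        }
    }

    func TestSomethingExpensive(t *testing.T) {
        skipUnlessSize(t, "large")
        // the 17-second work would only run when TEST_SIZE=large
    }

The nice property is that the default run stays fast, while the build
bot can export TEST_SIZE=large to get the full, slow coverage.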
Nicholas