adding unit tests that take a long time

Anastasia Macmood anastasia.macmood at canonical.com
Fri Apr 29 03:43:53 UTC 2016


Well, now that you ask :D

On 29/04/16 12:10, Nate Finch wrote:
> I don't really understand what you mean by stages of development. 
I mean developing a unit of work, as opposed to developing a component,
as opposed to wiring several components together, etc. On top of that,
besides the usual development activities, you'd also need to include bug
and regression fixes, which entail a slightly different mindset and
considerations than writing code from scratch. Let's say "different
development activities", if it helps to clear the mud \o/

So, you'd start developing code by yourself, then your code is
amalgamated with your team's, then with other teams', etc...
> At the end of the day, they all test the exact same thing - is our
> code correct?  The form of the test seems like it should be unrelated
> to when they are run.
This statement is worthy of a discussion over drinks :)
Let's start by making a clear distinction - all tests are important to
deliver a quality product \o/ However, there are different types of testing:

unit testing;
component testing;
integration testing (including top-down, bottom-up, Big Bang,
incremental, component integration, system integration, etc);
system testing;
acceptance testing (and just for fun, let's bundle in here alpha and
beta testing);
functional testing;
non-functional testing;
functionality testing;
reliability testing;
usability testing;
efficiency testing;
maintainability testing;
portability testing;
baseline testing;
compliance testing;
documentation testing;
endurance testing;
load testing (large amount of users, etc);
performance testing;
compatibility testing;
security testing;
scalability testing;
volume testing (large amounts of data);
stress testing (too many users, too much data, too little time and too
little room);
recovery testing;
regression testing....

> Can you explain why you think running tests of different sorts at the
> same time would be a bad thing?
All the types of testing that I have attempted to enumerate are written
at different times, and when they are run makes a difference to the
efficiency of the development process. They may live in different phases
of the SDLC. Focusing on all of these types will improve product quality
at the expense of team momentum, and will also affect individual
developers' habits (among other factors).

When you as a developer work on a task, the most relevant to you would be:
a. unit tests (does this little unit of work do what I want?),
b. integration (does my change work with the rest of the system?),
c. functional (does my work address requirements?).

Depending on your personal development habits, you may want to run only
unit tests, integration tests, and/or functional tests while you work on
your task. Before you add your code to the common codebase, you should
make sure that your code is consistent with:
* coding guidelines (gofmt, in our case),
* agreed and recommended coding practices (like the check that you are
adding).
These checks test code for conformity, ensuring that our code looks the
same and is written to the highest agreed standard.
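As an aside - purely as an illustration, not a description of how juju's
tests are organised today - one common Go convention for keeping such
categories runnable in isolation is a build tag on the slower test files
(the package and test names below are invented):

// +build integration

package service_test

import "testing"

// This file is compiled and run only with: go test -tags integration
// A plain "go test" skips it entirely.
func TestComponentsPlayWellTogether(t *testing.T) {
        // exercise the unit together with its real collaborators
}

That way your inner loop can stay on the cheap unit tests while the
heavier categories still gate landing.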
 
>
> Note that I only want to "divide up tests" temporally... not
> necessarily spatially.  If we want to put all our static analysis
> tests in one directory, our integration tests in another directory,
> unit tests in the directory of the unit... that's totally fine.  I
> just want an easy way to run all the fast tests (regardless of what or
> how they test) to get a general idea of how badly I've broken juju
> during development.
I understand your desire for a quick turnaround.
But I question the value you would get from running "fast" (short)
tests - would this set include some fast-running unit tests, integration
tests and functional tests, simply because they have been identified as
running quickly on some machines? How would you know that the "fast" run
is comprehensive enough? It sounds to me like you might as well say
"let's run a couple of tests at random" and rely on those results until
you commit...

I do not know what you will end up doing with your current dilemma. I
second Andrew's suggestion as well \o/
Developing short/long test distinctions and special processing for the
tests that we maintain seems like a waste of our effort.   
>
> On Thu, Apr 28, 2016 at 5:24 PM Anastasia Macmood
> <anastasia.macmood at canonical.com> wrote:
>
>     For what it's worth, distinguishing between tests based on the
>     time they take to run is borderline naive. The meaningful
>     distinction is what the test tests :D
>     A unit test checks that the unit of work under test does what is
>     expected;
>     integration tests check that components play well together;
>     functional tests check behaviour;
>     static analysis analyses the codebase to ensure conformity to
>     agreed policies.
>
>     They all have meaning at different stages of development, and to
>     bundle them based on running time is to compromise those stages in
>     the long term.
>
>
>     On 29/04/16 05:03, Nate Finch wrote:
>>     Our full set of tests in github.com/juju/juju takes 10-15 minutes
>>     to run,
>>     depending on the speed of your computer.  It's no coincidence
>>     that our test pyramid looks more like this ▽ than this △.   Also,
>>     we have a lot of tests:
>>
>>     /home/nate/src/github.com/juju/juju/$ grep -r ") Test" .
>>     --include="*_test.go" | wc -l
>>     9464
>>
>>     About small, medium, and large tests... I think that's a good
>>     designation.  Certainly 17 seconds is not a small test.  But I
>>     /think/ it qualifies as medium (hopefully most would be faster).
>>       Here's my suggestion, tying this back into what I was talking
>>     about originally:
>>
>>     Small tests would be those that run with go test -short.  That
>>     gives you something you can run frequently during development to
>>     give you an idea of whether or not you really screwed up. 
>>     Ideally each one should be less than 100ms to run.  (Note that
>>     even if all our tests ran this fast, it would still take 15
>>     minutes to run them, not including compilation time).
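>>
>>     For instance, a minimal sketch of such a -short guard (the test
>>     name here is invented for illustration; testing.Short() is the
>>     standard library hook behind the -short flag):
>>
>>     func TestExpensiveMigration(t *testing.T) {
>>             if testing.Short() {
>>                     t.Skip("skipping medium test in -short mode")
>>             }
>>             // slow body runs only in full (non -short) runs
>>     }
>>
>>     (Sanity-checking the arithmetic above: 9464 tests at 100ms each
>>     is roughly 946 seconds, i.e. about 15-16 minutes.)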
>>
>>     Medium tests would also be run if you don't use -short.  Medium
>>     tests would still be something that an average developer could
>>     run locally, and while she may want to get up to grab a drink
>>     while they're running, she probably wouldn't have time to run to
>>     the coffee shop to get said drink.  Medium tests would be
>>     anything more than 100ms, but probably less than 15-20 seconds
>>     (and hopefully not many of the latter).  Medium tests would be
>>     run before making a PR, and as a gating job.
>>
>>     Long tests should be relegated to CI, such as bringing up
>>     instances in real clouds.
>>
>>     I don't think it's terribly useful to divide tests up by type of
>>     test. Who cares if it's a bug found with static analysis or by
>>     executing the code? Either way, it's a bug.  The only thing that
>>     really matters is how long the tests take, so we can avoid
>>     running slow tests over and over.  I run go vet, go lint, and go
>>     fmt on save in my editor.  That's static analysis, but they run
>>     far more often than I actually run tests.... and that's because
>>     they're always super fast.
>>
>>     I think we all agree that all of these tests (except for CI
>>     tests) should be used to gate landings.  The question then is,
>>     how do you run the tests, and how do you divide up the tests?  To
>>     me, the only useful metric for dividing them up is how long they
>>     take to run.  I'll run any kind of test you give me so long as
>>     it's fast enough.
>>
>>     On Thu, Apr 28, 2016 at 12:39 PM Nicholas Skaggs
>>     <nicholas.skaggs at canonical.com> wrote:
>>
>>         On 04/28/2016 10:12 AM, Katherine Cox-Buday wrote:
>>         > On 04/27/2016 09:51 PM, Nate Finch wrote:
>>         >> So, this is exactly why I didn't want to mention the
>>         nature of the
>>         >> test, because we'd get sidetracked. I'll make another
>>         thread to talk
>>         >> about that specific test.
>>         Sorry I forced you into it, but it was important to this
>>         discussion. I wanted to understand your feelings towards a
>>         test you should be running regularly as you develop, aka a
>>         unit test, that takes more than a trivial amount of time to
>>         actually execute.
>>         >>
>>         >> I do still want to talk about what we can do for unit
>>         tests that take
>>         >> a long time.  I think giving developers the option to skip
>>         long tests
>>         >> is handy - getting a reasonable amount of coverage when
>>         you're in the
>>         >> middle of the develop/test/fix cycle.  It would be really
>>         useful for
>>         >> when you're making changes that affect a lot of packages
>>         and so you
>>         >> end up having to run full tests over and over.  Of course,
>>         running
>>         >> just the short tests would not give you 100% confidence,
>>         but once
>>         >> you've fixed everything so the short tests pass, *then*
>>         you could do
>>         >> a long run for thorough coverage.
>>         >
>>         > I believe Cheryl has something like this in the works and
>>         will be
>>         > sending a note out on it soon.
>>         >
>>         Yes. It is imperative that developers can quickly (and I mean
>>         quickly, or it won't happen!) run unit tests. We absolutely
>>         want test runs to be a part of the code, build, run iteration
>>         loop.
>>         >> This is a very low friction way to increase developer
>>         >> productivity, and something we can implement incrementally.
>>         >> It can also lead to better test coverage overall.  If you
>>         >> write 10 unit tests that complete in milliseconds, but were
>>         >> thinking about writing a couple longer-running unit tests
>>         >> that make sure things are working end-to-end, you don't
>>         >> have the disincentive of "well, this will make everyone's
>>         >> full test runs 30 seconds longer", since you can always
>>         >> skip them with -short.
>>         >>
>>         >> The only real negative I see is that it makes it less
>>         >> painful to write long tests for no reason, which would
>>         >> still affect landing times.... but hopefully everyone is
>>         >> still aware of the impact of long-running tests, and will
>>         >> avoid them whenever possible.
>>         >
>>         > I will gently point out that we were prepared to land a
>>         > test that takes ~17s to run without discussion. The
>>         > motivations are honest and good, but how many others think
>>         > the same? This is how our test suite grows to be
>>         > unmanageable.
>>         >
>>         > I also agree with Andrew that the nature of the test should
>>         > be the delineating factor. Right now we tend to view
>>         > everything through the lens of the Go testing suite; it's a
>>         > hammer, and everything is a nail. Moving forward, I think we
>>         > should try much harder to delineate between the different
>>         > types of tests in the so-called test pyramid
>>         > <http://martinfowler.com/bliki/TestPyramid.html>, place like
>>         > tests with like tests, and then run classes of tests when
>>         > and where they're most appropriate.
>>         I advocate for slotting things into the pyramid, and making
>>         sure we are right-sized in our testing. What sort of test
>>         counts would we come up with for tests at each level? Would
>>         the base of the pyramid contain the bulk of the tests? I
>>         suspect many of the juju unit tests are really integration
>>         tests, and that this is part of the problem that exists now
>>         with running the unit test suite. The other thing to note is
>>         that the higher you go in the pyramid, the more things work
>>         against making it easy for developers. The higher the test is
>>         on the pyramid, the more fragile it is (more prone to
>>         intermittent failures and breakage), the harder it is to
>>         write, and the longer it takes to run. Those tests at the top
>>         of the pyramid will absolutely require the most investment and
>>         maintenance. This is why it's important for our test suites to
>>         be right-sized, and for us to think carefully about what we
>>         need to test and where / how we test it.
>>
>>         To help with semantics, you can simply designate tests as
>>         small, medium and large based upon how long they take to run,
>>         small being the bottom of the pyramid and large being the top.
>>         No need to argue scope, which can get tricky. So Nate,
>>         assuming your test in this case wasn't static analysis or code
>>         checking (which, by the way, I would recommend be 'enforced'
>>         at the build bot level) but did require 17 seconds to run, I
>>         would be hard pressed to place it in the small category. For a
>>         codebase the size of juju, having even a small percentage of
>>         "unit" tests run that long would quickly spiral to long
>>         overall runtimes. For example, even if only 5% of, say, 500
>>         tests ran for 10 seconds each, a full test run would still
>>         take over 4 minutes.
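>>         (Worked through: 5% of 500 tests is 25 tests; 25 x 10s =
>>         250s, or about 4 minutes 10 seconds, before even counting the
>>         other 475 faster tests.)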
>>
>>
>>         Nicholas
>>
>>
>>
>
