Charm Testing Spec Proposal

Clint Byrum clint at
Wed Feb 1 21:11:33 UTC 2012

Excerpts from Kapil Thangavelu's message of Wed Feb 01 12:03:13 -0800 2012:
> Excerpts from Clint Byrum's message of Wed Jan 25 20:34:31 -0500 2012:
> > I don't know how to use the other tool that juju has been using for
> > reviews, so if somebody wants to train me on that, that's fine. Otherwise,
> > here is a plain old launchpad merge proposal:
> > 
> >
> > 
> > It's also being generated into HTML here:
> > 
> >
> > 
> > Comments, suggestions, and criticisms are more than welcome.
> > 
> Hi Clint,
> Thanks to both you and Mark for taking up charm testing.
> The current ftests lp:juju/ftests are basically shell scripts that correspond  
> to what's being specified here for a test case, albeit they don't bother with 
> teardown or charm retrieval abstractions.
> I think given this spec, we could just incorporate those tests directly into the 
> example charms for functional test runs.

+1, that would be a good thing to do once the spec is implemented so
that the examples have example tests.

> Comments on the spec.
> Re Generic Tests
> There's an additional class of generic tests that would allow automated
> verification of charm functionality. If we take any given charm, and establish its
> dependencies and its clients (reverse deps), we can assemble a series of
> environments wherein the charm, its minimal dependencies, and their relations
> are established, with test iteration across its possible clients and their
> relations. The same logic of watching the state for an inactive time
> period (aka steady state) would allow for some basic sanity verification.
> Just to get a better sense on what the graph might look like I tossed together 
> this dot rendering of the entire charm universe.

That's really cool!

However, if I understand the problem correctly, each arrow is basically
another exponential jump for each isolated graph.  So, testing haproxy
would mean deploying with every app that provides an http interface,
which is a lot.

I'm not sure of the value of such an exercise. What you really want to
know is whether or not the app behind haproxy gets incorporated into
haproxy's configuration, which is fully tested by just testing with one
well behaved charm on the other side.

For more complex examples, like mysql, where queries are run during the
config exchange, it's still pretty much fully exercised by one well-behaved
charm.

This does make me think of one useful bit that we can do, which is
calculating this graph and making sure that all required relations
are satisfiable.
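
A minimal sketch of that check, assuming each charm's metadata has been
reduced to a dict of provides/requires entries mapping relation name to
interface. The charm entries below (including "orphan") are illustrative
only, not read from the real charm store:

```python
def unsatisfied_requires(charms):
    """Return sorted (charm, relation, interface) triples that no charm provides."""
    # Collect every interface that at least one charm provides.
    provided = set()
    for meta in charms.values():
        provided.update(meta.get("provides", {}).values())

    # A required relation is satisfiable if some charm provides its interface.
    missing = []
    for name, meta in charms.items():
        for relation, interface in meta.get("requires", {}).items():
            if interface not in provided:
                missing.append((name, relation, interface))
    return sorted(missing)


# Hypothetical, simplified metadata for illustration.
charms = {
    "haproxy": {"provides": {"website": "http"},
                "requires": {"reverseproxy": "http"}},
    "wordpress": {"provides": {"website": "http"},
                  "requires": {"db": "mysql"}},
    "mysql": {"provides": {"db": "mysql"}},
    "orphan": {"requires": {"cache": "memcache"}},
}

print(unsatisfied_requires(charms))
```

This only matches on interface names. It does not deploy anything, so it
would be cheap to run over the whole charm set.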

> Re Charm specific tests.
> This looks good to me. I think the teardown, at least wrt the environment, can
> be automated, the charm tests just need to clean out any local state. A 
> useful automation for the tests would be running a verification script directly 
> on a given unit, rather than remotely poking it from the testrunner.

I want to be able to use a single environment and not destroy it with
every test run. I'd also like to be able to re-use machines, though
without any chroot/lxc support that seems like folly right now. The
test runner is still going to be responsible for cleaning up after any
charms that leave services lying around.

I've added a blurb that says that the test runner may clean up services
left behind and that tests *should* clean them up by themselves and
extract any needed artifacts from the units before the test exits.

> Is it intended that the tests run as a non-root user inside of a container or
> just directly on the host?

The user they're running as is undefined, and no assumption is
allowed there. Only that juju is in the path and that there may be
some restrictions.

> Re Output
> It might be outside of the scope, but capturing the unit log files on failure 
> would be helpful for debugging against automated test runs. 

Great idea. I think I'll leave it up to the test runner implementation,
but just running debug-log into a file during the test seems like a
simple way to achieve this.
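
Roughly, a test runner could wrap each test like this. This is a sketch
only; the helper name capture_log and the log path are made up for
illustration, and the command is a parameter so something other than
"juju debug-log" can be substituted:

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def capture_log(path, command=("juju", "debug-log")):
    """Run `command` in the background, writing its output to `path`."""
    with open(path, "wb") as log:
        proc = subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT)
        try:
            yield proc
        finally:
            # debug-log streams until stopped, so stop it when the test ends.
            if proc.poll() is None:
                proc.terminate()
            proc.wait()


# Hypothetical usage inside a test runner:
#
#     with capture_log("/tmp/mysql-test.log"):
#         run_the_test()
```

The log file then survives the test and can be attached to the failure
report.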

> One additional concern is that this piece.
> """
> There's a special sub-command of juju, ``deploy-previous``, will deploy the
> last successfully tested charm instead of the one from the current
>  delta. This will allow testing upgrade-charm.
> """
> implies some additional infrastructure, at least a test database recording test 
> runs against versions.

That's really an implementation detail, and I have a fairly simple idea
on how to do that.

Basically there will be two repos:


Setting up the test runner will involve checking everything out into both
/testing_charms and /successful_charms. When we check for delta, we put
any new changes into testing_charms, and start the tests. As they pass,
we push these deltas into successful_charms.

So yes, there may be times when you are upgrading from the same version
you are upgrading to. That's OK; the point is to catch the instance where
you changed the charm and broke upgrades.
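
As a rough sketch of that promotion step, using plain directory copies
as a stand-in for pushing branches. The function names and the run_tests
callback are hypothetical, and the two roots are passed in rather than
hard-coded:

```python
import shutil
from pathlib import Path


def promote(charm, testing_root, successful_root):
    """Copy a charm that passed its tests from testing to successful."""
    src = Path(testing_root) / charm
    dst = Path(successful_root) / charm
    if dst.exists():
        shutil.rmtree(dst)  # replace the previously successful copy
    shutil.copytree(src, dst)


def run_cycle(charms, run_tests, testing_root, successful_root):
    """Test each charm and promote the ones that pass; return their names."""
    passed = []
    for charm in charms:
        if run_tests(charm):
            promote(charm, testing_root, successful_root)
            passed.append(charm)
    return passed
```

A charm that fails stays untouched in the successful tree, so
deploy-previous would keep deploying the last version that passed.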

> It exposes a larger question that's largely unanswered here, namely that charms 
> are typically deployed and tested against a charm graph, with each charm 
> versioned independently. Successful runs are against a versioned graph. To
> maintain a goal of being able to identify which new charm revision breaks a 
> subset of the graph, requires gating/ordering of the charm change processing 
> per the time of the changes across charms. Else things like deploy-previous may 
> not work because of other changes in the charm graph.

I do think we can probably apply an optimization, once this is up and
running, to check the graph, which is clearly "knowable" against the
delta, and only run tests for charms which are in the graph with charms
that are in the delta.
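
One way that selection could look, as a sketch: treat relations as
undirected edges and pick every charm reachable from a charm in the
delta. The relation pairs below are made up for illustration, and
whether to follow relations transitively is an assumption:

```python
from collections import deque


def charms_to_test(relations, delta):
    """Return charms connected, directly or transitively, to any charm in delta."""
    # Build an undirected adjacency map from (charm, charm) relation pairs.
    neighbours = {}
    for a, b in relations:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search outward from every changed charm.
    seen = set(delta)
    queue = deque(delta)
    while queue:
        charm = queue.popleft()
        for other in neighbours.get(charm, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return sorted(seen)


# Hypothetical relation pairs for illustration.
relations = [
    ("haproxy", "wordpress"),
    ("wordpress", "mysql"),
    ("charm-a", "charm-b"),
]

print(charms_to_test(relations, ["mysql"]))
```

With a change to mysql only, the unrelated charm-a/charm-b pair is
skipped for that run.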

However, trying to test each change with each graph is going to burn
up a lot of CPU, network, and disk time. Perhaps invest in some energy
stocks and AMZN before working on that implementation. ;)

I do think there will be enough information in the output, with
charm revnos and bzr branch revnos, to recreate the state, which is a
requirement set out in the spec.

I think we should implement this basic, simple algorithm (test everything
daily with the full delta set), and then iterate on it as we learn what
breaks charms and how the debugging process goes. If we try to think
through all the possibilities before we start, this is just never going
to happen.
