Release testing and the relationship between 'bzr selftest' and plugins

Vincent Ladeuil vila+bzr at
Thu Mar 15 12:57:13 UTC 2012

>>>>> Jelmer Vernooij <jelmer at> writes:


    > The API versioning infrastructure doesn't work
    > ----------------------------------------------

Sad but true.

This means the efforts we put into maintaining compatibility with
plugins are partly wasted, though.

I can think of several ways for plugins to address the issue:

1 - merge plugins into core, making them core plugins: we've talked
    about that long ago but nothing happened. The main benefit is that
    such plugins will be better maintained since a change in core will
    be noticed more quickly. This adds maintenance costs to bzr core,
    which is mitigated by requiring less effort on compatibility in
    both bzrlib and the plugins.

2 - push plugin authors to create series targeted at bzr releases: this
    avoids many maintenance issues :) This will also help installer
    builders.

    >  * If applicable, is the bzrlib version known to be too old
    >  * If applicable, is the bzrlib version known to be too new

Fair enough, but it's up to the plugin authors to do that.
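To make the "too old / too new" checks concrete, here is a minimal
sketch of what a plugin's __init__.py could do. The version bounds and
the helper name are invented for illustration; bzrlib's real API
(e.g. bzrlib.api) differs in detail:

```python
# Hypothetical sketch: how a plugin might refuse to load when the
# running bzrlib is known to be too old or too new.  The bounds below
# are made up for illustration.

MINIMUM_BZRLIB = (2, 4, 0)   # oldest bzrlib this plugin is tested against
MAXIMUM_BZRLIB = (2, 6, 0)   # first bzrlib known to break this plugin

def check_bzrlib_version(version_info):
    """Return None if compatible, or an explanatory message otherwise."""
    version = version_info[:3]
    if version < MINIMUM_BZRLIB:
        return ("bzrlib %s is too old; this plugin needs at least %s"
                % (version, MINIMUM_BZRLIB))
    if version >= MAXIMUM_BZRLIB:
        return ("bzrlib %s is too new; this plugin was only tested up to %s"
                % (version, MAXIMUM_BZRLIB))
    return None

# In a real plugin this would be driven by bzrlib.version_info, e.g.:
#   import bzrlib
#   message = check_bzrlib_version(bzrlib.version_info)
#   if message is not None: warn and disable the plugin
```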

    > Plugins tightly coupled to bzr core
    > -----------------------------------

(1) applies here (as does (2), to a lesser degree), but again, it's up
to the plugin authors.

    > It would be also be nice to be able to blacklist certain versions
    > of plugins that we know we're breaking, when we make changes. This
    > is bug

This sounds like something we can easily add support for, and it would
be valuable for users.

    > As the maintainer of three foreign branch plugins, I run their
    > testsuites regularly, and usually notice when there is an
    > incompatible change in bzrlib.  I think I've done a reasonable job
    > of keeping versions available that are compatible with all
    > releases.

Thanks for that!

A related issue there is the test ID name space: some tests can be
inherited by plugins so that 'bzr selftest -s bp.<plugin>' will include
them, and some can't.

There are ways to make our tests easier for plugins to reuse but we're
not there yet:

- make the test parametrization usable by plugins: either by having it
  rely on registries (like we do for formats, but the plugin tests are
  still under the bzrlib.tests hierarchy) or by providing test-specific
  registries (I tried this approach for the config stuff but the results
  are far from perfect and the tests are still under the bzrlib.tests
  hierarchy).

- design focused test classes so that plugins can inherit from them only
  for the parts they care about (this requires some expertise about the
  test framework from the plugin authors and doesn't really scale well
  when tests are added/moved or new test classes are introduced).

- have the plugin authors maintain a set of prefixes for 'selftest -s'
  to better define the plugin test coverage (this requires good TDD
  expertise and is hard to maintain too).

    > "bzr selftest" doesn't pass with a standard set of plugins installed
    > --------------------------------------------------------------------

This has been a known issue for years.

The root cause is a vicious circle: if tests start failing for plugins,
bzr devs tend to use 'BZR_PLUGIN_PATH=-site ./bzr selftest' (or, gosh,
even --no-plugins) which means additional failures are not seen...

Adding more plugin tests and keeping them passing is up to plugin
authors.


    > No regular testing of plugins against
    > ---------------------------------------------

    > "bzr selftest" gets run very often for "bzr" itself with just the
    > bundled plugins (launchpad, changelog_merge, ...) - on PQM,
    > various platforms in babune, on developer machines, etc.

Yup, core plugins are... core plugins :)


    > Once we fix the previous issue, I'm sure more developers will
    > start running more of the tests. Perhaps it would also be possible
    > to have a babune slave run the tests for all plugin trunks against

It's been on babune's TODO list for quite a long time but it doesn't
make sense until we get back to a point where all core tests are
passing.

That's another vicious circle: a CI system is valuable only when 100% of
the tests are passing. As soon as you start having even a single
spurious failure, the S/N ratio goes down and there is no point adding
more tests (or rather, no point expecting much value out of the CI
system; adding tests in itself can't be bad, can it? ;).

One way to mitigate that would be to define and maintain different test
suites that we can mix and match differently to suit our needs:

- a critical one for pqm, no exception accepted,

- a less critical one for babune: excluding known spurious failures to
  at least get to a point where babune can be relied upon,

    > No 'bzr selftest' run with the bzr and plugins shipped in an installer
    > ---------------------------------------------------------------------

- a post-install targeted test suite for installer builders/packagers

From there we could envision a job running a full test suite on babune
for a set of plugins and a job for each plugin also running the full
test suite with only this additional plugin.

This should provide a basic means of identifying the faulty plugin when
the test suite fails for the whole set.
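The mix-and-match idea could be as simple as one full suite plus named
exclusion lists per consumer. A rough sketch (test and target names are
invented, not bzrlib's actual selftest machinery):

```python
# Sketch: one full test suite, with named exclusion lists so pqm,
# babune and installer builders each run the subset they need.

FULL_SUITE = ['test_commit', 'test_merge', 'test_network_timeout',
              'test_post_install_paths']

KNOWN_SPURIOUS = ['test_network_timeout']       # flaky on babune slaves
POST_INSTALL_ONLY = ['test_post_install_paths']  # needs an installed bzr

def suite_for(target):
    """Select the tests appropriate for a given consumer of the suite."""
    if target == 'pqm':        # critical: no exception accepted
        return [t for t in FULL_SUITE if t not in POST_INSTALL_ONLY]
    if target == 'babune':     # exclude known spurious failures
        return [t for t in FULL_SUITE
                if t not in KNOWN_SPURIOUS + POST_INSTALL_ONLY]
    if target == 'installer':  # post-install smoke tests only
        return list(POST_INSTALL_ONLY)
    raise ValueError('unknown target: %r' % target)
```

The point is that the exclusion lists are maintained explicitly, so a
spurious failure gets recorded and tracked instead of silently training
everyone to ignore red runs.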

    > Once we have a working "bzr selftest",

I'll go further based on past evidence: 'once' is not strong enough
here; 'bzr selftest' should *always* pass or we go straight back into
the vicious circle.
    > this should be easy to do.

Unfortunately it's not. Getting to the point where selftest passes
*once* is easy, but we've always failed to keep it passing without
dedicated effort. Granted, it was always a negligible issue each time,
but since they add up, we always reach a point where getting back on
track is harder than it should be.

    > And we're less likely to find issues at install time if the full
    > testsuite is already being run regularly.  Of course, it will slow
    > down the release process somewhat, having to wait for the full
    > testsuite for bzr core and all plugins to pass and all.

Release time is not the right time to run heavy testing, this is
precisely what CI and time-based releases are targeting: cutting a
release should be just:

- check that tests have been passing lately,
- check that no critical issues are pending,
- tidy up the news,
- cut the tarball.

I.e. only administrative stuff, no last-minute rush for landing, no bug
fixes, no source changes :) The rationale is that any change requires
testing (which takes time) *and* can fail which delays the release. This
goes against time-based releases and as such should be avoided as much
as possible (common sense should be applied for exceptions as usual).

I'd go as far as saying that if we need to change the release process it
should be by *removing* tasks, never adding new ones.

