Release testing and the relationship between 'bzr selftest' and plugins

Thu Mar 15 15:11:31 UTC 2012

>>>>> Jelmer Vernooij <jelmer at samba.org> writes:

<snip/>

    >> 1 - merge plugins into core, making them core plugins: we've talked
    >> about that long ago but nothing happened. The main benefit is that
    >> such plugins will be better maintained since a change in core will
    >> be noticed more quickly. This adds maintenance costs to bzr core
    >> which are mitigated by less effort on compatibility in both bzrlib
    >> and the plugins.

    > I think this is just one of the ways we can make sure that the plugin
    > tests get run often during development. But it's not the only way to
    > make sure changes to the core are tested against the plugins.

Indeed, I wasn't implying anything else ;)

    > Shipping the plugins with core also has a few issues:

    >  * lp:bzr  (AFAIK) falls under the contributor agreement, and all code
    > (with some exceptions) is (C) Canonical. Most plugins have different
    > copyright holders.

Hmm, this one is hard.

    >  * shipping plugins implies a certain level of support for them

Yup, the balance may not be easy to find between time spent ensuring
backward compatibility in bzr-core, compatibility with various bzr
versions in the plugin itself and packaging the right versions as
opposed to a single code base.

    >  * plugins can have dependencies - would we start shipping the svn, apr
    > and mercurial sourcecode with bzr?
    >  * some plugins have a different landing mechanism than bzr.dev;
    > requiring review, for example

I'd say soft dependencies in bzr-core and build dependencies for
packages, or should those be recommendations instead?

    > For some plugins (e.g. bzr-grep?) this might be a good option though,
    > indeed.

Yup, I was thinking about bzr-webdav which is very stable but has been
broken with 2.5.

    >> 2 - push plugin authors to create series targeted at bzr releases: avoid
    >> many maintenance issues :) This will also help installer
    >> builders/packagers.
    > For most plugins, this doesn't scale with the number of release series
    > and the size of the plugins. It isn't worth the effort to maintain
    > separate release series if it's trivial to be compatible with more
    > versions of bzr.

Balance to be found again, some plugins may just want to tag specific
revisions for a given series if they don't evolve a lot between series.

    > For plugins that are tightly coupled with particular bzr versions,
    > like the foreign branch plugins, this is an option. But it still
    > wouldn't have prevented the problems we had with the 2.5
    > installers. Changes between beta 4 and beta 5 broke the foreign
    > branch plugins, and the installers shipped with an outdated
    > version of those plugins (from the correct release series).

Sure, but at least the packagers can subscribe to the tip of a given
branch and be done.

    >> >  * If applicable, is the bzrlib version known to be too old
    >> >  * If applicable, is the bzrlib version known to be too new
    >> 
    >> Fair enough, but it's up to the plugin authors to do that.

    > True, but I think apart from working on bzr core we are involved
    > in a fair number of the plugins. We can also encourage changes to
    > the way API version checking is done, e.g. by deprecating
    > bzrlib.api.require_any_api().

Indeed.
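
A rough sketch of the "too old" / "too new" check discussed above,
using plain tuple comparison. The helper names and the version tuples
are hypothetical; this is not the bzrlib.api interface, just an
illustration of what a plugin could declare.

```python
# Hypothetical helpers, not part of bzrlib: a plugin declares the oldest
# bzrlib version it is known to work with and, if known, the first
# version it is known to be broken with.

def format_version(version):
    """Render a version tuple such as (2, 5, 0) as '2.5.0'."""
    return ".".join(str(part) for part in version)


def check_compatible(bzrlib_version, minimum=None, too_new=None):
    """Return None if compatible, else a human-readable reason.

    minimum: oldest version known to work (inclusive), or None if unknown.
    too_new: first version known to be broken (exclusive), or None if unknown.
    """
    if minimum is not None and bzrlib_version < minimum:
        return "bzrlib %s is too old, need at least %s" % (
            format_version(bzrlib_version), format_version(minimum))
    if too_new is not None and bzrlib_version >= too_new:
        return "bzrlib %s is too new, known broken since %s" % (
            format_version(bzrlib_version), format_version(too_new))
    return None
```

Leaving too_new as None means "not known to be broken yet", which
avoids refusing to load against a newer bzrlib that may work fine.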

    >> > Plugins tightly coupled to bzr core
    >> > -----------------------------------
    >> 
    >> (1) applies there (as well as 2 to a lesser degree), but again, it's up
    >> to the plugin authors.
    >> 
    >> > As the maintainer of three foreign branch plugins, I run their
    >> > testsuites regularly, and usually notice when there is an
    >> > incompatible change in bzrlib.  I think I've done a reasonable job
    >> > of keeping versions available that are compatible with all
    >> > releases.
    >> 
    >> Thanks for that !
    >> 
    >> A related issue there is the test IDs name space, some tests can be
    >> inherited by plugins so that 'bzr selftest -s bp.<plugin>' will include
    >> them, some don't.
    >> 
    >> There are ways to make our tests easier to be reused by plugins but
    >> we're not there yet:

    > Is the test coverage of plugins really an issue? Speaking for the
    > foreign plugins, this doesn't really seem to be a problem.

'Issue' may be too strong a word; what I meant is that for a plugin
author there is a *big* difference between running:

  # runs only the plugin tests
  bzr selftest -s bp.<plugin>

and

  # run all tests including the plugin ones
  BZR_PLUGINS_AT=<plugin>@`pwd` bzr selftest

Ideally the former should be enough for the plugin author, but that's
true for only a handful of plugins so far.

    > bzrlib.tests.per_branch will run against all foreign branch
    > implementations too, or "bzr selftest
    > bzrlib.tests.per_branch.*SvnBranch" will run all svn branch
    > implementation related tests. This provides pretty good coverage.

Yup, that's what I was thinking with a list of known prefixes to run for
a given plugin.
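
To make the "known prefixes" idea concrete, here is a minimal sketch.
The mapping and the helper name are hypothetical, the prefixes listed
are only examples, and it assumes 'bzr selftest' accepts several -s
options in one invocation.

```python
# Hypothetical table: for each plugin, the test ID prefixes worth running.
# The entries are illustrative, not an agreed-upon list.
KNOWN_PREFIXES = {
    "svn": ["bp.svn", "bzrlib.tests.per_branch"],
}


def selftest_args(plugin, known_prefixes=KNOWN_PREFIXES):
    """Build a 'bzr selftest' command line for a given plugin.

    Plugins without an entry fall back to their own tests only,
    i.e. the equivalent of 'bzr selftest -s bp.<plugin>'.
    """
    prefixes = known_prefixes.get(plugin, ["bp.%s" % plugin])
    args = ["bzr", "selftest"]
    for prefix in prefixes:
        args.extend(["-s", prefix])
    return args
```

The resulting list could be handed to subprocess.call() by a plugin
author or a CI job, so nobody has to remember which core tests matter
for which plugin.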

    >> > "bzr selftest" doesn't pass with a standard set of plugins installed
    >> > --------------------------------------------------------------------
    >> 
    >> This has been a known issue for years.
    >> 
    >> The root cause is a vicious circle: if tests start failing for plugins,
    >> bzr devs tend to use 'BZR_PLUGIN_PATH=-site ./bzr selftest' (or, gosh,
    >> even --no-plugins) which means additional failures are not seen...
    >> 
    >> Adding more plugin tests and keeping them passing is up to plugin
    >> authors/maintainers...

    > If a lp:bzr author changes something that breaks a plugin, they
    > should be noticing and filing bugs.  I agree plugin authors (or
    > anybody) should also be fixing problems in plugins when they come
    > up, but that's a lot easier if "bzr selftest" (without arguments)
    > actually works.

Right. So, I did that for a long time and lost steam.

While working on a given fix, plugin test failures are disruptive...

Maybe I should say 'were disruptive with no easy opt-out mechanism'. I
think by the time BZR_DISABLE_PLUGINS was introduced I had already
fallen into the BZR_PLUGIN_PATH=-site'ly trigger-happy camp :-/

<snip/>

    >> Yup, core plugins are... core plugins :)

    > I don't think this is the magical answer. bundling plugins is just
    > one of the ways in which we can encourage people to always run the
    > plugin tests too.

Sure, there was a smiley there ;)

    > We don't have to bundle the plugins to make sure that various
    > bits of our infrastructure run selftest with the plugins. Neither
    > does bundling the plugins guarantee that developers won't start
    > disabling some plugins that slow down their test runs.

Hence we need a CI system but as mentioned, a CI system has high
requirements: failing tests should be dealt with asap before the S/N
ratio drops.

    >> <snip/>
    >> 
    >> > Once we fix the previous issue, I'm sure more developers will
    >> > start running more of the tests. Perhaps it would also be possible
    >> > to have a babune slave run the tests for all plugin trunks against
    >> > bzr.dev?
    >> 
    >> It's been on babune's TODO list for quite a long time but doesn't make sense
    >> until we get back to a point where all core tests are passing.
    >> 
    >> That's another vicious circle: a CI system is valuable only when 100% of
    >> the tests are passing. As soon as you start having even a single
    >> spurious failure, the S/N ratio goes down and there is no point adding
    >> more tests (or rather expect much value out of the CI system, adding
    >> tests in itself can't be bad, can it ? ;).
    >> 
    >> One way to mitigate that would be to define and maintain different test
    >> suites that we can mix and match differently to suit our needs:
    >> 
    >> - a critical one for pqm, no exception accepted,
    >> 
    >> - a less critical one for babune: excluding known spurious failures to
    >> at least get to a point where babune can be relied upon

    > Can't we perhaps just be more pro-active about spurious failures?

As in tackling https://bugs.launchpad.net/bzr/+bugs?field.tag=babune and
https://bugs.launchpad.net/bzr/+bugs?field.tag=selftest you mean ?

    > I think we should either fix or disable tests (and file bugs) with
    > spurious failures rather than keeping them enabled and stumbling
    > over them constantly.

    > Tests that flap aren't useful for either PQM or CI, I don't think we
    > should treat them differently.

Right, maybe we've had enough of them to decorate them? I did exclude
tests on babune at one point, but this is not a good solution as I
later forgot about them, so we need some in-core tracking to get
better visibility.

Probably something along the lines of re-trying once and warning if the
test fails twice, without letting selftest itself fail, then emitting a
final summary mentioning the number of such spurious failures.
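
A minimal sketch of such a decorator, to show the shape of the idea.
Nothing here exists in bzrlib: the names known_spurious,
SPURIOUS_FAILURES and summary are hypothetical, and a real version
would have to hook into the selftest result reporting.

```python
import functools
import warnings

# Hypothetical in-core registry of decorated tests that failed twice.
SPURIOUS_FAILURES = []


def known_spurious(test_method):
    """Retry a flaky test once; warn instead of failing if it fails twice."""
    @functools.wraps(test_method)
    def wrapper(*args, **kwargs):
        last_error = None
        for _attempt in (1, 2):
            try:
                return test_method(*args, **kwargs)
            except AssertionError as error:
                last_error = error
        # Both attempts failed: keep track of it but do not fail the run.
        SPURIOUS_FAILURES.append(test_method.__name__)
        warnings.warn("%s failed twice: %s" % (test_method.__name__, last_error))
        return None
    return wrapper


def summary():
    """One-line report meant to be printed at the end of the run."""
    return "%d known-spurious test(s) failed twice" % len(SPURIOUS_FAILURES)
```

Only AssertionError is caught here, so an unexpected exception still
fails loudly. Since the decorator lives next to the test, the list of
flapping tests stays visible in the code instead of being hidden in a
CI exclusion list.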

    >> 
    >> > Once we have a working "bzr selftest",
    >> 
    >> I'll go further based on past evidence: 'once' is not strong enough
    >> here, 'bzr selftest' should *always* pass or we go straight into vicious
    >> circles.
    > It doesn't at the moment, we have to get to that point first. Hence the
    > "once". :-)

Hehe, yeah, what I meant is that we said 'once' several times in the
past, I think we should change *something* if we want to get out of this
habit ;)

    >> 
    >> > this should be easy to do.
    >> 
    >> Unfortunately it's not. Getting to the point where selftest passes *once*
    >> is easy but we've always failed to keep it running without dedicated
    >> efforts. Granted, this was always for a negligible issue each time, but
    >> since they add up, we're always reaching a point where getting back on
    >> track is harder than it should be.
    > If we stay on top of this, it *should* be easy to do. It's not like
    > there are hundreds of tests suddenly breaking. If we fix regressions in
    > the plugins as they are introduced, it should be easy to keep up. Once
    > we neglect the full selftest run, it becomes a lot harder to fix it again.

Exactly.

    >> > And we're less likely to find issues at install time if the full
    >> > testsuite is already being run regularly.  Of course, it will slow
    >> > down the release process somewhat, having to wait for the full
    >> > testsuite for bzr core and all plugins to pass and all.
    >> 
    >> Release time is not the right time to run heavy testing, this is
    >> precisely what CI and time-based releases are targeting: cutting a
    >> release should be just:
    >> 
    >> - check that tests have been passing lately,
    >> - check that no critical issues are pending,
    >> - tidy up the news,
    >> - cut the tarball.
    >> 
    >> I.e. only administrative stuff, no last-minute rush for landing, no bug
    >> fixes, no source changes :) The rationale is that any change requires
    >> testing (which takes time) *and* can fail which delays the release. This
    >> goes against time-based releases and as such should be avoided as much
    >> as possible (common sense should be applied for exceptions as usual).
    >> 
    >> I'd go as far as saying that if we need to change the release process it
    >> should be by *removing* tasks, never adding new ones.
    > I'm only saying there should be a final "bzr selftest" run to verify
    > everything is ok, not that this is a point to find and fix all
    > compatibility issues. If we have proper CI and run "bzr selftest" with
    > plugins regularly, then this will almost certainly pass. But a last
    > check like this will prevent brown paper bag releases of the installers,
    > as we had for 2.5.0. And that costs even more RM time.

Indeed.

So, if that wasn't clear, let me re-iterate: I'm in full agreement
about:

- spending more time on ensuring that the full test suite is always
  passing,

- tweaking the 'full test suite' definition so it matches what we really
  care about (this means tagging spurious failures in a way that ensures
  that they are addressed, adding whatever plugins we think are worth
  the maintenance effort and <other ideas>)

I think we agree far more than we disagree on most of the topics so
let's address the ones we agree on ;)

      Vincent