Release testing and the relationship between 'bzr selftest' and plugins

Thu Mar 15 13:56:40 UTC 2012

Hi Vincent,

On 15/03/12 13:57, Vincent Ladeuil wrote:
>>>>>> Jelmer Vernooij <jelmer at samba.org> writes:
> <snip/>
>
>     > The API versioning infrastructure doesn't work
>     > ----------------------------------------------
>
> Sad but true.
>
> This means the efforts we put into maintaining compatibility with
> plugins are partly wasted though. 

> I can think of several ways for plugins to address the issue:
>
> 1 - merge plugins into core, making them core plugins: we talked
>     about that long ago but nothing happened. The main benefit is that
>     such plugins will be better maintained since a change in core will
>     be noticed more quickly. This adds maintenance costs to bzr core,
>     which are mitigated by less effort on compatibility in both bzrlib
>     and the plugins.
I think this is just one of the ways we can make sure that the plugin
tests get run often during development. But it's not the only way to
make sure changes to the core are tested against the plugins.

Shipping the plugins with core also has a few issues:

 * lp:bzr (AFAIK) falls under the contributor agreement, and all code
(with some exceptions) is (C) Canonical. Most plugins have different
copyright holders.
 * shipping plugins implies a certain level of support for them
 * plugins can have dependencies - would we start shipping the svn, apr
and mercurial source code with bzr?
 * some plugins have a different landing mechanism than bzr.dev;
requiring review, for example

For some plugins (e.g. bzr-grep?) this might be a good option though,
indeed.

> 2 - push plugin authors to create series targeted at bzr releases: avoid
>     many maintenance issues :) This will also help installer
>     builders/packagers.
For most plugins, this doesn't scale with the number of release series
and the size of the plugins. It isn't worth the effort to maintain
separate release series if it's trivial to be compatible with more
versions of bzr.

For plugins that are tightly coupled with particular bzr versions, like
the foreign branch plugins, this is an option. But it still wouldn't
have prevented the problems we had with the 2.5 installers. Changes
between beta 4 and beta 5 broke the foreign branch plugins, and the
installers shipped with an outdated version of those plugins (from the
correct release series).

>     >  * If applicable, is the bzrlib version known to be too old
>     >  * If applicable, is the bzrlib version known to be too new
>
> Fair enough, but it's up to the plugin authors to do that.
True, but apart from working on bzr core, we are involved in a
fair number of the plugins. We can also encourage changes to the way API
version checking is done, e.g. by deprecating bzrlib.api.require_any_api().
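A minimal sketch of the "known too old / known too new" style of check quoted above, assuming a plugin declares the oldest bzrlib version it is known to work with and, optionally, the first version known to break it. The names check_bzrlib_version, minimum and first_broken are hypothetical and not part of bzrlib; the version tuple just has the shape of bzrlib.version_info, e.g. (2, 5, 0, 'final', 0). Unlike a fixed list of compatible versions, anything not known to be broken is accepted.

```python
# Hypothetical helper, not part of bzrlib: classify a bzrlib version tuple
# shaped like bzrlib.version_info, e.g. (2, 5, 0, 'final', 0).


def check_bzrlib_version(version_info, minimum=None, first_broken=None):
    """Return "too old", "too new" or "ok" for the given bzrlib version.

    minimum: oldest (major, minor, micro) known to work, or None if unknown.
    first_broken: first (major, minor, micro) known NOT to work, or None.
    Versions that are not known to be broken are accepted.
    """
    current = tuple(version_info[:3])  # ignore release level and serial
    if minimum is not None and current < tuple(minimum):
        return "too old"
    if first_broken is not None and current >= tuple(first_broken):
        return "too new"
    return "ok"


if __name__ == "__main__":
    # A plugin that needs >= 2.4 and is known to break from 2.6 onwards.
    for version in [(2, 3, 4, 'final', 0), (2, 5, 0, 'final', 0),
                    (2, 6, 0, 'dev', 1)]:
        print(version, check_bzrlib_version(version, (2, 4, 0), (2, 6, 0)))
```

How a plugin reports "too old" or "too new" at load time (warning vs. refusing to load) would still be up to the plugin author.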

>     > Plugins tightly coupled to bzr core
>     > -----------------------------------
>
> (1) applies there (as well as 2 to a lesser degree), but again, it's up
> to the plugin authors.
>
>     > As the maintainer of three foreign branch plugins, I run their
>     > testsuites regularly, and usually notice when there is an
>     > incompatible change in bzrlib.  I think I've done a reasonable job
>     > of keeping versions available that are compatible with all
>     > releases.
>
> Thanks for that !
>
> A related issue there is the test ID namespace: some tests can be
> inherited by plugins so that 'bzr selftest -s bp.<plugin>' includes
> them, but others can't.
>
> There are ways to make our tests easier for plugins to reuse, but
> we're not there yet:
Is the test coverage of plugins really an issue? Speaking for the
foreign plugins, this doesn't really seem to be a problem.

The tests in bzrlib.tests.per_branch also run against all foreign branch
implementations, and "bzr selftest bzrlib.tests.per_branch.*SvnBranch"
runs all tests related to the svn branch implementation. This provides
pretty good coverage.

>     > "bzr selftest" doesn't pass with a standard set of plugins installed
>     > --------------------------------------------------------------------
>
> This has been a known issue for years.
>
> The root cause is a vicious circle: if tests start failing for plugins,
> bzr devs tend to use 'BZR_PLUGIN_PATH=-site ./bzr selftest' (or, gosh,
> even --no-plugins), which means additional failures are not seen...
>
> Adding more plugin tests and keeping them passing is up to plugin
> authors/maintainers...
If a lp:bzr author changes something that breaks a plugin, they should
notice it and file bugs. I agree plugin authors (or anybody) should
also be fixing problems in plugins when they come up, but that's a lot
easier if "bzr selftest" (without arguments) actually works.

>
> <snip/>
>
>     > No regular testing of plugins against bzr.dev
>     > ---------------------------------------------
>
>     > "bzr selftest" gets run very often for "bzr" itself with just the
>     > bundled plugins (launchpad, changelog_merge, ...) - on PQM,
>     > various platforms in babune, on developer machines, etc.
>
> Yup, core plugins are... core plugins :)
I don't think this is the magical answer. Bundling plugins is just one
of the ways in which we can encourage people to always run the plugin
tests too.

We don't have to bundle the plugins to make sure that various bits of
our infrastructure run selftest with the plugins. Neither does bundling
the plugins guarantee that developers won't start disabling some plugins
that slow down their test runs.
> <snip/>
>
>     > Once we fix the previous issue, I'm sure more developers will
>     > start running more of the tests. Perhaps it would also be possible
>     > to have a babune slave run the tests for all plugin trunks against
>     > bzr.dev?
>
> It has been on babune's TODO list for quite a long time, but it doesn't
> make sense until we get back to a point where all core tests are passing.
>
> That's another vicious circle: a CI system is valuable only when 100% of
> the tests are passing. As soon as you start having even a single
> spurious failure, the S/N ratio goes down and there is no point adding
> more tests (or rather, no point expecting much value out of the CI
> system; adding tests in itself can't be bad, can it? ;).
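To put a rough number on the S/N point: if each of n flaky tests independently fails with probability p in a given run (an idealised assumption; real spurious failures are often correlated), the chance that a whole run is green is (1 - p)^n. The function name below is purely illustrative.

```python
def p_green_run(n_flaky, p_fail):
    """Chance that a run has no spurious failures.

    Assumes each of n_flaky tests fails independently with probability
    p_fail; this is an idealisation, not a measured model of any suite.
    """
    return (1.0 - p_fail) ** n_flaky


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        # With p_fail = 0.05, ten flaky tests already turn ~40% of runs red.
        print(n, round(p_green_run(n, 0.05), 3))
```

Even a handful of flaky tests makes a red run the normal case, which is the S/N problem described above.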
>
> One way to mitigate that would be to define and maintain different test
> suites that we can mix and match differently to suit our needs:
>
> - a critical one for pqm, no exception accepted,
>
> - a less critical one for babune: excluding known spurious failures to
>   at least get to a point where babune can be relied upon
Can't we perhaps just be more proactive about spurious failures? I
think we should either fix or disable tests with spurious failures (and
file bugs) rather than keeping them enabled and stumbling over them
constantly.

Tests that flap aren't useful for either PQM or CI; I don't think we
should treat them differently.
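Both options, separate suites that exclude known spurious failures or disabling flapping tests until their bug is fixed, come down to partitioning tests by ID. A minimal Python 3 sketch with plain unittest; split_suite and the known-spurious set are hypothetical, not bzrlib features. Keeping the quarantined suite around means it can still be run separately, so the flapping tests aren't simply forgotten.

```python
import unittest


class ExampleTests(unittest.TestCase):
    def test_stable(self):
        self.assertEqual(1 + 1, 2)

    def test_flaky(self):
        # Stand-in for a test with a timing-dependent (spurious) failure.
        pass


def iter_tests(suite):
    """Flatten nested suites into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def split_suite(suite, known_spurious_ids):
    """Split a suite into (reliable, quarantined) suites by test ID."""
    reliable = unittest.TestSuite()
    quarantined = unittest.TestSuite()
    for test in iter_tests(suite):
        if test.id() in known_spurious_ids:
            quarantined.addTest(test)
        else:
            reliable.addTest(test)
    return reliable, quarantined


if __name__ == "__main__":
    full = unittest.TestLoader().loadTestsFromTestCase(ExampleTests)
    # Hypothetical quarantine list; in practice each entry would cite a bug.
    spurious = {ExampleTests("test_flaky").id()}
    reliable, quarantined = split_suite(full, spurious)
    unittest.TextTestRunner(verbosity=2).run(reliable)  # the gating suite
    print("quarantined:", [t.id() for t in iter_tests(quarantined)])
```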

>
>     > Once we have a working "bzr selftest",
>
> I'll go further based on past evidence: 'once' is not strong enough
> here, 'bzr selftest' should *always* pass or we go straight into vicious
> circles.
It doesn't at the moment; we have to get to that point first. Hence the
"once". :-)
>
>     > this should be easy to do.
>
> Unfortunately it's not. Getting to the point where selftest passes *once*
> is easy, but we've always failed to keep it passing without dedicated
> effort. Granted, each time it was for a negligible issue, but since
> they add up, we always reach a point where getting back on track is
> harder than it should be.
If we stay on top of this, it *should* be easy to do. It's not like
there are hundreds of tests suddenly breaking. If we fix regressions in
the plugins as they are introduced, it should be easy to keep up. Once
we neglect the full selftest run, it becomes a lot harder to fix it again.
>     > And we're less likely to find issues at install time if the full
>     > testsuite is already being run regularly.  Of course, it will slow
>     > down the release process somewhat, having to wait for the full
>     > testsuite for bzr core and all plugins to pass and all.
>
> Release time is not the right time to run heavy testing; this is
> precisely what CI and time-based releases address. Cutting a
> release should just be:
>
> - check that tests have been passing lately,
> - check that no critical issues are pending,
> - tidy up the news,
> - cut the tarball.
>
> I.e. only administrative stuff: no last-minute rush for landing, no bug
> fixes, no source changes :) The rationale is that any change requires
> testing (which takes time) *and* can fail, which delays the release. This
> goes against time-based releases and as such should be avoided as much
> as possible (common sense should be applied for exceptions as usual).
>
> I'd go as far as saying that if we need to change the release process it
> should be by *removing* tasks, never adding new ones.
I'm only saying there should be a final "bzr selftest" run to verify
everything is ok, not that this is a point to find and fix all
compatibility issues. If we have proper CI and run "bzr selftest" with
plugins regularly, then this will almost certainly pass. But a last
check like this will prevent brown paper bag releases of the installers,
as we had for 2.5.0. And that costs even more RM time.
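The checklist above, plus a final selftest run, could be expressed as a gate that only looks at existing results and never touches the source. A Python 3 sketch; CiRun, ready_to_release and the window of recent runs are hypothetical, and nothing here actually queries babune, PQM or Launchpad.

```python
from dataclasses import dataclass


@dataclass
class CiRun:
    """Hypothetical record of one CI run against a revision."""
    revision: str
    passed: bool


def ready_to_release(recent_runs, open_critical_bugs, final_selftest_passed,
                     window=5):
    """Return the reasons blocking a release; an empty list means go."""
    reasons = []
    last = recent_runs[-window:]  # only look at the most recent runs
    if not last:
        reasons.append("no recent CI runs")
    else:
        failed = [run.revision for run in last if not run.passed]
        if failed:
            reasons.append("CI failed on: " + ", ".join(failed))
    if open_critical_bugs:
        reasons.append("%d critical issue(s) pending" % len(open_critical_bugs))
    if not final_selftest_passed:
        reasons.append("final 'bzr selftest' with plugins failed")
    return reasons


if __name__ == "__main__":
    runs = [CiRun("r1", True), CiRun("r2", True), CiRun("r3", True)]
    print(ready_to_release(runs, [], final_selftest_passed=True))  # []
    print(ready_to_release(runs, [], final_selftest_passed=False))
```

The final check isn't there to find and fix issues at release time, only to refuse to cut the installers if something slipped through.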

Cheers,

Jelmer
