Release testing and the relationship between 'bzr selftest' and plugins
vila+bzr at canonical.com
Thu Mar 15 15:11:31 UTC 2012
>>>>> Jelmer Vernooij <jelmer at samba.org> writes:
>> 1 - merge plugins into core, making them core plugins: we've talked
>> about that long ago but nothing happened. The main benefit is that
>> such plugins will be better maintained since a change in core will
>> be noticed more quickly. This adds maintenance costs to bzr core,
>> which are mitigated by less effort on compatibility in both bzrlib
>> and the plugins.
> I think this is just one of the ways we can make sure that the plugin
> tests get run often during development. But it's not the only way to
> make sure changes to the core are tested against the plugins.
Indeed, I wasn't implying anything else ;)
> Shipping the plugins with core also has a few issues:
> * lp:bzr (AFAIK) falls under the contributor agreement, and all code
> (with some exceptions) is (C) Canonical. Most plugins have different
> copyright holders.
Hmm, this one is hard.
> * shipping plugins implies a certain level of support for them
Yup, the balance may not be easy to find between time spent ensuring
backward compatibility in bzr-core, compatibility with various bzr
versions in the plugin itself and packaging the right versions as
opposed to a single code base.
> * plugins can have dependencies - would we start shipping the svn, apr
> and mercurial sourcecode with bzr?
> * some plugins have a different landing mechanism than bzr.dev;
> requiring review, for example
I'd say soft dependencies in bzr-core and build dependencies for
packages, or should those be recommendations instead?
> For some plugins (e.g. bzr-grep?) this might be a good option though,
Yup, I was thinking about bzr-webdav which is very stable but has been
broken with 2.5.
>> 2 - push plugin authors to create series targeted at bzr releases: avoid
>> many maintenance issues :) This will also help installer
> For most plugins, this doesn't scale with the number of release series
> and the size of the plugins. It isn't worth the effort to maintain
> separate release series if it's trivial to be compatible with more
> versions of bzr.
There's a balance to be found again: some plugins may just want to tag specific
revisions for a given series if they don't evolve a lot between series.
> For plugins that are tightly coupled with particular bzr versions,
> like the foreign branch plugins, this is an option. But it still
> wouldn't have prevented the problems we had with the 2.5
> installers. Changes between beta 4 and beta 5 broke the foreign
> branch plugins, and the installers shipped with an outdated
> version of those plugins (from the correct release series).
Sure, but at least the packagers can subscribe to the tip of a given
branch and be done.
>> > * If applicable, is the bzrlib version known to be too old
>> > * If applicable, is the bzrlib version known to be too new
>> Fair enough, but it's up to the plugin authors to do that.
> True, but I think apart from working on bzr core we are involved
> in a fair number of the plugins. We can also encourage changes to
> the way API version checking is done, e.g. by deprecating
>> > Plugins tightly coupled to bzr core
>> > -----------------------------------
>> (1) applies there (as well as 2 to a lesser degree), but again, it's up
>> to the plugin authors.
>> > As the maintainer of three foreign branch plugins, I run their
>> > testsuites regularly, and usually notice when there is an
>> > incompatible change in bzrlib. I think I've done a reasonable job
>> > of keeping versions available that are compatible with all
>> > releases.
>> Thanks for that !
>> A related issue there is the test IDs name space, some tests can be
>> inherited by plugins so that 'bzr selftest -s bp.<plugin>' will include
>> them, some don't.
>> There are ways to make our tests easier to be reused by plugins but
>> we're not there yet:
> Is the test coverage of plugins really an issue? Speaking for the
> foreign plugins, this doesn't really seem to be a problem.
'Issue' may be too strong a word; what I meant is that for a plugin author
there is a *big* difference between running:
# runs only the plugin tests
bzr selftest -s bp.<plugin>
# run all tests including the plugin ones
BZR_PLUGINS_AT=<plugin>@`pwd` bzr selftest
Ideally the former should be enough for the plugin author, but that's true for
only a handful of plugins so far.
> bzrlib.tests.per_branch will run against all foreign branch
> implementations too, or "bzr selftest
> bzrlib.tests.per_branch.*SvnBranch" will run all svn branch
> implementation related tests. This provides pretty good coverage.
Yup, that's what I was thinking with a list of known prefixes to run for
a given plugin.
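
To make the "list of known prefixes" idea a bit more concrete, here is a minimal sketch in Python. The table and the function name are hypothetical (nothing like this exists in bzrlib as far as this thread says), and it assumes '-s' can be given once per prefix:

```python
# Hypothetical table: plugin name -> core test-id prefixes worth running
# in addition to the plugin's own tests. The entry is illustrative only.
KNOWN_PREFIXES = {
    "svn": ["bzrlib.tests.per_branch"],
}


def selftest_command(plugin, known_prefixes=KNOWN_PREFIXES):
    """Build an argv list for 'bzr selftest' limited to one plugin."""
    # 'bp.<plugin>' is the prefix used above for the plugin's own tests.
    prefixes = ["bp.%s" % plugin] + list(known_prefixes.get(plugin, []))
    argv = ["bzr", "selftest"]
    for prefix in prefixes:
        # Assumption: '-s' may be repeated, once per prefix.
        argv.extend(["-s", prefix])
    return argv


if __name__ == "__main__":
    print(" ".join(selftest_command("svn")))
```

A plugin with no entry in the table simply gets its own 'bp.<plugin>' prefix, so the table only has to list plugins that rely on inherited core tests.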
>> > "bzr selftest" doesn't pass with a standard set of plugins installed
>> > --------------------------------------------------------------------
>> This has been a known issue for years.
>> The root cause is a vicious circle: if tests start failing for plugins,
>> bzr devs tend to use 'BZR_PLUGIN_PATH=-site ./bzr selftest' (or, gosh,
>> even --no-plugins) which means additional failures are not seen...
>> Adding more plugin tests and keeping them passing is up to plugin
> If a lp:bzr author changes something that breaks a plugin, they
> should be noticing and filing bugs. I agree plugin authors (or
> anybody) should also be fixing problems in plugins when they come
> up, but that's a lot easier if "bzr selftest" (without arguments)
> actually works.
Right. So, I did that for a long time and lost steam.
While working on a given fix, plugin test failures are disruptive...
Maybe I should say 'were disruptive with no easy opt-out mechanism'; I
think by the time BZR_DISABLE_PLUGINS was introduced I had already fallen
into the BZR_PLUGIN_PATH=-site'ly trigger-happy camp :-/
>> Yup, core plugins are... core plugins :)
> I don't think this is the magical answer. bundling plugins is just
> one of the ways in which we can encourage people to always run the
> plugin tests too.
Sure, there was a smiley there ;)
> We don't have to bundle the plugins to make sure that various
> bits of our infrastructure run selftest with the plugins. Neither
> does bundling the plugins guarantee that developers won't start
> disabling some plugins that slow down their test runs.
Hence we need a CI system but as mentioned, a CI system has high
requirements: failing tests should be dealt with asap before the S/N
>> > Once we fix the previous issue, I'm sure more developers will
>> > start running more of the tests. Perhaps it would also be possible
>> > to have a babune slave run the tests for all plugin trunks against
>> > bzr.dev?
>> It's been on babune's TODO list for quite a long time but doesn't make sense
>> until we get back to a point where all core tests are passing.
>> That's another vicious circle: a CI system is valuable only when 100% of
>> the tests are passing. As soon as you start having even a single
>> spurious failure, the S/N ratio goes down and there is no point adding
>> more tests (or rather expecting much value out of the CI system, adding
>> tests in itself can't be bad, can it ? ;).
>> One way to mitigate that would be to define and maintain different test
>> suites that we can mix and match differently to suit our needs:
>> - a critical one for pqm, no exception accepted,
>> - a less critical one for babune: excluding known spurious failures to
>> at least get to a point where babune can be relied upon
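
As a rough illustration of deriving the less critical suite from the full one, here is a sketch using only the standard unittest module. The function names are made up for this example, and bzrlib's own test filtering helpers are deliberately not used here:

```python
import unittest


def iter_tests(suite):
    """Yield the individual test cases contained in a (nested) suite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            for test in iter_tests(item):
                yield test
        else:
            yield item


def exclude_ids(suite, excluded_prefixes):
    """Return a new suite without the tests whose id starts with a prefix."""
    kept = unittest.TestSuite()
    for test in iter_tests(suite):
        if not any(test.id().startswith(p) for p in excluded_prefixes):
            kept.addTest(test)
    return kept
```

The critical suite would then be the full suite as is, and the babune one something like exclude_ids(full_suite, known_spurious_ids), where known_spurious_ids is a list that has to be tracked somewhere.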
> Can't we perhaps just be more pro-active about spurious failures?
As in tackling https://bugs.launchpad.net/bzr/+bugs?field.tag=babune and
https://bugs.launchpad.net/bzr/+bugs?field.tag=selftest you mean ?
> I think we should either fix or disable tests (and file bugs) with
> spurious failures rather than keeping them enabled and stumbling
> over them constantly.
> Tests that flap aren't useful for either PQM or CI, I don't think we
> should treat them differently.
Right, we've had enough of them to decorate them, maybe? I did exclude
tests on babune at one point but this is not a good solution as I forgot
about them at one point so we need some in-core tracking to get a better
Probably something along the lines of re-trying once and warning if it
fails twice, without letting selftest itself fail, and emitting a final summary
mentioning the number of such spurious failures.
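
A minimal sketch of what such a decorator could look like, using plain unittest rather than bzrlib.tests. The names known_spurious, SPURIOUS_FAILURES and spurious_summary are hypothetical, and note that setUp/tearDown are not re-run between the two attempts:

```python
import functools
import unittest
import warnings

# Ids of decorated tests that failed on both attempts.
SPURIOUS_FAILURES = []


def known_spurious(test_method):
    """Retry a flapping test once; warn instead of failing the run."""
    @functools.wraps(test_method)
    def wrapper(self, *args, **kwargs):
        for attempt in (1, 2):
            try:
                return test_method(self, *args, **kwargs)
            except unittest.SkipTest:
                # A skip is not a failure, let the runner handle it.
                raise
            except Exception as e:
                if attempt == 2:
                    # Failed twice: record and warn, but do not fail.
                    SPURIOUS_FAILURES.append(self.id())
                    warnings.warn("%s failed twice: %s" % (self.id(), e))
    return wrapper


def spurious_summary():
    """Return the one-line summary to emit at the end of the run."""
    return "%d spurious failure(s): %s" % (
        len(SPURIOUS_FAILURES), ", ".join(SPURIOUS_FAILURES))
```

Keeping the list in core rather than in the CI configuration is the point: the excluded tests stay visible in every run instead of being forgotten.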
>> > Once we have a working "bzr selftest",
>> I'll go further based on past evidence: 'once' is not strong enough
>> here, 'bzr selftest' should *always* pass or we go straight into vicious
> It doesn't at the moment, we have to get to that point first. Hence the
> "once". :-)
Hehe, yeah, what I meant is that we said 'once' several times in the
past, I think we should change *something* if we want to get out of this
>> > this should be easy to do.
>> Unfortunately it's not. Getting to the point where selftest passes *once*
>> is easy but we've always failed to keep it running without dedicated
>> efforts. Granted, this was always for a negligible issue each time, but
>> since they add up, we're always reaching a point where getting back on
>> track is harder than it should be.
> If we stay on top of this, it *should* be easy to do. It's not like
> there are hundreds of tests suddenly breaking. If we fix regressions in
> the plugins as they are introduced, it should be easy to keep up. Once
> we neglect the full selftest run, it becomes a lot harder to fix it again.
>> > And we're less likely to find issues at install time if the full
>> > testsuite is already being run regularly. Of course, it will slow
>> > down the release process somewhat, having to wait for the full
>> > testsuite for bzr core and all plugins to pass and all.
>> Release time is not the right time to run heavy testing, this is
>> precisely what CI and time-based releases are targeting: cutting a
>> release should be just:
>> - check that tests have been passing lately,
>> - check that no critical issues are pending,
>> - tidy up the news,
>> - cut the tarball.
>> I.e. only administrative stuff, no last-minute rush for landing, no bug
>> fixes, no source changes :) The rationale is that any change requires
>> testing (which takes time) *and* can fail which delays the release. This
>> goes against time-based releases and as such should be avoided as much
>> as possible (common sense should be applied for exceptions as usual).
>> I'd go as far as saying that if we need to change the release process it
>> should be by *removing* tasks, never adding new ones.
> I'm only saying there should be a final "bzr selftest" run to verify
> everything is ok, not that this is a point to find and fix all
> compatibility issues. If we have proper CI and run "bzr selftest" with
> plugins regularly, then this will almost certainly pass. But a last
> check like this will prevent brown paper bag releases of the installers,
> as we had for 2.5.0. And that costs even more RM time.
So, if that wasn't clear, let me re-iterate: I'm in full agreement
- spending more time on ensuring that the full test suite is always
- tweaking the 'full test suite' definition so it matches what we really
care about (this means tagging spurious failures in a way that ensures
that they are addressed, adding whatever plugins we think are worth
the maintenance effort and <other ideas>)
I think we agree far more than we disagree on most of the topics so
let's address the ones we agree on ;)