Releasing Alphas and Betas without "freezing"

Bryce Harrington bryce at canonical.com
Wed Jun 20 20:55:00 UTC 2012


On Wed, Jun 20, 2012 at 04:57:19AM +0000, Adam Conrad wrote:
> On Tue, Jun 19, 2012 at 11:06:14AM -0700, Michael Casadevall wrote:
> > Milestones exist to give the Ubuntu developer community a chance to
> > step back and make sure nothing important has broken, and to gauge our
> > progress through a cycle. In addition, they provide a dedicated time
> > where as a community we step forth and check our images to ensure no
> > regressions have slipped by.
> 
> I don't think anyone is arguing that we should do less manual testing.
> In fact, I think that milestones create a culture of less testing, in
> the sense that people ONLY test during milestones.
>
> I think any discussion of "dropping milestones" can only come paired
> with a conversation about better continuous testing practices.  If
> people have the time to test, they should test what's current.  If
> people don't have the time to test, milestones don't magically create
> time, in fact, they drain time from people who have to make them go.

I think that if continuous testing is done in a well-organized way, you
may see an improvement in efficiency that could in fact get a lot more
bang for a lot less time invested.

A non-organized testing effort takes kind of a shotgun approach:  "Here
are some ISOs; go install them and then file bugs."

These bug reports can vary widely in quality.  Since that's the only
output, scaling this style of testing up just means a lot more bug
reports.

Yet, we often can't answer some rather basic questions.  From our test
effort, was Alpha-2 measurably better or worse than Alpha-1?  Were there
any particular anomalies or regressions that affected a lot of people?
How broad was the hardware coverage we achieved?


Organized testing efforts know ahead of time specifically what to test,
how to test it, and how to capture all the data in machine-digestible
formats so analysts can look for patterns later.  They use scripts or
paint-by-number procedures for folks to follow, so the same data is
gathered consistently from everyone.  And over time people will script
the more time-consuming procedures, which makes everyone more and more
efficient.
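
To make that concrete, here's a rough sketch (in Python) of what a
machine-digestible result record could look like.  The field names and
the results.jsonl file are just invented for illustration, not an
existing format:

    #!/usr/bin/env python
    # Hypothetical sketch: append one pass/fail record per test to a
    # JSON-lines file so results from many testers can be merged later.
    import json
    import platform
    import time

    def record_result(test_id, passed, milestone, notes="",
                      path="results.jsonl"):
        record = {
            "test_id": test_id,        # e.g. "install-desktop-amd64"
            "milestone": milestone,    # e.g. "alpha-2"
            "passed": bool(passed),
            "notes": notes,
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            # Enough system detail to group results by hardware later.
            "machine": platform.machine(),
            "kernel": platform.release(),
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")

    if __name__ == "__main__":
        # Example: a tester (or a wrapper script) reports one data point.
        record_result("install-desktop-amd64", passed=True,
                      milestone="alpha-2",
                      notes="clean install on my test laptop")

The point isn't this particular format; it's that every tester emits the
same fields, so the data can be merged and queried instead of just read.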

I've seen lots of examples of this style in Ubuntu over the years.  I
like how the kernel team passes around USB keys with the kernel they
want tested, and whatever tests or scripts they need.  The ISO tracker
folks have a nifty collection of testing procedures and infrastructure.
Checkbox is another example.

In this style of testing, the tangible output is sets of pass/fail data
points; bug reports are generated too, but those are just derivative
data.  Your goal is to end up with a consistent collection of data you
can plot to show that yes, Alpha-2 gives 20% more passes than Alpha-1
did, that overall performance is 5% faster, and that testing covered 42%
more types of hardware.
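
As a hedged illustration, assuming records like the sketch above (the
"milestone" and "passed" fields are my own invention, not something we
collect today), the aggregation can be a few lines of Python:

    #!/usr/bin/env python
    # Hypothetical sketch: compute the pass rate per milestone from
    # JSON-lines records, so Alpha-1 and Alpha-2 can be compared.
    import json
    from collections import defaultdict

    def pass_rates(path="results.jsonl"):
        totals = defaultdict(int)
        passes = defaultdict(int)
        with open(path) as f:
            for line in f:
                record = json.loads(line)
                milestone = record.get("milestone", "unknown")
                totals[milestone] += 1
                if record["passed"]:
                    passes[milestone] += 1
        return dict((m, passes[m] / float(totals[m])) for m in totals)

    if __name__ == "__main__":
        for milestone, rate in sorted(pass_rates().items()):
            print("%s: %.0f%% of tests passed" % (milestone, rate * 100))

The same records could just as easily be grouped by hardware type or by
test case, which is where coverage and regression numbers like the ones
above would come from.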

Since you know what you want to test, and have records of what you've
measured in the past, you will know what you *don't* need to
test.  For example, you may find that most graphics regressions tend to
happen on systems with the newest video cards, so you could scale back
testing on older hardware, and focus more on the newer systems.

Bryce


