Measuring success/failure in the installation

Thu May 26 14:52:45 UTC 2011

On Wed, May 25, 2011 at 08:20:30AM -0700, Scott James Remnant wrote:
> On Wed, May 18, 2011 at 2:33 AM, Evan Dandrea <ev at ubuntu.com> wrote:
> 
> > To be clear, since it wasn't addressed in my original email, I intend
> > to only present percentages of successful and unsuccessful installs.
> >
> 
> As discussed externally, and now repeated here for the Technical
> Board, this is the part of the proposal I have a problem with.
> 
> All raw data collected by this feature should be public.
> 
> There are three good reasons:
> 
>  1) Transparency.
> 
>     Making the raw data available makes it clear that the data
> collected is anonymized and non-identifying. Users with concerns can
> be shown the raw data URL and can verify for themselves that the data
> does not identify them.

I agree that transparency is valuable, but making the raw data available
doesn't allow the user to verify that no identifying information was shared
or stored.  It shows only what was published, not what was collected, and
that's pretty hard to verify from the outside.

>  2) Verification.
> 
>     With the raw data hidden, and only stats published by yourself,
> there is no guarantee of honesty. If you claim that Ubuntu has 97%
> successful installs, and somebody doubts that, they cannot go back to
> the raw data and verify your results.

Publishing the raw data doesn't prove that the data is accurate or
representative, though.

>  3) Collaboration.
> 
>     This, to my mind, is the most important.
>
>     Hiding the raw data, making it available only to yourself, makes
> it harder for other developers to collaborate with you. Ubuntu is
> still a community project.
>
>     Say a developer wanted to look not just at installer success, but
> at the average length of time in the installer, and undertake a
> project to reduce it. With the raw data from this available, the
> developer can trivially patch to add timestamps to the data set, and
> analyse it themselves for their project.
> 
>     With the data not available, that developer has to start from
> scratch, including going through this procedure again.

This is a good reason to share raw data.  It's useful for people to be able
to run their own analyses.

That said, the question of whether we share the raw data isn't a
deciding factor for me.  I think we should do this measurement because it's
useful in itself.

-- 
 - mdz