Measuring success/failure in the installation
Matt Zimmerman
mdz at ubuntu.com
Thu May 26 14:52:45 UTC 2011
On Wed, May 25, 2011 at 08:20:30AM -0700, Scott James Remnant wrote:
> On Wed, May 18, 2011 at 2:33 AM, Evan Dandrea <ev at ubuntu.com> wrote:
>
> > To be clear, since it wasn't addressed in my original email, I intend
> > to only present percentages of successful and unsuccessful installs.
> >
>
> As discussed externally, and now repeated here for the Technical
> Board, this is the part of the proposal I have a problem with.
>
> All raw data collected by this feature should be public.
>
> There are three good reasons:
>
> 1) Transparency.
>
> Making the raw data available makes it clear that the data
> collected is anonymized and non-identifying. Users with concerns can
> be shown the raw data URL and can verify for themselves that the data
> does not identify them.
I agree that transparency is valuable, but making the raw data available
doesn't allow the user to verify that no identifying information was shared
or stored. That's pretty hard to do.
> 2) Verification.
>
> With the raw data hidden, and only stats published by yourself,
> there is no guarantee of honesty. If you claim that Ubuntu has 97%
> successful installs, and somebody doubts that, they cannot go back to
> the raw data and verify your results.
Publishing the raw data doesn't prove that it is accurate or representative, either.
> 3) Collaboration.
>
> This, to my mind, is the most important.
>
> Hiding the raw data, making it available only to yourself, makes
> it harder for other developers to collaborate with you. Ubuntu is
> still a community project.
>
> Say a developer wanted to look not just at installer success, but
> also at the average length of time in the installer, and undertake a
> project to reduce it. With the raw data from this available, the
> developer can trivially patch to add timestamps to the data set, and
> analyse themselves for their project.
>
> With the data not available, that developer has to start from
> scratch, including going through this procedure again.
This is a good reason to share raw data. It's useful for people to be able
to run their own analyses.
That said, the question of whether we share the raw data isn't a
deciding factor for me. I think we should do this measurement because it's
useful in itself.
--
- mdz