Measuring success/failure in the installation

Martin Pitt martin.pitt at ubuntu.com
Thu Jun 16 18:26:32 UTC 2011


Hello all,

I was hoping we could discuss this in today's meeting, but as only
Scott and I turned up, let's continue this by email.

Evan Dandrea [2011-06-15 12:17 +0100]:
> Instead of operating entirely in the background without user
> intervention, I am proposing that the installer include a checkbox
> with the label, "Send information about setup to help improve the
> experience."  It will be present on the last page of the installer as
> well as the quit and crash dialogs.

I feel a lot better about this than the previous proposal, for three
reasons:

 * We allow the paranoid folks to opt out, and have something to point
   to if someone complains.

 * It seems to be much more compatible with current legal matters.
   (But I defer to the legal team for the definitive decision here.)

 * It collects useful data, so instead of just knowing "we got worse",
   we can actually tell why (traces/logs) and where (it might affect
   particular platforms/graphics cards/etc.)

> In order to get accurate statistics, we need data from the type of
> individual who will not mind if this is enabled, but will equally not
> read and understand the option to manually enable it.  Therefore, this
> option will be checked by default.

I agree that this makes sense. However, for this it should actually
tell you exactly what it will send, so that users can make an
informed decision about what kind of data they want to protect. Your
description might still need some fine-tuning of the wording, but it
already sounds quite good:

> By leaving the box checked, the following information will be sent to
> a database on our server:
> 
> - Installer version, live CD label, and build date.
> - Is the user in the desktop session or the installer only session?
> - System hardware profile (some pared down combination of xvinfo,
> dmidecode, lshw, udisks).
> - Installation time and individual step length.
> - The options selected in the install, excluding the user page entirely.
> 
> If the installation crashed, the following extra information will be collected:
> 
> - Whether it was an unrecoverable failure.
> - The stack trace.
> - The debug logs.
> - A process listing.

Of all these, the pieces that are most sensitive are the stack trace
and the debug logs, as they contain personal information:

 * The chosen computer and user names
 * Mount point names
 * Potentially, passwords in the stack trace (I think we don't put
   them into the log by default). This could be avoided by ensuring
   that the password is never passed in cleartext as a
   function/method argument, but as it goes through the GTK and other
   library stacks, this might be hard to do.

I actually didn't make up the first two -- I did get angry bug reports
about these even for apport, which is a lot more "opt in". Also, the
chosen computer/user/mount point names don't give us valuable
information for debugging, so could we filter them out for the report
sent to us?
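
To illustrate the kind of filtering I mean, here is a minimal sketch in
Python. It assumes the installer already knows the values the user typed
(computer name, user name, custom mount points) and simply replaces them
in the log text before the report is sent. The function name, the
placeholder, and the sample log line are all made up for illustration;
this is not existing installer or apport code.

```python
import re

REDACTED = "<redacted>"  # hypothetical placeholder, not an apport convention


def scrub(text, sensitive):
    """Replace every occurrence of each sensitive string with a placeholder."""
    # Longest values first, so "alice-laptop" is matched before "alice".
    values = sorted({v for v in sensitive if v}, key=len, reverse=True)
    if not values:
        return text
    pattern = re.compile("|".join(re.escape(v) for v in values))
    return pattern.sub(REDACTED, text)


# Made-up log line; real installer logs will look different.
log = "Creating user alice on host alice-laptop, mounting /dev/sda1 at /srv/secret"
clean = scrub(log, ["alice", "alice-laptop", "/srv/secret"])
print(clean)
```

Plain substring replacement like this is crude: a very short user name
would also be replaced inside unrelated words, and standard mount points
such as / should not be passed in at all. It also does nothing about
passwords that end up in stack trace arguments, which would need separate
handling.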

> I leave it to the Technical Board to decide whether this information
> should be public or not.

My preference is to ensure that it doesn't have personal data, just
data about the installer environment and the hardware (we need to
ensure that it doesn't have serial numbers, etc.).

With regard to that, do you plan to reuse python-apport for it? It provides
an API, report format, and plenty of functions to collect hardware
information etc., and it already takes care of anonymization.

> As mentioned previously, my only concern with it being public is the
> data being misrepresented as a count of users, when it will simply
> not be used by a sizeable enough proportion to accurately measure
> that.

Right, and it won't cover OEM or other preinstalls at all, nor offline
installs, nor upgrades, nor people who don't upgrade for a while. I
can't see how anyone could make a claim that these numbers get even
remotely close to the user base. The relative growth is interesting of
course, but we already have that on security.ubuntu.com. So, assuming
that the goal is not to count absolute user numbers, I think this
proposal is quite "safe".

So in summary, a +1 from me with the above conditions (explain what
data is sent, and anonymize it).

Thanks!

Martin

-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)

