More diagnostics data from desktop

Matthew Paul Thomas mpt at canonical.com
Wed Feb 21 12:18:35 UTC 2018


Will Cooke wrote on 20/02/18 11:10:
> We were thinking along the lines of something which would try to send
> the data at login a number of times, let's say.... 10, and then give
> up.  So if the machine never comes on line, then the data never gets
> sent.  If the machine travels between various locations before
> arriving at a working internet connection, then it should eventually
> be able to send it.  I think that would cover the vast majority of
> cases.

That seems reasonable.
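The retry scheme Will describes could look roughly like the minimal sketch below. The names `ReportState`, `try_send_at_login`, and the `send` callback are hypothetical and are not part of any existing Ubuntu tool. The sketch also assumes the caller persists the state between logins, for example as a small JSON file.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ATTEMPTS = 10  # the "let's say.... 10" from the proposal

@dataclass
class ReportState:
    attempts: int = 0
    status: str = "pending"  # "pending", "sent", or "gave-up"

def try_send_at_login(state: ReportState, send: Callable[[], bool]) -> str:
    """Run once per login; `send` returns True if the upload succeeded."""
    if state.status != "pending":
        return state.status  # already sent, or already given up: do nothing
    state.attempts += 1
    try:
        ok = send()
    except OSError:  # e.g. no network route yet
        ok = False
    if ok:
        state.status = "sent"
    elif state.attempts >= MAX_ATTEMPTS:
        state.status = "gave-up"  # the data is never sent
    return state.status
```

Counting attempts per login, rather than using a wall-clock deadline, gives a rarely booted laptop the same ten chances. The trade-off is the one Will notes: a machine that is offline for its first ten logins never reports.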

>> For example, if we could see how often people change their reported
>> location, we’d have info on how accessible the time zone UI should
>> be. And if it turns out that only a tiny fraction of Livepatch users
>> turn it on during install, vs. afterwards, that would influence
>> future installer design.
> 
> Wouldn't that involve us being able to track a person/machine to know
> when it had been changed?  Or would something in the location picker
> send a signal?  I don't like either of these options, I've probably
> misunderstood the idea.

Neither, I think. The location example would require Ubuntu to count how
often the setting had changed (for example, “2 changes in the past
month”). And the Livepatch example would require Livepatch to record
whether it had been configured in the installer or System Settings.
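Here is a minimal sketch of that kind of counting. It assumes the change timestamps are kept only on the machine and that "past month" means the last 30 days. The function names and report keys (`location_changes_past_month`, `livepatch_configured_in`) are hypothetical. The point is that only an aggregate count and a single label would leave the machine, so nothing needs to track the person or machine over time.

```python
from collections.abc import Iterable
from datetime import datetime, timedelta
from typing import Optional

def changes_in_past_month(change_times: Iterable[datetime], now: datetime) -> int:
    """Count setting changes recorded locally in the 30 days up to `now`."""
    cutoff = now - timedelta(days=30)  # assumption: "month" = 30 days
    return sum(1 for t in change_times if cutoff <= t <= now)

def build_report(change_times: Iterable[datetime], now: datetime,
                 livepatch_configured_in: Optional[str]) -> dict:
    # Only aggregates go into the report: no timestamps, no locations.
    return {
        "location_changes_past_month": changes_in_past_month(change_times, now),
        # "installer", "settings", or None if Livepatch was never enabled
        "livepatch_configured_in": livepatch_configured_in,
    }

now = datetime(2018, 2, 21, 12, 0)
history = [now - timedelta(days=d) for d in (2, 10, 45)]
print(build_report(history, now, "installer"))
```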

>>> Any user can simply opt out by unchecking the box, which triggers
>>> one simple POST stating, “diagnostics=false”.
>> 
>> What is the purpose of this?
> 
> To try and measure engagement rates.  This would be important in the
> "opt-in" case I think, how representative of users is the data?  < 10%
> of people are submitting data, then probably not very useful.
A 10% response rate would be necessary, as far as I can tell, only if
there were fewer than 3460 Ubuntu desktop users in the world, assuming
that you’re going for a 5% margin of error at a 95% confidence level.
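For reference, here is one standard way to arrive at a figure like 3460. It assumes Cochran's sample-size formula with worst-case variance (p = 0.5), z = 1.96 for 95% confidence, a margin of error e, a population of N users, and the usual finite population correction. The email doesn't say which calculator was used, so treat this as a sketch.

```latex
n_0 = \frac{z^2\,p(1-p)}{e^2} = \frac{1.96^2 \times 0.25}{0.05^2} \approx 384.2,
\qquad
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}

% A 10% response rate means n = 0.1N:
0.1N + 0.1(n_0 - 1) = n_0
\;\Longrightarrow\;
N = 9n_0 + 1 \approx 3458
```

That rounds to the 3460 above. Because n/N = n_0/(N + n_0 - 1) falls as N grows, any larger population needs a response rate below 10%.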

If there were, say, 4 million Ubuntu desktop users in the world, you’d
need only 385 submissions to reach that level of confidence. Even for a
more precise 2% margin of error, you’d need only 2400 submissions, which
is a 0.06% response rate. (This is why someone polling a state or
country with a million voters doesn’t need 100 000 responses. Often they
collect just 1000.)
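Under the same assumptions (Cochran's formula, p = 0.5, z = 1.96, finite population correction), the 2% case for 4 million users works out as follows. This is a sketch of the arithmetic, not necessarily the exact calculation behind the figures above.

```latex
n_0 = \frac{1.96^2 \times 0.25}{0.02^2} = \frac{0.9604}{0.0004} = 2401,
\qquad
n = \frac{2401}{1 + \dfrac{2400}{4\,000\,000}} \approx 2399.6 \to 2400,
\qquad
\frac{2400}{4\,000\,000} = 0.06\%
```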

As long as the number of Ubuntu desktop users is anything more than half
a million, the response rate is basically irrelevant: the required
sample size stays almost constant. For example, to get that 2% margin of
error from 500 000 Ubuntu users would require 2390 submissions, while
from 100 million Ubuntu users it would require 2401 submissions.
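The figures in the last two paragraphs can be reproduced with Cochran's sample-size formula plus a finite population correction. This sketch assumes worst-case variance (p = 0.5) and rounds up to whole submissions. It shows one standard method, and the email doesn't say exactly which calculator produced its numbers. `required_sample` is a hypothetical helper name.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level

def required_sample(population: int, margin: float,
                    z: float = Z_95, p: float = 0.5) -> int:
    """Cochran's sample size with finite population correction, rounded up."""
    n0 = z**2 * p * (1 - p) / margin**2          # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# The required sample barely moves once the population is large.
for users in (500_000, 4_000_000, 100_000_000):
    n = required_sample(users, 0.02)
    print(f"{users:>11,} users: {n} submissions ({n / users:.4%} response rate)")
```

The correction term (n0 - 1) / N is what makes the population size irrelevant: for half a million users or more it is already below 0.005, so the result stays within about 11 of the infinite-population value of 2401.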

Now, I’m not a statistician, so maybe I’ve made a silly miscalculation
or misunderstanding. If you were planning to do any sub-sample analysis,
or reweighting for known biases, then the original sample would need to
be bigger. But if you were proposing that this be opt-out merely because
you thought we’d need a ≥10% response rate — or if you were proposing
“diagnostics=false” because you thought we’d need to measure the
response rate at all — then I’d strongly encourage consulting a
statistician first.

Cheers
-- 
mpt



More information about the ubuntu-devel mailing list