Disabling whoopsie by default in the 12.04.1 release

Evan Dandrea ev at ubuntu.com
Mon Aug 6 15:53:13 UTC 2012


On Mon, Aug 6, 2012 at 12:09 PM, Sebastien Bacher <seb128 at ubuntu.com> wrote:
> Hey Evan, thanks for the reply!

Sure thing :)

> That's 10 times a week, that seems very often to me yes. I would be all for
> fixing the issues but in reality the resources allocated to that don't allow
> to make strong enough progress (out of software-center that you pointed to
> before which is lucky to have a full team dedicated to it)

Sure, but as Matthew points out, these dialogs serve two purposes:
1. To let the user know what just happened.
2. To let us prioritize work effectively.

They do not make promises about work getting done. If there's software
that is used by a sufficient number of people to move the needle on
the front page of http://errors.ubuntu.com but we do not have the
resources to fix the bugs in it, then we should have conversations
about dropping that application as a means of reducing the number or
frequency of errors.

>> https://wiki.ubuntu.com/ErrorTracker#When_there_are_multiple_simultaneous_errors
>
> Is there any chance that would land into the LTS?

I'll push it to the top of my list and try to get it done and tested
by the 16th.

>> If we disable reporting for these issues, then we'll never know the
>> frequency at which they're occurring or what's causing them. We'll be
>> right back at developers having to choose between failing hard to
>> surface an issue or partially recovering from it and never knowing its
>> true extent or the damage it inflicts.
>
> Well, by reviewing the errors since precise a good part of the issues have
> been judged harmless for users (often untrapped python exceptions, issues
> happening at session logout or random glitches from services like oneconf
> which do actions on regular basis and where missing a run has no real
> effect). Consensus in the desktop team is that while those are bugs they are
> often worth not bothering users about and create a sentiment of instability
> where not needed.

I don't see how uncaught Python exceptions are harmless. Even if they
were, codifying this belief of harmlessness in a try, except, pass
block is a very small patch.
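
A minimal sketch of what such a patch might look like. The function
names are hypothetical stand-ins (for example, for a periodic job where
a missed run has no real effect), not code from any actual package:

```python
import logging


def run_periodic_sync():
    """Hypothetical periodic task; here it always fails, for illustration."""
    raise RuntimeError("network unreachable")


def safe_periodic_sync():
    """Run the task, deliberately swallowing failures judged harmless."""
    try:
        run_periodic_sync()
    except Exception:
        # Explicit decision: a missed run is harmless, so the exception is
        # not allowed to propagate and trigger a crash report. A bare
        # `pass` would also do; the debug log is an optional extra.
        logging.debug("periodic sync failed; ignoring", exc_info=True)
        return False
    return True
```

The point is that the judgement of harmlessness then lives in the code,
where it can be reviewed, rather than in a decision to stop reporting.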

Can you elaborate on what's causing these applications to fail on log
out? I don't see how logging out by itself invalidates the contents of
an error report.

> What I meant there is not that the datas are not useful, is that we are
> collecting datas about versions we will not work on (because of the
> resources allocated to the stable release mostly).
> Also the suggestion is to turn whoopsie off only for one extra release
> (precise) and to keep it on on other series.

Well, we will be working on these releases. There are four point
releases scheduled for Ubuntu 12.04. The engineering staff assigned to
these releases may grow smaller, but that will not prevent developers
from uploading software to 12.04. If they do and that software causes
failures that did not exist before, I want them to know about it
quickly and I want the release team to know about it quickly. Even if
no one is going to do anything about it, any regression should be at
least acknowledged and its impact known.

Turning whoopsie off for precise leaves users without an explanation
when any kind of application failure occurs. It means that we cannot
measure stability of 12.10 against 12.04. It completely leaves us in
the dark on the state of 12.04 after the .1 release. It's really going
at the messenger with a hatchet.

> Again the issue are:
> - we don't have enough people dedicated to work on the precise issues, we
> collect datas but don't make use of them by allocating resources to fix the
> issues

We'll have people uploading to precise-updates long after 12.04.1 is
out the door. We should be monitoring the impact of this. We will have
problems that surface only after this time.

> - it's getting hard to use the current errors.ubuntu.com summary because you
> can't filter out things which got a fixed rolled out from those which don't,
> half of "the main page" are probably issues we tackled but where the fix
> didn't reach users, as somebody planned work I would like those out of the
> overview because there is not a lot my team can do there, but it doesn't
> prevent us to see what are the real remaining issues.

This is a problem of our updates mechanism more than anything else.
The numbers are real. We just do a very poor job of getting fixes in
the hands of users.

As mentioned, as soon as RT 54702 lands, you will be able to see
the version-by-version breakdown of an individual problem. This will
happen soon. We're just working through the dependency chain for
python-tastypie in lucid with builds ongoing as I type this email.

I also have the aforementioned branch to grey out issues that have not
occurred yet in the most recent published version.
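
As a rough illustration of the idea only (this is not the code in that
branch; the function names and version strings are made up), the
breakdown and the grey-out check could be sketched as:

```python
from collections import Counter


def version_breakdown(report_versions):
    """Count instances of one problem per package version."""
    return Counter(report_versions)


def should_grey_out(report_versions, latest_published):
    """True when the problem has no instances in the latest published version."""
    return version_breakdown(report_versions)[latest_published] == 0


# Made-up package versions attached to reports of a single problem.
seen = ["1.0-0ubuntu1", "1.0-0ubuntu1", "1.0-0ubuntu2"]
breakdown = version_breakdown(seen)
```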

>> If I was still working on the installer, I could go to the website
>> right now, punch ubiquity into the box and instantly have a list of
>> what I need to focus my attention on. I *wish* I had this five years
>> ago when Launchpad already had thousands of open bugs for ubiquity.
>
> Right, that's precisely the issue ... we collect those datas but for who?
> Out of software-center who has dedicated people I would say there has been
> very little progress on the most commonly reported bugs since precise.

For the release team to understand if our policies around post-release
updates are sound. For us to understand if Ubuntu 12.04.1 is actually
more stable than Ubuntu 12.04. For our continued understanding of the
gap between issues that need to be fixed and the resources we're
asking our managers for.

> Do we have numbers of:
> - the number of issues reported by day and by user since precise?

Yes, though not in independent graphs. The number of issues reported
by day is used to calculate the average number of errors per day, as
the formula is currently:

total number of reports received for the day / total number of unique
users who reported issues that day
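
As a small sketch of that calculation, assuming reports arrive as
(day, user id) pairs (the data shape and the sample values are
assumptions for illustration, not the real schema):

```python
from collections import defaultdict
from datetime import date


def average_errors_per_day(reports):
    """Map each day to reports received that day / unique reporting users."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for day, user_id in reports:
        counts[day] += 1
        users[day].add(user_id)
    return {day: counts[day] / len(users[day]) for day in counts}


# Made-up sample: user "a" reports twice on the 5th, "b" once.
reports = [
    (date(2012, 8, 5), "a"),
    (date(2012, 8, 5), "a"),
    (date(2012, 8, 5), "b"),
    (date(2012, 8, 6), "c"),
]
averages = average_errors_per_day(reports)
```

Note that a user who reports nothing on a given day does not appear in
the denominator for that day.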

We can also retrieve the reports for a user and look at the individual
dates of all their reports.

> - the evolution of this number?

I'm not sure what you mean by this. We have a graph on the front page
that shows the average number of errors per day. In the aforementioned
RT it's separated out into lines for each Ubuntu version.

> - the sources which have the most issues and how they progressed?

Yes, in the aforementioned RT, you see a graph on the top of the
problem page with the instance count over time. You also get the
version-by-version breakdown on the problem page that I mentioned
above.

> Did we "fix" apport for reporting issues every time they happen or do we
> still stop after 3 instances to not "spam" the users? If we do stop it
> probably helps to lower the number in an artificial way and makes it hard to
> see what issues are still happening or not

We're currently stopping after three instances, but we'd like to fix
that for future releases:

https://bugs.launchpad.net/ubuntu/+source/apport/+bug/989800

> I don't think there is much "Chinese whispers", people who have issues
> raised them, that's what I'm doing atm on this list, pitti replied to the
> thread email, Didier discussed it with you on IRC as well.

You had referenced the consensus from the developers you talked to. I
am asking that those people speak up here. If it's Martin and Didier
then I have no doubt they'll continue to contribute to this thread if
they see the need :)

> Well, the issue is not to "shy away", but:
> - perception matters (a lot), windows was "famous" for its blue screen for
> years (and some still joke about it) ... do we want our best version so far
> to become famous for being the OS which prompt you about system errors every
> single day?

I suspect they care more about the errors that happen than the dialogs
that they get for them.

It's worth noting, since you've raised Windows, that other mainstream
operating systems do exactly what we're doing here. When applications
crash on Windows or OS X, you get a similar dialog.

> - our users tell us that they think precise to be a lot less stable than
> previous Ubuntu version, and mostly due to the number of errors dialog they
> get (where in practice we know it's pretty stable compared to previous
> versions) ... that has a cost and is an issue we shouldn't dismiss the
> reputation we are building

We'll never actually know if one release is more stable than any
previous release unless we actually measure it over the lifetime of
each release. We cannot do that if we disable error reporting partway
through the lifetime of a release.

> - users see the prompt dialog as a major annoyance, I'm not sure what we can
> do better there though...

That's the users who are talking to you about it. I suspect these are
fairly technical people who do not need an explanation of what just
happened when an application disappears; feel free to refute that. My
understanding, though, is that our target market is general consumers.

> I would be happy to reconsider my position and proved wrong if I saw
> significant work happening on the list, including the frequent issues on
> those sources: ubiquity, update-manager, jockey, sessioninstaller,
> aptdaemon, update-notifier, software-properties, ... but so far those are
> "unmaintained", and we have a long list of issues we know about and don't
> tackle, I don't see the value of collecting datas further if we already
> don't make use of the ones we have

That list isn't static and neither is our developer pool. If someone
comes along tomorrow wanting to fix ubiquity issues in 12.04, where
will we point them? How will we know that the issues we saw in 12.04
before we turned off error reporting are still the biggest issues
despite numerous software updates potentially happening since?

If we upload a particularly broken blueman that suddenly affects lots
of users who are not familiar with Launchpad, how do we find out about
it? How do we prevent it from getting buried in the noise of all the
other unresolved issues?

If we completely locked precise down and didn't allow any further
changes after the .1 release, I'd be happy to reconsider my position.
But we're not. And letting people continue to upload new versions of
packages while disabling crash reporting strikes me as reckless.


