Automatic crash reports in the final release

Tue Apr 3 19:44:58 BST 2007

On Fri, Mar 30, 2007 at 10:06:08AM +0200, Martin Pitt wrote:
> Unhandled Python Exceptions
> ===========================
> 
> They have no considerable impact on network bandwidth, or CPU/memory
> resources when processing them, always have perfect stack traces, and
> it should not be hard to develop a tool to automatically mark them as
> duplicates.  Personally I find them very helpful, too.
> 
> This is also relevant for crashes in Ubiquity:
> 
> Mar 29 22:08:26 <cjwatson>      I'll certainly get flooded, *but* I'm going to get flooded *anyway* if the installer is crashing
> Mar 29 22:08:39 <cjwatson>      it's either get flooded with decent-quality bugs, or with poor-quality bugs
> 
> If we want, we have the option to disable them by default, and
> re-enable them in casper, so that we get reports from the live system.

I think the issues with dead bugs are the same for Python and for C, so we
shouldn't draw a global distinction here based on language.  The consensus
in the meeting was that we ought to keep it enabled for Ubiquity, because:

- It sees much less exposure (used once during install, never again)
- Its crashes are very important to fix
- It's more difficult to debug
- If the user doesn't submit a complete report with logs, it's often
  impossible to diagnose the bug afterward, because the logs are stored in
  RAM and lost when the user reboots

> Signal crashes (mostly SIGSEGV)
> ===============================
> 
> We got a lot of them during the Feisty cycle, and we only just
> developed some infrastructure to semi-automatically retrace them. This
> is still a bit brittle, we often get poor results, bugs have to be
> manually tagged, and the current implementation of the retracers takes
> a lot of I/O and CPU power in the DC, and thus does not scale well.
> 
> Writing an automatic dup finder is much harder because many/most of
> the initial stack traces are mostly useless (which is another bug we
> need to track down at some point). 
> 
> Submitting those crashes is very expensive in terms of memory/CPU
> usage for post-processing in the GUI, network bandwidth for
> up/download, and Malone storage size. Although we warn about the
> 'private data' in the GUI, this is not really a decision that a
> novice user can make appropriately, so we have the privacy problem as
> well. In summary, we need a proper crash database for this.

I have seen statistics which suggest that even for the development release,
only a small percentage of crash reports become bugs.  The user pays the
penalty for the local analysis and upload, but there is no benefit to
anyone.

I expect this to be even more true for the stable release, where an even
smaller percentage of users are willing to participate in the bug reporting
process.

> There were differing opinions about the usefulness of crash reports
> for stable releases (e.g. Seb feared the flood, Alexander rather
> preferred to get reports). Can we please collect and weigh them
> here?

We would like to get reports, but the system in its current form isn't
really equipped to handle such a large volume, so I think it's best to
disable it.

We can discuss at UDS what would be required in terms of infrastructure, UI
changes, etc. in order for this to become feasible for stable releases.

> (2) Keep apport itself enabled and have it stuff the dumps into
>     /var/crash, but disable the automatic frontend invocation in
>     update-notifier. This means a wasted processing overhead for
>     the vast majority of the crashes that will happen out there, but
>     the crash reports are retained, so that manually calling
>     apport-{gtk,qt,cli} will continue to work as usual. We could even
>     add a gconf key and a UI somewhere to re-enable it.

This is my preference (with an exception for Ubiquity to keep the current
behaviour, per Colin, unless he changes his mind).
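
With option (2), retained reports simply accumulate in /var/crash until
someone runs a frontend by hand. As a rough sketch of how a script could
check for them, assuming reports are stored as files ending in ".crash"
(the function name pending_crash_reports is hypothetical):

```python
import os


def pending_crash_reports(crash_dir="/var/crash"):
    """Return the sorted paths of *.crash files in crash_dir.

    Returns an empty list if the directory is missing or unreadable.
    """
    try:
        names = os.listdir(crash_dir)
    except OSError:
        return []
    return sorted(
        os.path.join(crash_dir, name)
        for name in names
        if name.endswith(".crash")
    )


if __name__ == "__main__":
    for path in pending_crash_reports():
        print(path)
```

This only lists the files; processing and submission would still go through
apport-{gtk,qt,cli} as described above.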

-- 
 - mdz