Crash database requirements (was: The need for apport hooks (was: Re: SRUs for typo fixes in descriptions))

Tue Aug 9 05:37:01 UTC 2011

On Mon, 2011-08-08 at 00:34 -0700, Bryce Harrington wrote:
> On Mon, Aug 08, 2011 at 12:18:51AM -0700, Bryce Harrington wrote:
> > > I don't know if we've actually written down what we want out of a crash
> > > database, though.  Do we have a requirements document for one?  If the
> > > Launchpad team wanted to devote some time to adding a crash database do
> > > they know what we want out of such a beast?
> > 
> > I agree, that seems like an important first step.  At this point it's
> > unclear whether Launchpad's needs overlap with ours or if they're highly
> > divergent.
> 
> Here's a start...
> 
>  * A collection of files is gathered client-side and inserted into the
>    crash database record.
> 
>  * Processed versions of files (e.g. retracer output) can be added
>    subsequently.
> 
>  * Some files must be kept private (e.g. core dumps).
> 
>  * Traces from multiple crash reports are algorithmically compared to
>    find exact-dupes and likely-dupes.
> 
>  * Crash reports can be grouped by package, by distro release, or by both.
> 
>  * Statistics are generated to show number of [exact|exact+likely] dupes
>    for each type of crash.  Statistics can be provided by package, by
>    distro release, by date range, or a combination.
> 
>  * Bug report(s) can be associated with a given set of crashes.
> 
>  * The user should have some way to check back on the status of their
>    crash report; e.g. have some report ID they can look at to see
>    statistics and/or any associated bug #.

To this list I'd add: 
 * It should be brainlessly easy for users to submit this data.  Either
a single "Yes, submit this crash" confirmation, or a check box to
automatically submit these crashes.  One of the features that the X team
really wants from this sort of database is "how frequent is this kind
of problem?", which requires the widest possible sample.

 * For X and kernel crashes (at least), these reports need to be
indexable by hardware.  That is, we want to be able to answer both "how
prevalent are GPU hangs on Intel hardware?" and "on what hardware does
this GPU hang appear?".  This probably needs DMI data, PCI IDs, or
both.
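
To make the two hardware questions concrete, here is a rough sketch of
an index that answers both.  Everything in it is an assumption made for
illustration: the Report fields, the HardwareIndex class, the idea of a
per-crash "signature" string used as the dedupe key, and the choice of
PCI vendor:device strings rather than DMI data.  It is not how apport
or Launchpad store anything.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical crash report record; the field names are illustrative."""
    report_id: int
    signature: str       # assumed dedupe key, e.g. derived from the trace
    pci_ids: frozenset   # "vendor:device" strings for devices in the machine


class HardwareIndex:
    """Indexes reports by PCI vendor and by crash signature."""

    def __init__(self):
        self.reports = {}                      # report id -> Report
        self.by_vendor = defaultdict(set)      # vendor id -> report ids
        self.by_signature = defaultdict(set)   # signature -> report ids

    def add(self, report):
        self.reports[report.report_id] = report
        self.by_signature[report.signature].add(report.report_id)
        for pci_id in report.pci_ids:
            vendor = pci_id.split(":")[0]
            self.by_vendor[vendor].add(report.report_id)

    def prevalence(self, signature, vendor):
        """Fraction of reports from `vendor` hardware carrying `signature`."""
        on_vendor = self.by_vendor.get(vendor, set())
        if not on_vendor:
            return 0.0
        hits = on_vendor & self.by_signature.get(signature, set())
        return len(hits) / len(on_vendor)

    def hardware_for(self, signature):
        """All PCI IDs present in machines that reported `signature`."""
        ids = set()
        for report_id in self.by_signature.get(signature, set()):
            ids |= self.reports[report_id].pci_ids
        return ids


index = HardwareIndex()
# 8086 is Intel's PCI vendor ID; the device IDs are just example values.
index.add(Report(1, "gpu-hang", frozenset({"8086:0126"})))
index.add(Report(2, "other-crash", frozenset({"8086:0046"})))
index.add(Report(3, "gpu-hang", frozenset({"10de:0a65"})))

intel_share = index.prevalence("gpu-hang", "8086")
affected = index.hardware_for("gpu-hang")
```

Note that prevalence() only counts machines that submitted some report,
so it measures the share of reports rather than the share of users.
That is one reason submission has to be as easy as possible.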

While we're using the terminology "crash report", I want to ensure that
there's a sufficiently general understanding of what this means.  I
think we'd want this to cover at least:
 * Actual C-style crashes, with a core dump
 * Unhandled exceptions, such as you'd get from Python et al.
 * Kernel oops and panics
 * Intel GPU dump output
 * dmesg & Xorg.0.log, triggered by GPU hangs
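
As a sketch of how a database might tell these apart, the list above
could map onto a small set of report kinds, each with the files it is
expected to carry.  The kind names, the attachment names, and the rule
that only core dumps are private are all assumptions for illustration,
not an existing apport or Launchpad schema.

```python
from enum import Enum


class ReportKind(Enum):
    """Hypothetical report kinds, one per item in the list above."""
    NATIVE_CRASH = "native-crash"
    UNHANDLED_EXCEPTION = "unhandled-exception"
    KERNEL_OOPS = "kernel-oops"
    GPU_DUMP = "gpu-dump"
    GPU_HANG_LOGS = "gpu-hang-logs"


# Attachment names are made up for this sketch.
EXPECTED_ATTACHMENTS = {
    ReportKind.NATIVE_CRASH: ("core",),
    ReportKind.UNHANDLED_EXCEPTION: ("traceback",),
    ReportKind.KERNEL_OOPS: ("oops-text",),
    ReportKind.GPU_DUMP: ("gpu-dump",),
    ReportKind.GPU_HANG_LOGS: ("dmesg", "Xorg.0.log"),
}

# Assumption: core dumps are the only attachment that must stay private.
PRIVATE_ATTACHMENTS = frozenset({"core"})


def public_attachments(kind):
    """Return the expected attachments of `kind` that may be shown publicly."""
    return tuple(
        name for name in EXPECTED_ATTACHMENTS[kind]
        if name not in PRIVATE_ATTACHMENTS
    )
```

With this layout, public_attachments(ReportKind.GPU_HANG_LOGS) returns
both log names, while a native crash exposes nothing until a retraced
(and sanitised) trace is attached.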

CHRis.