extending apport/problem_report format?

Wed Sep 22 11:24:05 UTC 2010

Hello Edwin,

Edwin Grubbs [2010-09-16 17:47 -0500]:
> Launchpad.net is looking into whether to use the problem_report python
> module to store website errors or even to use the apport python module
> to help collect system data for the problem report. Currently, each
> exception is stored in a separate "oops" file with a bunch of extra
> data, such as the cgi request variables, and it is formatted like an
> rfc822 email message to take advantage of modules for formatting and
> parsing.

That indeed is what Apport .crash reports have as well.

> The oops-tools project, which analyzes and displays the oops files in
> a web page, is planned to be open sourced soon. Therefore, I have two
> main questions.
> 1. Is there interest in having the problem_report format be extended
> to handle more complex data structures that will be parsed and
> analyzed by a tool such as oops-tools?

Not from my side. So far we have got along well with just a
single-level dictionary. The convention for list values is to put
one element per line, e.g.:

Dependencies:
 libfoo1
 libbar2
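
A minimal sketch of that convention in Python, for illustration only:
format_list_field and parse_list_field are hypothetical helper names,
not part of the problem_report API.

```python
# Hypothetical helpers illustrating the one-element-per-line convention;
# they are not part of the problem_report API.

def format_list_field(key, items):
    """Render a list as a key line followed by one continuation line per
    element, each indented by a single space."""
    lines = [f"{key}:"]
    lines.extend(f" {item}" for item in items)
    return "\n".join(lines) + "\n"


def parse_list_field(text):
    """Split such a field back into (key, list of elements)."""
    key, _, rest = text.partition(":")
    items = [line.strip() for line in rest.splitlines() if line.strip()]
    return key.strip(), items


text = format_list_field("Dependencies", ["libfoo1", "libbar2"])
print(text, end="")
# Dependencies:
#  libfoo1
#  libbar2
print(parse_list_field(text))  # ('Dependencies', ['libfoo1', 'libbar2'])
```

Elements that contain newlines or significant surrounding whitespace
would not survive this round trip, which is one reason the convention
only works for simple lists.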

Can you point out an example of what else you need?

> 2. Would apport be interested in receiving other features of
> oops-tools, such as the django based web interface for viewing oopses?

Is this read-only, or can you also update the data there? So far we
have used Launchpad Bugs as the "crash database" backend, because a bug
tracker provides all the functionality that we need, except that it's
sometimes hard to tell crashes apart from regular bugs, which makes it
difficult to give triagers a clean view.

It sounds like an interesting option, though, if it can represent the
structure of Ubuntu, like distros/packages/package versions, etc.

> The second question is probably hard to answer right now, so I'll
> focus on the limitations of the problem_report format that we would
> either extend in a wrapper class or in problem_report itself.
> 
> * problem_report doesn't provide a standard format for complex data.

Right, it currently uses standard RFC822, which doesn't define any
more complex data types.

> Even adding another level of name/value pairs inside a field is not
> well supported, since you have to use a StringIO object to get the
> data from ProblemReport object to put it in a field of another
> ProblemReport. Lists of dictionaries would also require their own
> format. Here is an example of recursive ProblemReports.

This works fine if you hardcode assumptions about the syntax of
particular field names, which such post-processing scripts generally
have to do anyway.

But if we need complex data structures, then I'd rather use a standard
format like JSON for this, as you suggested.

The problem_report module is not conceptually limited to RFC822 only.
For example, it can also output its data in Multipart/MIME format
(for uploading data to Launchpad). So it wouldn't be a problem at all
to add reading/writing JSON.
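
As a rough sketch of what JSON reading/writing could look like for a
report held as a plain dict: write_json, read_json and the "Request"
field are hypothetical names chosen for illustration, not existing
problem_report methods or fields.

```python
import io
import json

# Sketch of JSON reading/writing for a report held as a plain dict.
# write_json/read_json are hypothetical names, not problem_report methods.

def write_json(report, fileobj):
    # sort_keys gives stable, diff-friendly output
    json.dump(report, fileobj, indent=2, sort_keys=True)


def read_json(fileobj):
    return json.load(fileobj)


report = {
    "ProblemType": "Crash",
    "Dependencies": ["libfoo1", "libbar2"],
    # a nested structure that RFC822 has no standard way to express
    "Request": {"method": "GET", "path": "/bugs/1"},
}

buf = io.StringIO()
write_json(report, buf)
buf.seek(0)
assert read_json(buf) == report  # nested values survive the round trip
```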

However, the module currently _is_ conceptually limited to a
single-level dictionary structure, since API users can (and do) pretty
much treat it as a dictionary with extra features, and can currently
rely on the values being strings. We could allow more, and then just
fix the existing write() and write_mime() to raise an exception if they
encounter an unrepresentable data type; however, this would mean that
such a report could never be uploaded to Launchpad Bugs.
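
A minimal sketch of such a check, assuming for simplicity that only str
and bytes values are representable (the real module may accept further
value types, which this sketch deliberately ignores).
check_representable and UnrepresentableValueError are hypothetical
names.

```python
# Minimal sketch: assumes only str and bytes values can be written as
# RFC822/MIME. check_representable is a hypothetical helper, not apport API.

class UnrepresentableValueError(TypeError):
    """Raised when a report value cannot be written as RFC822/MIME."""


def check_representable(report):
    for key, value in report.items():
        if not isinstance(value, (str, bytes)):
            raise UnrepresentableValueError(
                f"{key}: {type(value).__name__} cannot be written as RFC822")


check_representable({"ProblemType": "Crash"})  # passes silently
try:
    check_representable({"Request": {"method": "GET"}})
except UnrepresentableValueError as err:
    print(err)  # Request: dict cannot be written as RFC822
```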

> * problem_report only allows field names to contain letters, numbers,
> ".", "_", and "-". That could cause problems when dumping a bunch of
> name/value pairs from an application in order to analyze it later.

That's not a problem in Apport and package hooks, since (as pointed
out before) the set of key names is pretty much static. In the cases
where it isn't, hookutils provides a helper for cleaning up key names.
I'd like to avoid arbitrary strings here, since they can easily break
the RFC822 format or cause unexpected errors in scripts which process
those reports.
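
To illustrate the allowed character set (ASCII letters, digits, ".",
"_" and "-"), here is a sketch of validating and cleaning key names.
clean_key is a hypothetical stand-in; the actual hookutils helper may
have a different name and different rules.

```python
import re

# Allowed key characters as described above: letters, digits, ".", "_", "-".
# clean_key is a hypothetical stand-in for a hookutils-style helper; the
# real helper's name and exact rules may differ.
VALID_KEY = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_key(key):
    # fullmatch (not match with "$") so a trailing newline is rejected
    return VALID_KEY.fullmatch(key) is not None


def clean_key(name):
    """Replace every disallowed character with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


print(clean_key("HTTP Request/Path"))  # HTTP_Request_Path
```

Note that an empty name stays empty and is still invalid, so a caller
would need to handle that case separately.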

> * problem_report really supports text or compressed text files. There
> is no ability to specify a content-type even when using
> problem_report's write_mime() method.

In general we know what content type a field has. If not, then you
could always specify it in another field, like:

Data: blob0xDEADBEEF
DataType: image/jpeg

?
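
A sketch of how a consumer could honour such a companion field when
building a MIME attachment with Python's standard email package. The
"<key>Type" naming convention and the to_mime_part helper are
assumptions for illustration only.

```python
from email import encoders
from email.mime.base import MIMEBase

# Sketch of the "companion field" idea: the blob lives in one key and its
# MIME type in a second key named <key>Type. The naming convention and
# the to_mime_part helper are assumptions for illustration only.

def to_mime_part(report, key):
    content_type = report.get(key + "Type", "application/octet-stream")
    maintype, subtype = content_type.split("/", 1)
    part = MIMEBase(maintype, subtype)
    part.set_payload(report[key])
    encoders.encode_base64(part)
    part.add_header("Content-Disposition", "attachment", filename=key)
    return part


report = {"Data": b"\xde\xad\xbe\xef", "DataType": "image/jpeg"}
part = to_mime_part(report, "Data")
print(part.get_content_type())  # image/jpeg
```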

> * The write_mime() method even encodes the single-line name/value
> pairs as base64, so it is not at all human readable. 

Only if it's longer than 5 lines or has non-ASCII characters;
otherwise it lands in the "short values" text section, where it is
readable.
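
The rule as stated above can be sketched as follows. This mirrors the
description in this mail rather than apport's actual write_mime() code,
whose exact thresholds may differ; is_short_value and split_fields are
hypothetical names, and values are assumed to be str.

```python
# Sketch of the rule described above: a value goes into the readable
# "short values" text section only if it has at most 5 lines and is pure
# ASCII; everything else becomes a base64 attachment. The exact threshold
# used by apport's write_mime() may differ; split_fields is hypothetical.

def is_short_value(value, max_lines=5):
    return len(value.splitlines()) <= max_lines and value.isascii()


def split_fields(report):
    short, attachments = {}, {}
    for key, value in report.items():
        (short if is_short_value(value) else attachments)[key] = value
    return short, attachments


report = {
    "Package": "foo 1.0",
    "Title": "Café crash",  # non-ASCII -> attachment
    "Stacktrace": "\n".join(f"#{i} frame" for i in range(10)),  # 10 lines
}
short, attachments = split_fields(report)
print(sorted(short))        # ['Package']
print(sorted(attachments))  # ['Stacktrace', 'Title']
```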

But why do you care? This format is supposed to be nothing more than a
transport vehicle from client computers to Launchpad. It's not really
supposed to be looked at by humans.

Thanks,

Martin

-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <https://lists.ubuntu.com/archives/ubuntu-devel-discuss/attachments/20100922/997583ba/attachment.sig>