extending apport/problem_report format?

Thu Sep 16 22:47:36 UTC 2010

Hi,

Launchpad.net is looking into whether to use the problem_report Python
module to store website errors, or even to use the apport Python module
to help collect system data for the problem report. Currently, each
exception is stored in a separate "oops" file with a bunch of extra
data, such as the CGI request variables. The file is formatted like an
RFC 822 email message to take advantage of existing modules for
formatting and parsing, but the body of the message uses a custom
format, which makes it very inflexible.
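To make the current layout concrete, here is a rough Python 3 sketch of reading an oops-style file with the standard email parser. The header names and the KEY=value body below are made up for illustration and are not the real Launchpad oops format. The point is only that the headers parse generically while the custom body needs hand-written code.

```python
from email import message_from_string

# Hypothetical oops-style file: RFC 822 headers, a blank line, then a
# free-form body. None of these names come from the real oops format.
OOPS_TEXT = """\
Oops-Id: OOPS-EXAMPLE
Exception-Type: ZeroDivisionError
Url: http://example.com/some/page

REQUEST_METHOD=GET
QUERY_STRING=a=1
"""

msg = message_from_string(OOPS_TEXT)

# The headers come for free from the email module...
headers = dict(msg.items())

# ...but the body is opaque text, so every consumer needs its own parser
# for whatever custom layout the body uses (here: one KEY=value per line).
body = msg.get_payload()
request_vars = dict(line.split('=', 1) for line in body.splitlines() if line)
```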

The oops-tools project, which analyzes and displays the oops files in
a web page, is planned to be open sourced soon. Therefore, I have two
main questions.
1. Is there interest in having the problem_report format be extended
to handle more complex data structures that will be parsed and
analyzed by a tool such as oops-tools?
2. Would apport be interested in receiving other features of
oops-tools, such as the Django-based web interface for viewing oopses?
Of course, other projects may already provide apport with whatever
functionality oops-tools has to offer.

The second question is probably hard to answer right now, so I'll
focus on the limitations of the problem_report format, which we would
address either in a wrapper class or in problem_report itself.

* problem_report doesn't provide a standard format for complex data.
Even adding another level of name/value pairs inside a field is not
well supported, since you have to write one ProblemReport object to a
StringIO object and then store the resulting text in a field of
another ProblemReport. Lists of dictionaries would also require their
own format. Here is an example of nested ProblemReports.

>>> import sys
>>> from problem_report import ProblemReport
>>> from StringIO import StringIO
>>> pr = ProblemReport()
>>> sub = ProblemReport()
>>> sub['one'] = '1'
>>> sub['two'] = '2'
>>> sio = StringIO()
>>> sub.write(sio)
>>> pr['numbers'] = sio.getvalue()
>>> pr['foo'] = 'bar'
>>> pr['z'] = 'zed'
>>> pr.write(sys.stdout)
ProblemType: Crash
Date: Thu Sep 16 16:45:52 2010
foo: bar
numbers:
 ProblemType: Crash
 Date: Thu Sep 16 16:46:00 2010
 one: 1
 two: 2

z: zed
>>>
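The same two-step round trip can be sketched in Python 3 without apport. The helpers dump_fields() and load_fields() are hypothetical: they only imitate the "Key: value" layout with space-indented continuation lines seen in the output above and are not problem_report's implementation. Reading the nested data back takes a second parse, and every value comes back as a plain string, so integers or lists of dictionaries would still need yet another convention.

```python
def dump_fields(fields):
    """Write a dict of strings as 'Key: value' lines (hypothetical helper).

    Multi-line values start on the line after 'Key:' and every line of
    the value is indented by one space.
    """
    lines = []
    for key in sorted(fields):
        value = fields[key]
        if '\n' in value:
            lines.append(key + ':')
            lines.extend(' ' + part for part in value.split('\n'))
        else:
            lines.append('%s: %s' % (key, value))
    return '\n'.join(lines) + '\n'


def load_fields(text):
    """Parse the output of dump_fields() back into a dict of strings."""
    fields = {}
    key = None
    for line in text.split('\n'):
        if not line:
            continue  # only the final trailing newline yields an empty line
        if line.startswith(' '):
            fields[key].append(line[1:])  # continuation of a multi-line value
        else:
            key, _, value = line.partition(':')
            fields[key] = [value[1:]] if value else []
    return {k: '\n'.join(v) for k, v in fields.items()}


inner = dump_fields({'one': '1', 'two': '2'})
outer = load_fields(dump_fields({'numbers': inner, 'foo': 'bar', 'z': 'zed'}))
numbers = load_fields(outer['numbers'])  # a second parse for the nested level
```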

* problem_report only allows field names to contain letters, numbers,
".", "_", and "-". That could cause problems when dumping a bunch of
name/value pairs from an application in order to analyze it later.
* problem_report really only supports text or compressed text files.
There is no way to specify a content-type, even when using
problem_report's write_mime() method. While I can't think of a really
good reason to attach a binary file that isn't just compressed text,
being able to specify the content-type would open the door to using
multiple human-readable formats for complex data structures.
* The write_mime() method even encodes the single-line name/value
pairs as base64, so it is not at all human readable. I assume this is
to avoid email restrictions on the length of headers and to avoid
confusion when some characters are escaped as quoted-printable.
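One possible workaround for the field-name restriction, sketched under assumptions: escape every disallowed character into the allowed set before storing it, and reverse the escaping when reading. encode_key(), decode_key() and the "-XX" hex escape are hypothetical choices made for this sketch; the regex simply writes out the character rule from the list above and is not copied from problem_report.

```python
import re

# The allowed-character rule described above, written as a regex.
# This is an approximation; problem_report's own check may differ.
VALID_KEY = re.compile(r'^[A-Za-z0-9._-]+$')

# Bytes passed through unchanged. "-" is deliberately excluded because
# this sketch reserves it as the escape marker ("-" followed by two hex digits).
_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._')


def encode_key(name):
    """Map an arbitrary non-empty string to a key matching VALID_KEY."""
    out = []
    for byte in name.encode('utf-8'):
        out.append(chr(byte) if byte in _SAFE else '-%02X' % byte)
    return ''.join(out)


def decode_key(key):
    """Invert encode_key()."""
    raw = bytearray()
    i = 0
    while i < len(key):
        if key[i] == '-':
            raw.append(int(key[i + 1:i + 3], 16))  # "-XX" escape
            i += 3
        else:
            raw.append(ord(key[i]))
            i += 1
    return raw.decode('utf-8')
```

With this scheme, CGI-style names such as HTTP_USER_AGENT pass through unchanged, while a name like "Content-Type" becomes "Content-2DType". The trade-off is that escaped keys are less pleasant to read in the raw report.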

Both YAML and JSON have some advantages when storing complex data
structures, but being able to set the content-type on a field in a
ProblemReport object would make it possible to use whichever format
works best. Assuming that a field with a certain name is always in a
certain format makes it difficult to switch to a different format later.
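A rough sketch of what per-field content types could look like, using only Python 3's standard email package rather than problem_report. build_typed_report() is a hypothetical helper, not an existing ProblemReport method. Each field becomes an inline MIME part that declares its own content-type and is sent with a 7bit transfer encoding, so a JSON field stays human-readable instead of being base64-encoded, and a reader can pick a parser from the declared type rather than from the field name.

```python
import json
from email import policy
from email.message import EmailMessage
from email.parser import Parser


def build_typed_report(fields):
    """Return a multipart/mixed message with one inline part per field.

    fields maps a field name to (payload_bytes, content_type). Payloads
    are sent as 7bit, so they must be plain ASCII with short lines.
    """
    msg = EmailMessage()
    for name, (payload, content_type) in fields.items():
        maintype, subtype = content_type.split('/', 1)
        msg.add_attachment(payload, maintype=maintype, subtype=subtype,
                           cte='7bit', disposition='inline', filename=name)
    return msg


report = build_typed_report({
    'numbers': (json.dumps({'one': 1, 'two': 2}).encode('ascii'),
                'application/json'),
    'foo': (b'bar\n', 'text/plain'),
})
text = report.as_string()

# Reading it back: each part announces its own format, so the reader can
# choose a parser from the content-type instead of the field name.
parsed = Parser(policy=policy.default).parsestr(text)
for part in parsed.iter_parts():
    if part.get_content_type() == 'application/json':
        print(part.get_filename(), json.loads(part.get_content()))
```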

-Edwin