<div class="gmail_quote">This is a periodic update of the status of the Ubuntu Error Tracker project (whoopsie-daisy).<br><br><b><font size="4">Get involved</font></b><br>
<br>We're looking for people to help with the Ubuntu Error Tracker project. There are lots of interesting tasks that need to be done, spanning a wide range of technologies and skills.<br><br>This document will get you started:<br>
<a href="https://wiki.ubuntu.com/ErrorTracker#How_you_can_help" target="_blank">https://wiki.ubuntu.com/ErrorTracker#How_you_can_help</a><div><br></div><div>Equally, do feel free to get in touch if you would like to help out, but are confused or need more information.<br>
<div><br><b><font size="4">What we're working on</font></b><br><br>Matthew is working on formulating a better algorithm for the "if all updates were installed" line in the average errors per day graph:</div>
<div><a href="http://people.canonical.com/~evand/tmp/IMG_5330.JPG" target="_blank">http://people.canonical.com/~evand/tmp/IMG_5330.JPG</a><br><br>Brian discovered the source of a number of corrupt reports that were being sent to <a href="http://daisy.ubuntu.com" target="_blank">daisy.ubuntu.com</a>. If you hit the close button when apport is collecting information, the report is still sent, but without the rest of the data:</div>
<div><a href="https://bugs.launchpad.net/errors/+bug/1020994" target="_blank">https://bugs.launchpad.net/errors/+bug/1020994</a><br><br><b>Handling multiple errors at once</b><div><b><br></b>We need to coalesce multiple application error reports into a single dialog and do the same for system error reports. I am actively working on a branch to implement this functionality in apport.<br>
<br><a href="https://wiki.ubuntu.com/ErrorTracker#When_there_are_multiple_simultaneous_errors" target="_blank">https://wiki.ubuntu.com/ErrorTracker#When_there_are_multiple_simultaneous_errors</a><br><a href="https://code.launchpad.net/~ev/apport/multiple-simultaneous-errors" target="_blank">https://code.launchpad.net/~ev/apport/multiple-simultaneous-errors</a><br>
<br><b>Redesigning the debconf dialogs to optionally send an error report when shown</b><br><br></div><div>The redesign work for the GTK debconf dialogs is complete and well covered with a newly added test suite. Sending an error report when the box is checked still needs to be implemented. I've had to put this down for now to focus on improvements to <a href="http://errors.ubuntu.com" target="_blank">errors.ubuntu.com</a> needed by the release team and the "handling multiple errors" implementation. Feel free to pick up from the linked branch below. Just let me know you're working on it so that I don't step on your toes.</div>
<div><br></div><div><a href="https://wiki.ubuntu.com/ErrorTracker#When_there_is_a_debconf_prompt" target="_blank">https://wiki.ubuntu.com/ErrorTracker#When_there_is_a_debconf_prompt</a><br><a href="https://code.launchpad.net/~ev/debconf/error-reports" target="_blank">https://code.launchpad.net/~ev/debconf/error-reports</a><br>
<br><b>Optionally send an error report when an application hangs</b></div><div><b><br></b></div><div>In the future, compiz will pop up an apport dialog when an application is hanging, instead of only giving you the option to terminate the process. The support for this in apport landed in 2.3 (r2423) via the --hanging option. Sam pointed out that the solution as presented was not going to work well. Matthew then reworked the UI and updated the specification linked to below. These changes still need to be made to both the compiz branch linked below and to apport. I haven't had time to make these changes myself, so do feel free to pick them up. Just let me know if you do.</div>
<div><br><a href="https://wiki.ubuntu.com/ErrorTracker#app-hang" target="_blank">https://wiki.ubuntu.com/ErrorTracker#app-hang</a><br><a href="https://code.launchpad.net/~ev/compiz/call-apport-on-hangs" target="_blank">https://code.launchpad.net/~ev/compiz/call-apport-on-hangs</a></div>
<div><a href="https://code.launchpad.net/~ev/compiz/call-apport-on-hangs/+merge/113436/comments/243748" target="_blank">https://code.launchpad.net/~ev/compiz/call-apport-on-hangs/+merge/113436/comments/243748</a></div><div>
<a href="https://code.launchpad.net/~ev/compiz/call-apport-on-hangs/+merge/113436/comments/246738" target="_blank">https://code.launchpad.net/~ev/compiz/call-apport-on-hangs/+merge/113436/comments/246738</a><br>
<br><b>Laying the groundwork for creating bug reports from <a href="http://errors.ubuntu.com" target="_blank">errors.ubuntu.com</a></b></div><div><b><br></b></div><div>Right now crash-digger, the service that retraces error reports on Launchpad, and daisy's own retracers run entirely independently of one another. Daisy then builds a mapping of crash signatures it's seen to bug numbers that crash-digger has found. This means that right now we cannot create bugs from the daisy retracers or <a href="http://errors.ubuntu.com" target="_blank">http://errors.ubuntu.com</a>.</div>
<div><br></div><div>The initial plan was to have crash-digger talk to a new daisy backend, which would use the daisy database as a shared brain between crash-digger and the daisy retracers. This requires some rethinking of how the backend would behave compared to the existing Launchpad one, as daisy keys on crash signature and crash-digger keys on bugs. There's also a lot of logic around using new bug numbers when a problem is reintroduced, rather than reusing the existing one, that dates back to before we had fine-grained notification controls on Launchpad and would not work in the daisy backend, anyway.</div>
<div><br></div><div>Martin and I came up with an ideal workflow for this back in June:</div><div><a href="http://paste.ubuntu.com/1152707/" target="_blank">http://paste.ubuntu.com/1152707/</a></div><div><br></div><div>However, I've thought about this some more and it might be reasonable to leave crash-digger well alone and just duplicate the crash-digger bug against the daisy-created bug at the point when daisy is importing those links between crash signature and bug number.</div>
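<div><br></div><div>As a rough illustration of that alternative, here is a minimal sketch of the import step. It is not taken from the daisy code: the function name, the plain dictionaries standing in for the daisy database, and the bug numbers are all hypothetical.</div>

```python
def import_links(digger_links, daisy_links):
    """Merge crash-digger's signature-to-bug links into daisy's mapping.

    Both arguments are hypothetical dicts of crash signature to bug number.
    Returns (crash_digger_bug, daisy_bug) pairs that would need to be
    marked as duplicates on Launchpad.
    """
    duplicates = []
    for signature, digger_bug in sorted(digger_links.items()):
        daisy_bug = daisy_links.get(signature)
        if daisy_bug is None:
            # daisy has no bug for this signature yet: adopt crash-digger's.
            daisy_links[signature] = digger_bug
        elif daisy_bug != digger_bug:
            # daisy already created its own bug: duplicate crash-digger's
            # bug against it instead of changing crash-digger itself.
            duplicates.append((digger_bug, daisy_bug))
    return duplicates
```

<div>The point of the sketch is only that crash-digger stays untouched; the duplicate marking happens entirely on the daisy side at import time.</div>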
<div><br></div><div>Do feel free to investigate this one, but please do so as part of an email discussion with myself and Martin Pitt.<br><br></div><div><a href="https://code.launchpad.net/~ev/apport/daisy-duplicates-db" target="_blank">https://code.launchpad.net/~ev/apport/daisy-duplicates-db</a><br>
<a href="https://code.launchpad.net/~ev/errors/create-bug" target="_blank">https://code.launchpad.net/~ev/errors/create-bug</a><br><br></div><div><b>Charm the Error Tracker infrastructure</b></div><div><b><br></b></div><div>
There is a set of scripts in <a href="http://bazaar.launchpad.net/~ev/daisy/trunk/files/head:/setup/" target="_blank">lp:daisy/setup</a> which will let you set up some of the Error Tracker infrastructure in an OpenStack cloud. However, this is hackish at best and has been abandoned in favour of charming the components instead. I've made some progress on charming daisy, which should get you enough infrastructure to start reporting local crashes into your Error Tracker instance. We still need improvements to this charm and a charm for the <a href="http://errors.ubuntu.com" target="_blank">errors.ubuntu.com</a> Django site (<a href="https://code.launchpad.net/~ev/errors/trunk" target="_blank">lp:errors</a>).</div>
<div><br></div><div>This is a fairly easy one to tackle if you are comfortable with shell programming.</div><div><br></div><div><a href="https://code.launchpad.net/~ev/charms/precise/daisy/trunk" target="_blank">https://code.launchpad.net/~ev/charms/precise/daisy/trunk</a><br>
<a href="http://paste.ubuntu.com/1152635/" target="_blank">http://paste.ubuntu.com/1152635/</a></div><div><br></div><div><b>Recoverable errors</b></div><div><br></div><div>As of Ubuntu 12.10, you can programmatically generate an error report in your application. Just feed nul-separated key-value pairs to /usr/share/apport/recoverable_problem. If you supply a DialogBody key, its value will be used as the short description in the apport dialog that appears. Do make sure you provide a DuplicateSignature key - a value that uniquely groups a set of instances into a problem.</div>
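<div><br></div><div>A minimal Python sketch of what that might look like from an application. It assumes the helper reads the pairs from standard input and that fields are joined by NUL bytes with no trailing separator; the function names and the example values are made up for illustration.</div>

```python
import subprocess

RECOVERABLE_PROBLEM = "/usr/share/apport/recoverable_problem"


def build_payload(fields):
    """Flatten a dict into key, value, key, value... joined by NUL bytes."""
    parts = []
    for key, value in fields.items():
        parts.append(key)
        parts.append(value)
    return "\0".join(parts).encode("utf-8")


def report_recoverable_problem(fields):
    """Pipe the payload to the apport helper (assumed to read stdin)."""
    subprocess.run([RECOVERABLE_PROBLEM], input=build_payload(fields), check=True)


# Hypothetical example values; only the two key names come from the text.
example_fields = {
    "DialogBody": "Example App could not sync your settings.",
    "DuplicateSignature": "example-app:settings-sync-failed",
}
```

<div>Calling report_recoverable_problem(example_fields) on a 12.10 system should then bring up the apport dialog, with the DuplicateSignature value deciding which problem the instance is grouped under.</div>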
<div><br><b>Continued development of the <a href="http://errors.ubuntu.com" target="_blank">errors.ubuntu.com</a> website</b></div><div><b><br></b></div><div>We've landed a number of changes to <a href="http://errors.ubuntu.com" target="_blank">http://errors.ubuntu.com</a> lately:</div>
<div><ul><li>The graph now shows the average errors per day for both Ubuntu 12.04 and Ubuntu 12.10. However, the calculation we're using for this is incorrect. It divides the number of errors seen in the day by the number of unique systems seen in that same day. A more accurate measure will be to divide the number of errors seen in the day by the number of unique users seen in the past 90 days: <a href="https://bugs.launchpad.net/daisy/+bug/1033913" target="_blank">https://bugs.launchpad.net/daisy/+bug/1033913</a>. I've fixed this, but given a datacenter move it will have to wait until next week to be deployed.</li>
<li>In the most common problems table, if a linked bug is marked as completed, the entire line will be greyed out. If the "Last seen" version is not the latest version, then the "Last seen" version will be greyed out. This implies that the issue is no longer present in the more recent version. If the "Last seen" version is the latest version, but the linked bug is marked as completed, then the "Last seen" version will be marked red to indicate a possible regression. The code for this talks to Launchpad, slowly. We think this may be what's causing some timeouts to appear when loading the table. I have a deployment in process to allow us to turn this functionality on and off as part of the URL, which should help us get to the bottom of the problem.</li>
<li>You can now select a date range for the most common problems table.</li><li>The individual problem pages now show a graph of the number of instances over time. They also show a breakdown of the number of instances by version of that application.</li>
<li>The individual instance pages have been redesigned. They now look more like apport reports, with expanders hiding the fields you're unlikely to care about in the majority of cases.</li><li>There is an outstanding deployment request to move us to a better system for managing authentication. Soon you will no longer have to log in every time you view a problem or instance page. Nor should you get the errors that some people were seeing when attempting to authenticate.</li>
</ul><a href="https://code.launchpad.net/~ev/errors/trunk" target="_blank">https://code.launchpad.net/~ev/errors/trunk</a><br><a href="https://code.launchpad.net/~ev/daisy/trunk" target="_blank">https://code.launchpad.net/~ev/daisy/trunk</a><br>
<a href="https://code.launchpad.net/~ev/oops-repository/whoopsie-daisy" target="_blank">https://code.launchpad.net/~ev/oops-repository/whoopsie-daisy</a></div>
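<div><br></div><div>The greying and red-marking rules for the most common problems table, described in the list above, can be sketched roughly as follows. This is not the code from lp:errors: the function name and style labels are hypothetical, and how a greyed-out row combines with a red "Last seen" cell is my reading of the rules rather than something stated outright.</div>

```python
def highlight(bug_completed, last_seen, latest):
    """Return (row_style, last_seen_style) for one row of the table."""
    # A completed linked bug greys out the entire line.
    row_style = "grey" if bug_completed else "normal"

    if last_seen != latest:
        # Not seen in the latest version: presumably no longer present.
        last_seen_style = "grey"
    elif bug_completed:
        # Seen in the latest version although the bug is marked completed:
        # flag it as a possible regression.
        last_seen_style = "red"
    else:
        last_seen_style = "normal"
    return row_style, last_seen_style
```

<div>Note that only bug_completed needs the slow round trip to Launchpad; the version comparison can be done from data daisy already has.</div>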
<div><br></div><div><b>In the future</b></div><div><br></div><div>Of course there is still plenty of work to be done. Feel free to grab something and help out. Just let us know if you do.</div><div><br></div><div><a href="https://bugs.launchpad.net/errors" target="_blank">https://bugs.launchpad.net/errors</a></div>
<div><a href="https://bugs.launchpad.net/daisy" target="_blank">https://bugs.launchpad.net/daisy</a></div><div><br></div><div><a href="https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-crash-database" target="_blank">https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-crash-database</a></div>
<div><a href="https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-metrics" target="_blank">https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-metrics</a></div><div><a href="https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-updates-from-crash-reports" target="_blank">https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-updates-from-crash-reports</a></div>
<div><a href="https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-fix-ddebs" target="_blank">https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-fix-ddebs</a></div><div><a href="https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-bucketing-improvements" target="_blank">https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-bucketing-improvements</a></div>
<div><a href="https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-phased-updates" target="_blank">https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-phased-updates</a><br><br><b><font size="4">Interesting numbers</font></b><br>
<br>There is not much variance in the number of unique systems that report errors in a one-day period. On the first day we recorded it, there were 67,565 unique systems. That dipped into the upper 40,000s and has since fluctuated between the upper 40,000s and upper 50,000s. This does not at all imply that the same systems are reporting every day, and indeed the data does not bear that out.<div>
<br></div><div>Given a formula of the total number of errors reported in a day divided by the number of unique systems reporting errors over a 90-day period, the average system is experiencing 0.05 errors per day.<br><br>
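<div>To make the difference between the two denominators concrete, here is a small sketch of the calculation. The function name is hypothetical and the numbers in the usage lines are invented purely for illustration; they are not figures from the crash database.</div>

```python
def average_errors_per_day(errors_in_day, unique_systems):
    """Errors reported in one day divided by a count of unique systems."""
    if unique_systems == 0:
        # No systems seen at all: avoid dividing by zero.
        return 0.0
    return errors_in_day / unique_systems


# Invented numbers, for illustration only.
errors_in_day = 300
systems_same_day = 100        # the old, incorrect denominator
systems_past_90_days = 1200   # the corrected denominator

old_average = average_errors_per_day(errors_in_day, systems_same_day)
new_average = average_errors_per_day(errors_in_day, systems_past_90_days)
```

<div>Dividing by the same-day count only measures how many errors the systems that reported anything hit; the 90-day count is a closer stand-in for the whole population, including systems that had a quiet day.</div><br>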
<b><font size="4">Open tickets</font></b><br><br><a href="https://rt.admin.canonical.com/Ticket/Display.html?id=55342" target="_blank">55342</a> - <b>Please deploy lp:errors r146</b><br><br>This is being worked on now, but likely won't land until after the weekend. It provides <a href="http://errors.ubuntu.com/?launchpad=false" target="_blank">http://errors.ubuntu.com/?launchpad=false</a>, which lets us turn off the Launchpad integration and see if that has much of an effect on the recent timeouts in the most common problems table. It also fixes the "my error reports" URL (<a href="https://errors.ubuntu.com/user/sha512-of-system-uuid" target="_blank">https://errors.ubuntu.com/user/sha512-of-system-uuid</a>).<br>
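<br><div>For the curious, a sketch of how such a "my error reports" URL could be put together. The function name is hypothetical, and where the system UUID comes from and whether it is normalised before hashing are assumptions not covered here.</div>

```python
import hashlib


def my_error_reports_url(system_uuid):
    """Build the per-system URL from the SHA-512 hex digest of the UUID."""
    digest = hashlib.sha512(system_uuid.encode("utf-8")).hexdigest()
    return "https://errors.ubuntu.com/user/" + digest


# Placeholder UUID, not a real system identifier.
example_url = my_error_reports_url("00000000-0000-0000-0000-000000000000")
```
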
<br><a href="https://rt.admin.canonical.com/Ticket/Display.html?id=55322" target="_blank">55322</a> - <b>Setup django-openid-auth backed by a database for <a href="http://errors.ubuntu.com" target="_blank">errors.ubuntu.com</a></b><br>
<br>This fixes the "OpenID from two Apache frontends" problems and lets us cache logins.<br>
<br>It also opens the door to having the default view of <a href="http://errors.ubuntu.com" target="_blank">http://errors.ubuntu.com</a> be "errors that I am responsible for" as we can match the group data from OpenID against the spreadsheet of package to team mappings that Kate and the QA engineers created. Finally, it means that we can further restrict and provide an audit trail when the server-side package hooks get implemented.</div>
<div>
<br><a href="https://rt.admin.canonical.com/Ticket/Display.html?id=53325" target="_blank">53325</a> - <b>Need an instance of jmxtrans talking to the crash database cassandra ring and outputting to a (new?) graphite server</b><br>
<br>This finally, finally gives us something for monitoring the health of the Cassandra cluster moment by moment, and lets us get the big-picture view that nodetool (Cassandra's console-based stats program) does not provide. It feeds the JMX data from Cassandra into Graphite. Included will be data like the current and average read/write speed, the state of compaction, etc.<br>
<br>This also covers setting up statsd in order to get graphs of API calls and failures from <a href="http://errors.ubuntu.com" target="_blank">errors.ubuntu.com</a>, as well as graphs of other non-persistent data from <a href="http://daisy.ubuntu.com" target="_blank">daisy.ubuntu.com</a>.<br>
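<br><div>As an idea of what the statsd side involves, here is a minimal sketch of emitting a counter using the plain statsd line protocol over UDP. The metric name, host and helper names are hypothetical; 8125 is the usual statsd port, but the deployed server may differ.</div>

```python
import socket


def statsd_counter(name, value=1):
    """Format a counter increment in the statsd line protocol."""
    return "{}:{}|c".format(name, value).encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one metric datagram over UDP; nothing is read back."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()


# Hypothetical metric name for a failed API call.
failure_payload = statsd_counter("errors.api.failures")
```

<div>Because it is fire-and-forget UDP, a call such as send_metric(failure_payload) adds very little overhead to a request, which is why it suits counting API calls and failures.</div>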
<br><a href="https://rt.admin.canonical.com/Ticket/Display.html?id=52506" target="_blank">52506</a> - <b>Staging setup for crash database</b><br><br>This is complicated by the fact that we do not yet have a strategy for feeding data from the production ring into the staging ring when the latter is smaller, which is delaying the ticket.<br>
<br><a href="https://rt.admin.canonical.com/Ticket/Display.html?id=55339" target="_blank">55339</a> - <b>Investigate crashdb OOPS column diskspace tuning</b><br><br>We're hitting some growing pains around the column family that contains the actual error reports. We can ease compaction (housekeeping for performance) by doubling the amount of I/O we do. Given that the OOPS column family is heavily weighted towards writes (I imagine people mostly get what they need from the problem pages; I could be wrong), I suspect we'll have to look a bit further for alternatives.</div>
</div></div></div>
</div><br>