brainstorming for UDS-N - Performance
Clint Byrum
clint.byrum at canonical.com
Thu Sep 30 23:34:42 BST 2010
On Sep 30, 2010, at 3:10 PM, Kees Cook wrote:
> On Wed, Sep 29, 2010 at 07:43:49PM +0100, Matthew Paul Thomas wrote:
>> Allison Randal wrote on 28/09/10 21:11:
>>>
>>> The Performance track is about measurable speed improvements and also
>>> about snappy, responsive user experience, across all editions of
>>> Ubuntu, from older hardware to high-efficiency devices, and from boot
>>> experience through common user-facing applications and tools.
>>>
>>> What's high on your list for this area?
>>> ...
>>
>> Measurement. Where can I go to see the equivalent of Firefox's
>> <http://arewefastyet.com/> for Ubuntu startup speed? Where's the
>> equivalent graph for Ubiquity? For Unity? For Ubuntu Software Center?
>> How much better or worse is yesterday's Natty nightly compared with
>> Ubuntu 10.10? With Ubuntu 10.04 LTS?
>
> I'm extremely interested in having the various sub-teams come up with
> standard measurements, so that when people make changes that affect
> performance we can actually see the impact across all of these
> well-defined workloads.
>
> I can think of only one that exists now: the boot-time graphs Scott
> manages.
>
> Changes to the compiler toolchain, the kernel, etc, all have an impact on
> everyone's workloads, but most teams haven't actually stepped forward and
> said "THIS workload is important to us, here's how to reproduce the
> measurement, and here's where we're tracking the daily changes to that
> measurement."
>
Our newest member here in the server team, James Page, has been
working with Mathias Gug on automated ISO testing using Hudson (a
continuous integration framework). The Drizzle project sets a great
example with it too: every time somebody commits to trunk, a new
build is done, and people are flagged if there is a performance
regression.
These tests are not hard to build. Many of the packages we build
every day come with a test suite that can be used for baseline
numbers.
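
As a rough illustration of how low the bar is (this is not something
we have running today), a Hudson post-build step could simply time
the suite, compare against a stored baseline, and fail loudly on a
regression. The baseline file, the 10% threshold, and the "make test"
command below are all made up for the sketch:

    #!/usr/bin/env python
    # Hypothetical Hudson post-build step: time a package's test suite,
    # compare against a stored baseline, and exit non-zero so the build
    # gets flagged when the run is noticeably slower.
    import json
    import os
    import subprocess
    import sys
    import time

    BASELINE_FILE = "baseline.json"  # made-up location for the stored baseline
    THRESHOLD = 1.10                 # flag anything more than 10% slower
    SUITE_CMD = ["make", "test"]     # whatever the package uses for its tests

    start = time.time()
    subprocess.check_call(SUITE_CMD)
    elapsed = time.time() - start

    if os.path.exists(BASELINE_FILE):
        with open(BASELINE_FILE) as f:
            baseline = json.load(f)["seconds"]
        print("suite took %.1fs (baseline %.1fs)" % (elapsed, baseline))
        if elapsed > baseline * THRESHOLD:
            print("PERFORMANCE REGRESSION: more than 10% slower than baseline")
            sys.exit(1)
    else:
        # First run: record a baseline for future comparisons.
        with open(BASELINE_FILE, "w") as f:
            json.dump({"seconds": elapsed}, f)
        print("recorded new baseline: %.1fs" % elapsed)

Hudson marks the build failed on the non-zero exit, which gives us
the "people are flagged" part for free.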
Some important factors:
* Avoid information overload - We don't want 100 gauges all reading
  off the charts when just one low-level thing, like a filesystem or
  kernel bug, is causing the regression. I call this the Three Mile
  Island problem (Three Mile Island had so many indicators of trouble
  that the operators couldn't tell what was wrong, because EVERYTHING
  looked wrong). So pick a few broad-spectrum tests for overall
  measurement, and then target the key areas.
* Think like a user - Running the mysql or drizzle test suite will
  definitely give us "a number", but it's not a number that can be
  turned into a story. "Maverick runs the mysql test suite 43 percent
  faster than Lucid": what does that mean, even to somebody who knows
  mysql inside and out? But "Drupal page loads have been measured as
  4x faster with Maverick servers" means something to drupal
  developers, to people who have paid drupal developers to build
  websites for them, and to the users of those sites. (There's a rough
  sketch of this kind of measurement after this list.)
* Geeks like graphs. So let's make sure to graph these results. :)
  Given the exposure the infamous Phoronix benchmarks got, it would
  be good to get ahead of that story early in a dev cycle.
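
To make the "think like a user" point concrete, here is a rough
sketch of the kind of measurement I mean: time a handful of Drupal
page loads against a test instance and append the median to a CSV
that can be graphed over the cycle. The URL, sample count, and output
file are all invented for the example; this is not an existing tool.

    #!/usr/bin/env python
    # Hypothetical user-facing benchmark: time N loads of a Drupal front
    # page and append the daily median to a CSV for graphing.
    import csv
    import time
    import urllib2
    from datetime import date

    URL = "http://drupal-test.example.com/"  # made-up test instance
    SAMPLES = 20
    RESULTS = "drupal-pageload.csv"

    timings = []
    for _ in range(SAMPLES):
        start = time.time()
        urllib2.urlopen(URL).read()
        timings.append(time.time() - start)

    timings.sort()
    median = timings[len(timings) // 2]

    with open(RESULTS, "ab") as f:
        csv.writer(f).writerow([date.today().isoformat(), "%.3f" % median])

    print("median page load: %.3fs over %d samples" % (median, SAMPLES))

Feed that CSV to whatever graphing we settle on and you get the
arewefastyet-style trend line Matthew is asking for.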