brainstorming for UDS-N - Performance

Matthew Tippett matthew at phoronix.com
Sat Oct 2 14:33:55 BST 2010


  On 10/1/10 2:19 PM, Kees Cook wrote:
>> Have you asked them why that is? Maybe they don't know how to automate
>> the measurement, where to host it, or who to tell about it.
> In discussions at the last UDS, it seems that most teams could not agree
> on what would be valuable to measure. For the teams that did have things
> they wanted to measure (e.g. a specific firefox rendering speed test),
> no one stepped up to automate it.
>
> In a test-driven development style, it really seems like these measurements
> must be defined and automated before work on performance can be done. The
> trouble is that the performance work is rarely being done in the same team
> that will feel the impact, so it's non-trivial to understand the effect on
> another team's performance numbers.
>
> Oddly, I want these things not to measure how awesome upcoming performance
> improvements are, but to justify security/performance trade-offs. :P
> "It's only 10% slower, but you'll never have this class of high-risk
> security vulnerability again!" :)
>
> -Kees
>
These are some critical points.  I recently (yesterday) reviewed the 
data that the automation around the Phoronix Test Suite (Phoromatic) has 
been gathering for the last six or so months.  It tracks two types of 
systems: a kernel tracker, which pulls down the daily kernel from the 
repo, and a full Ubuntu distribution tracker, which updates to the newer 
daily image each day.
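
As a rough sketch (and nothing like the actual Phoromatic code), a daily 
distribution-tracker run can be as simple as the following Python, 
assuming phoronix-test-suite's batch mode has already been configured 
with batch-setup and using a hypothetical handful of test profiles:

#!/usr/bin/env python3
# Rough sketch of a daily distribution-tracker run (not the real
# Phoromatic code).  Assumes phoronix-test-suite is installed and batch
# mode has been configured once via `phoronix-test-suite batch-setup`.
import datetime
import os
import subprocess

TESTS = ["pts/compress-gzip", "pts/apache", "pts/postmark"]  # hypothetical picks

def run_daily_tracker():
    # Pull in the day's package updates before benchmarking.
    subprocess.run(["apt-get", "update"], check=True)
    subprocess.run(["apt-get", "-y", "dist-upgrade"], check=True)

    # Tag the result file with the date so day-to-day runs line up.
    stamp = datetime.date.today().isoformat()
    env = dict(os.environ,
               TEST_RESULTS_NAME="daily-tracker",
               TEST_RESULTS_IDENTIFIER=stamp)
    subprocess.run(["phoronix-test-suite", "batch-benchmark"] + TESTS,
                   env=env, check=True)

if __name__ == "__main__":
    run_daily_tracker()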

I have thrown together (i.e. not heavily reviewed by myself or others) a 
summary of some interesting data at 
http://www.phoromatic.com/resources/long-term-study/.

The tests used are a more or less arbitrary selection of the tests that 
ship with the Phoronix Test Suite.  The results themselves show very 
interesting regressions all over the place.

As engineers and developers, we tend to operate on the assumption that 
we need highly tuned tests that target specific subsystems and correlate 
strongly with a particular cause.  My view is that we shouldn't obsess 
over the tests themselves, but rather focus on ensuring that we have 
broad and continuous testing in place at any cost.  Once you start 
gathering the data and conducting the analysis, it becomes clear where 
tests are missing, and more tests are usually a byproduct of that 
analysis anyway.
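
To give a feel for how little the analysis needs to start with, here is 
a minimal, hypothetical sketch that flags any test whose latest result 
falls more than 10% below its recent baseline.  The data layout is 
invented (real Phoromatic results would need exporting first), and 
handling of lower-is-better tests is omitted:

#!/usr/bin/env python3
# Minimal sketch of the kind of broad-brush analysis described above:
# flag any test whose latest result falls noticeably below its recent
# baseline.  Assumes higher results are better.
from statistics import mean

THRESHOLD = 0.10  # flag anything more than 10% below the baseline

def find_regressions(history):
    """history: {test_name: [oldest_result, ..., latest_result]}"""
    regressions = []
    for test, results in history.items():
        if len(results) < 5:
            continue  # not enough history to form a baseline
        baseline = mean(results[:-1])
        latest = results[-1]
        if latest < baseline * (1 - THRESHOLD):
            regressions.append((test, baseline, latest))
    return regressions

if __name__ == "__main__":
    sample = {  # invented numbers, purely illustrative
        "throughput-test-a": [41.2, 41.0, 41.5, 41.1, 36.8],
        "throughput-test-b": [17800, 17950, 17700, 17850, 17900],
    }
    for test, baseline, latest in find_regressions(sample):
        print(f"{test}: baseline {baseline:.1f}, latest {latest:.1f}")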

The subtlety in Kees' words is very important.  Ubuntu does have the 
base capability, but not really a built-out mechanism for identifying 
the regressing package, so a test that flags a regression in a 
potentially unrelated subsystem can still be used to identify the point 
at which things broke.  Even just getting it down to the packaging 
change that caused a regression shaves a considerable amount off the 
analysis effort, allowing teams to dive deeper and earlier within their 
own packages to resolve it.
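
As an illustration of getting down to the packaging change, simply 
diffing the package manifests of the last good and first bad daily 
images already gives a short list of suspects.  A hypothetical sketch, 
assuming the one-package-and-version-per-line format of the .manifest 
files published alongside the daily images:

#!/usr/bin/env python3
# Hypothetical sketch: narrow a regression to a packaging change by
# diffing the package manifests of the last good and first bad daily
# images.  Assumes "package<whitespace>version" per line, as in the
# .manifest files published with Ubuntu daily images.

def load_manifest(path):
    packages = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                packages[parts[0]] = parts[1]
    return packages

def changed_packages(good_path, bad_path):
    good, bad = load_manifest(good_path), load_manifest(bad_path)
    changes = []
    for pkg, version in bad.items():
        old = good.get(pkg)
        if old != version:
            changes.append((pkg, old or "(new)", version))
    return changes

if __name__ == "__main__":
    for pkg, old, new in changed_packages("good.manifest", "bad.manifest"):
        print(f"{pkg}: {old} -> {new}")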

Regards,

Matthew



