RFC: startup time - again

Andrew Bennetts andrew at canonical.com
Fri Sep 12 10:02:59 BST 2008


Russel Winder wrote:
> On Fri, 2008-09-12 at 11:11 +0300, Alexander Belchenko wrote:
> 
> > You can think what you want, but these are the best numbers over 10 runs.
> > Yes, of course the numbers are not stable. So what?
> 
> So why not say that you took the average over 10 runs so as to allay
> fears of people who think decision making based on a single data point
> is not a clever thing to do?  I assume this is the average and not just
> the best, since that would be equally fallacious as testing on just one
> data point. 

“equally fallacious” isn't right.  It depends on the question you are asking.

If you assume that the variability of the results between runs is due to the
system doing other work (rather than inherent variation in the process you are
testing), then comparing the best of 10 gives you a better idea of which bzr.exe
is doing less real work when you ignore the noise.  (Because the lowest of each
set is therefore the run with the least noise.)

If you just want to know which bzr.exe is going to be faster on average on a
system subjected to the noise, then the average makes sense.

Python's timeit module intentionally reports a “best of”, because it is a useful
thing to know.
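A short sketch of the two summaries using timeit.repeat (the statement and the
repeat/number values here are just illustrative, not the bzr benchmark itself):

```python
import timeit

# Run the same statement 5 separate times; each entry in `results`
# is the total time for 1000 executions of the statement.
results = timeit.repeat(stmt="sorted(range(1000))", repeat=5, number=1000)

# The minimum is the run least disturbed by other system activity,
# which is why timeit's command-line interface reports a "best of" figure.
best = min(results)

# The average instead reflects typical performance including the noise.
average = sum(results) / len(results)

print(f"best: {best:.4f}s  average: {average:.4f}s")
```

The best-of figure can never exceed the average, so the two only agree when
every run is equally noisy.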

-Andrew.




More information about the bazaar mailing list