[rfc] benchmark reliability

Andrew Bennetts andrew at canonical.com
Fri Aug 18 06:56:55 BST 2006


On Fri, Aug 18, 2006 at 01:39:30PM +1000, Martin Pool wrote:
> On 17 Aug 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> 
> > Any suggestions?
> > 
> > Running and tweaking under --lsprof is an option, but it penalizes
> > python function calls and object creation too much. I can have a huge
> > lsprof improvement, and only get a minor real improvement.
> 
> Like Robert, I would guess this is caused by Python keeping some memory
> allocated in a pool, so that the second time you use it it's faster.  I
> am surprised it's so substantial though.  You could ask Andrew, or
> Holger for an opinion as they're familiar with Python's innards.

This sounds plausible to me, but I'm not sure of any good way to verify it -- I
don't think there's any way to introspect the obmalloc pools from Python code.
Perhaps looking at raw memory usage for the process (or even /proc/self/smaps?)
at various points in time would give an indication.
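As a rough sketch of the /proc idea: on Linux you can read the process's
resident set size out of /proc/self/status and sample it around a benchmark
run.  The helper name here (rss_kb) is just made up for illustration, and the
whole approach is Linux-specific:

```python
import os

def rss_kb():
    # Read the resident set size from /proc/self/status (Linux-only).
    # Returns None on platforms without procfs.
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # value is reported in kB
    except OSError:
        pass
    return None

before = rss_kb()
data = [list(range(100)) for _ in range(10000)]  # allocate a few MB
after = rss_kb()
del data
# Comparing before/after, and sampling again after the del, hints at how
# much memory obmalloc hangs on to rather than returning to the OS.
```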

Another thing that can deterministically add noise to benchmarks is Python's
garbage collector -- it periodically looks for and collects cycles depending on
how many object allocations & deallocations have happened.  The knobs to twiddle
are in the gc module.  It may be interesting to run benchmarks with garbage
collection disabled, but that will tend to give slightly better than real
results[1] (and it's what the timeit.py module does).  Perhaps calling
"gc.collect()" between benchmarks will help prevent one benchmark's garbage from
impacting the next's timing.
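A minimal sketch of that idea -- collect leftover garbage before each run and
keep the cyclic collector disabled while timing (the benchmark() wrapper is a
hypothetical helper, not anything from bzr's benchmark suite):

```python
import gc
import timeit

def benchmark(func, repeats=5):
    # Time func several times; before each run, collect garbage left over
    # from earlier work, then disable the cyclic collector so it can't
    # fire mid-run and add noise.
    timings = []
    for _ in range(repeats):
        gc.collect()      # flush the previous run's garbage
        gc.disable()      # stop the collector interfering with the timing
        try:
            start = timeit.default_timer()
            func()
            timings.append(timeit.default_timer() - start)
        finally:
            gc.enable()   # restore normal collection between runs
    return min(timings)

t = benchmark(lambda: [str(i) for i in range(1000)])
```

Note this only suppresses the *cyclic* collector; ordinary refcount-based
deallocation still happens, so it matches what timeit.py itself does.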

You could try gc.set_debug(gc.DEBUG_STATS) to get an idea how often the
collector is running and if it's likely to be a problem.

-Andrew.

[1] Although *not* reclaiming memory held by cycles could in theory have a
    negative impact, so who knows?
