[RFC] Benchmark reproducibility

Wed May 9 16:18:57 BST 2007

I keep running into this, and I'm hoping someone might have an idea
about what we can do.

I just wrote two benchmarks so that I could try writing a helper function
in Pyrex to speed up reading knit index files.

However, the order in which I run the benchmarks has a very large effect
on the timings. Specifically, if I do:

./bzr selftest --benchmark bench_knit --randomize 30

test_read_10k_index_c      OK      193ms/    1483ms
test_read_10k_index_py     OK      160ms/    1383ms

But if I do:

./bzr selftest --benchmark bench_knit --randomize 31

test_read_10k_index_py     OK      205ms/    1479ms
test_read_10k_index_c      OK      147ms/    1370ms

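So whichever test runs second gets an advantage of roughly 45ms (193ms
vs. 147ms for the C version, 205ms vs. 160ms for the Python one), which
is enough to reverse which implementation looks faster. I do plan on my
function beating the Python version by more than that margin.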

But I'm very concerned about our ability to trust benchmarks when
effects like this exist.

The tests are set up independently of each other: each one generates
all-new revision ids, etc. In fact, if I just copy and paste the code
into a new function, I get:

test_read_10k_index_c          OK      184ms/    1454ms
test_read_10k_index_c_again    OK      150ms/    1372ms
test_read_10k_index_py         OK      222ms/    1518ms
test_read_10k_index_py_again   OK      179ms/    1406ms

These are exactly the same tests; I'm just running each one twice
instead of once. In both cases the second copy is faster, by 34ms for
the C version and 43ms for the Python one.

For now, I'm going to leave the '_again' benchmarks in and treat the
fastest time for each implementation as the number to base comparisons on.
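
A minimal sketch of the "keep the fastest run" approach, in plain Python
since that is what bzr is written in. It assumes the code under test can
be wrapped in a zero-argument callable; `best_of` is a hypothetical
helper, not part of bzrlib's benchmark API. It makes one untimed warm-up
call and then uses the standard library's `timeit.repeat` to time
several single calls.

```python
import timeit
from typing import Callable


def best_of(func: Callable[[], object], repeat: int = 5, warmup: int = 1) -> float:
    """Return the fastest of `repeat` timed calls to func, in seconds.

    The first `warmup` calls run untimed, so one-off costs (imports, cold
    caches, first-time allocation) are less likely to be charged to
    whichever measurement happens to come first.
    """
    for _ in range(warmup):
        func()
    # With number=1, each entry in the returned list times a single call.
    times = timeit.repeat(func, repeat=repeat, number=1)
    return min(times)


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would wrap the index reader under test.
    elapsed = best_of(lambda: sum(range(100_000)))
    print(f"best of 5: {elapsed * 1000:.1f}ms")
```

Taking the minimum rather than the mean follows the reasoning in the
`timeit` documentation: slower repetitions are usually caused by
interference from elsewhere, not by the code being measured, so the
minimum is usually the most useful single number.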

As for the runs on orcadas, since we already run the benchmarks multiple
times there, it may be reasonable to supply a new random seed on each
run (--randomize now).
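
One way to sketch that idea outside bzr's test runner: reshuffle the
order on every pass and keep the best time seen for each benchmark, so
no test is always stuck in the unlucky first slot. This is plain Python
with hypothetical names (`run_shuffled` and the stand-in workloads); it
does not reproduce what `bzr selftest --randomize` actually does
internally.

```python
import random
import time
from typing import Callable, Dict, Optional


def run_shuffled(
    benchmarks: Dict[str, Callable[[], object]],
    passes: int = 3,
    seed: Optional[int] = None,
) -> Dict[str, float]:
    """Run every benchmark `passes` times, reshuffling the order on each pass.

    Returns the fastest time (in seconds) seen for each benchmark name, so
    a test that happened to run first in one pass is less likely to be
    penalised overall.
    """
    # seed=None draws a fresh seed from the OS (or the clock), in the
    # spirit of "--randomize now"; a fixed seed replays the same orders.
    rng = random.Random(seed)
    best = {name: float("inf") for name in benchmarks}
    names = list(benchmarks)
    for _ in range(passes):
        rng.shuffle(names)  # a new random order for every pass
        for name in names:
            start = time.perf_counter()
            benchmarks[name]()
            best[name] = min(best[name], time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Stand-in workloads; real ones would wrap the C and Python index readers.
    results = run_shuffled({
        "sum_range": lambda: sum(range(100_000)),
        "join_str": lambda: "".join(str(i) for i in range(10_000)),
    })
    for name, secs in sorted(results.items()):
        print(f"{name}: {secs * 1000:.1f}ms")
```

Keeping `seed` as a parameter means a suspicious ordering can still be
replayed exactly, much like passing a fixed number to `--randomize` as
in the 30 and 31 runs above.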

John
=:->