[RFC] Benchmark reproducibility
John Arbash Meinel
john at arbash-meinel.com
Wed May 9 16:18:57 BST 2007
I keep running into this, and I'm hoping someone might have an idea
about what we can do.
I just wrote two benchmarks so that I could try to write a helper function
in Pyrex to optimize reading knit index files.
However, the order in which I run the benchmarks has a very large effect
on performance. Specifically, if I do:
./bzr selftest --benchmark bench_knit --randomize 30
test_read_10k_index_c OK 193ms/ 1483ms
test_read_10k_index_py OK 160ms/ 1383ms
But if I do:
./bzr selftest --benchmark bench_knit --randomize 31
test_read_10k_index_py OK 205ms/ 1479ms
test_read_10k_index_c OK 147ms/ 1370ms
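So whichever test runs second gets an advantage of over 40ms (193ms vs
147ms for the C version, 205ms vs 160ms for the Python one). Now, I do
plan on my function being more than 40ms faster, so in this case the
effect shouldn't hide the improvement.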
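Spelling out the arithmetic on the first timing of each line shows both the size of the effect and why it is worrying: the same pair of tests reports the C version as slower under one ordering and faster under the other. The numbers are copied from the two runs above; the variable names are just for illustration.

```python
# First timing on each output line (ms), copied from the two --randomize runs.
run_30 = {"c": 193, "py": 160}  # c ran first, py ran second
run_31 = {"py": 205, "c": 147}  # py ran first, c ran second

# Same test, timing when it ran first minus timing when it ran second.
order_penalty = {
    "c": run_30["c"] - run_31["c"],
    "py": run_31["py"] - run_30["py"],
}

# Apparent C-minus-Python difference within each run (negative = C faster).
c_minus_py = {
    "run 30": run_30["c"] - run_30["py"],
    "run 31": run_31["c"] - run_31["py"],
}

print(order_penalty)  # {'c': 46, 'py': 45}
print(c_minus_py)     # {'run 30': 33, 'run 31': -58}
```

The ordering effect (about 45ms) is larger than the 33ms by which the C version appears to lose in run 30, which is why a single run in a single order can't be trusted here.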
But I'm very concerned about our ability to trust benchmarks when effects
like this exist.
The tests are set up completely independently; each one generates
all-new revision ids, etc. In fact, if I just copy and paste the code
into a new function, I get:
test_read_10k_index_c OK 184ms/ 1454ms
test_read_10k_index_c_again OK 150ms/ 1372ms
test_read_10k_index_py OK 222ms/ 1518ms
test_read_10k_index_py_again OK 179ms/ 1406ms
These are exactly the same tests; I'm just running each one twice
instead of once.
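Python's unittest loader collects every callable class attribute whose name starts with `test`, so binding an existing test method to a second name runs the same body twice without copy-pasting it. This is a minimal sketch assuming the benchmarks are collected like ordinary `unittest.TestCase` methods (worth checking against bzr's own test loader); the class name and the placeholder workload are hypothetical.

```python
import unittest


class BenchKnitIndexSketch(unittest.TestCase):
    """Hypothetical stand-in for the bench_knit tests. The method names
    mirror the post; the body is a placeholder, not bzr's benchmark code."""

    def test_read_10k_index_py(self):
        sum(range(10000))  # placeholder workload

    # Binding the same function under a second name makes the loader
    # collect it as a separate test, so the body runs twice.
    test_read_10k_index_py_again = test_read_10k_index_py


if __name__ == "__main__":
    loader = unittest.TestLoader()
    print(loader.getTestCaseNames(BenchKnitIndexSketch))
```

Both names share one function object, so a later edit to the benchmark can't leave the two copies out of sync.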
For now, I'm going to leave the '_again' benchmarks in and assume that
the faster run of each is the one to base comparisons on.
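The same "keep the fastest run" rule can be applied to a single benchmark with the standard library's `timeit.repeat`; Python's timeit documentation also suggests the minimum as the most meaningful figure, on the reasoning that noise only ever adds time. This is a plain-Python sketch, not bzr's benchmark harness, and `best_of` and `read_index_placeholder` are hypothetical names.

```python
import timeit


def best_of(func, repeat=5, number=1):
    """Return the fastest per-call time (seconds) over `repeat` rounds of
    `number` calls each. Taking the minimum assumes that noise (cold
    caches, GC, other processes) only ever makes a run slower."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def read_index_placeholder():
    # Hypothetical stand-in for the code under test.
    return sorted(str(i) for i in range(10000))


if __name__ == "__main__":
    print("%.1f ms" % (best_of(read_index_placeholder) * 1000))
```

Note that taking the minimum hides the first-run penalty rather than explaining it: it discards the slow run, which is the same thing keeping the '_again' copies does.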
As for the runs on orcadas: since we already run the benchmarks multiple
times there, it may be reasonable to supply a new random seed for each
run (--randomize now).
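The idea of a fresh seed per run can be illustrated outside bzr: shuffle the benchmark order with a new seed every round, and record the seeds so an odd ordering can be replayed. This is not bzr's `--randomize` implementation, just a plain-Python sketch; `run_shuffled`, `order_for`, and their parameters are hypothetical.

```python
import random
import time


def order_for(seed, names):
    """Deterministic order for a given seed: sort first, so the seed alone
    decides the result regardless of how `names` was ordered."""
    order = sorted(names)
    random.Random(seed).shuffle(order)
    return order


def run_shuffled(benchmarks, rounds=3, seeds=None):
    """Run each benchmark once per round, in a different order each round.

    benchmarks: dict mapping a name to a zero-argument callable.
    seeds: optional list of seeds (one per round) to replay an earlier run;
    if given, `rounds` is ignored. Otherwise a fresh seed is drawn per round.
    Returns (seeds_used, timings) where timings maps name -> list of seconds.
    """
    if seeds is None:
        seeds = [random.SystemRandom().randrange(2**32) for _ in range(rounds)]
    timings = {name: [] for name in benchmarks}
    for seed in seeds:
        for name in order_for(seed, benchmarks):
            start = time.perf_counter()
            benchmarks[name]()
            timings[name].append(time.perf_counter() - start)
    return seeds, timings
```

Passing the returned seeds back in as `run_shuffled(benchmarks, seeds=...)` repeats the exact orderings, which makes it possible to tell an ordering effect apart from ordinary noise.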
John
=:->