[rfc] benchmark reliability

John Arbash Meinel john at arbash-meinel.com
Thu Aug 17 19:46:02 BST 2006


I'm having some weird problems trying to benchmark certain operations
with our test suite.

As you have seen, I recently submitted some benchmarks for the XML
writer. It uses some caching to speed up certain repeated operations. So
I'm trying really hard to properly clear all caches at the right time,
so as to get repeatable results.

However, I'm seeing something really weird. I want to test the speed
with a full cache, so I effectively do this:

xml5._clear_cache()
read_inventory_from_string(as_str)  # untimed warm-up call, fills the cache
self.time(read_inventory_from_string, as_str)  # timed call

So the second read should be cached. But what is weird is that doing
this changes the time significantly:

xml5._clear_cache()
read_inventory_from_string(as_str)  # first untimed warm-up call
read_inventory_from_string(as_str)  # second untimed warm-up call
self.time(read_inventory_from_string, as_str)  # timed call

The first form consistently takes 580ms, while the second form
consistently takes 523ms. These results are very reproducible on my
machine. There are some fluctuations, but the second form is almost
always faster than the first.

And earlier, I was doing some testing and I commented out all of the
caching code. And then I started doing this:

print len(cache)
clear_cache()
print len(cache)

self.time(...)

And then I would run the benchmarks with and without the 'clear_cache()'
line. Always, the cache would be of size 0. But if I called
'clear_cache()' first, the overall time would slow down by about 50ms.

Something really weird seems to be going on in Python to cause the
variation to be so great and so deterministic. I would be okay with
it if it was just random variation. But this is quite reliable.

I also tried switching to a larger Inventory, hoping that more
processing would reduce this variation. But no, it is still about 10%.

With 2 warmup calls the benchmarks look like:

read_from_string_cached_kernel_like_inventory   OK  1362ms/ 6964ms
r.test_read_from_string_kernel_like_inventory   OK  1375ms/ 3701ms

read_from_string_cached_kernel_like_inventory   OK  1371ms/ 6983ms
r.test_read_from_string_kernel_like_inventory   OK  1424ms/ 3783ms

read_from_string_cached_kernel_like_inventory   OK  1356ms/ 6934ms
r.test_read_from_string_kernel_like_inventory   OK  1370ms/ 3685ms


With only a single warmup call, the results are:

read_from_string_cached_kernel_like_inventory   OK  1526ms/ 5562ms
r.test_read_from_string_kernel_like_inventory   OK  1476ms/ 3707ms

read_from_string_cached_kernel_like_inventory   OK  1501ms/ 5501ms
r.test_read_from_string_kernel_like_inventory   OK  1473ms/ 3685ms

read_from_string_cached_kernel_like_inventory   OK  1488ms/ 5478ms
r.test_read_from_string_kernel_like_inventory   OK  1459ms/ 3669ms


That shows a swing of up to 12%. *Also*, it shows a bleed-over into the
non-cached version, where the overall time is also getting worse.
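The size of the swing can be checked directly from the first timing column of the output above. The sketch below pairs the runs in the order they are listed, which is an arbitrary choice made here for illustration; the helper name `slowdown_percent` is also made up.

```python
# First timing column (ms), copied from the benchmark output above.
two_warmups = {
    "cached": [1362, 1371, 1356],
    "uncached": [1375, 1424, 1370],
}
one_warmup = {
    "cached": [1526, 1501, 1488],
    "uncached": [1476, 1473, 1459],
}


def slowdown_percent(fast, slow):
    """Per-run slowdown of `slow` relative to `fast`, in percent."""
    return [100.0 * (s - f) / f for f, s in zip(fast, slow)]


for name in ("cached", "uncached"):
    swings = slowdown_percent(two_warmups[name], one_warmup[name])
    print("%s: %s" % (name, ", ".join("%.1f%%" % s for s in swings)))
```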

Anyway, since supposedly independent code changes can have a 12% effect
on performance, I'm kind of concerned about our ability to tune bzr
using the current benchmark system. I'm trying to tune whether:

try:
  x = _cache[y]
except KeyError:
  x = func(y)
  _cache[y] = x
return x

is faster than:

x = _cache.get(y)
if x is None:
  x = func(y)
  _cache[y] = x
return x

And I can see a 50% swing on the bench_cache_utf8 benchmarks, which tells
me that the former is faster when the key is expected to exist (800ms
versus 1200ms).

However, in other cases it comes down a little bit closer. My biggest
problem is that it is deterministic. So I can tweak things, and it gets
reproducibly better. But then I change something unrelated, and
everything goes haywire.

Any suggestions?

Running and tweaking under --lsprof is an option, but it penalizes
Python function calls and object creation too much. I can get a huge
lsprof improvement and only a minor real improvement.


John
=:->
