looking before leaping - Python Dict: Unscientific benchmarks of lookup methods
John Meinel
john at arbash-meinel.com
Tue Jun 28 13:26:26 UTC 2011
I've done this measuring in the past; a few more comments:
1) D.get() involves 2 lookups. The first one fetches the "get" attribute,
and then there is function call overhead on top of that.
2) "if x in d: return d[x]" ... is 2 lookups on success and 1 on failure;
it also adds a branch.
3) try/except is 1 lookup on success, plus exception-handling overhead on a miss
4) don't forget d.setdefault(x, N)
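A minimal sketch of the four patterns above (the dict, keys, and default value
are made up for illustration):

```python
d = {"a": 1}
default = 0

# 1) d.get(): attribute lookup on "get" plus call overhead
y1 = d.get("a", default)

# 2) membership test: 2 lookups on a hit, 1 on a miss, plus a branch
y2 = d["a"] if "a" in d else default

# 3) try/except: 1 lookup on a hit, exception overhead only on a miss
try:
    y3 = d["a"]
except KeyError:
    y3 = default

# 4) setdefault: one call that looks up the key and inserts the
#    default if it was missing
y4 = d.setdefault("b", default)
```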
Exceptions are much more expensive than a dict lookup. The hit rate matters
a lot. Often you'll have a 99+% hit rate, so saving the extra lookups is
worthwhile.
If you have a modest hit rate in a loop, you can hoist the attribute lookup
out of the loop:
D_get = D.get
for x in ...:
    y = D_get(x, ...)
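A runnable version of that pattern (the dict contents, key list, and default
are illustrative):

```python
D = {"a": 1, "b": 2}
keys = ["a", "b", "c"]

# Bind the method once, outside the loop, so each iteration skips
# the per-call "get" attribute lookup on D.
D_get = D.get
results = [D_get(x, 0) for x in keys]
```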
I've done it all. But honestly, if a bit of code is that performance
sensitive, you should write it in Pyrex before you make the Python code
ugly.
Absolutely measure. In fact, measure with real data. timeit is nice, lsprof
is nice, but both are just hints. (d.get triggers an lsprof trace, but "x in
d" and d[x] are evaluated in the interpreter without triggering the trace,
etc.)
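For example, a quick timeit comparison of three of the lookup styles (the
dict, key list, and ~99% hit rate are simulated; the absolute numbers will
vary by machine and Python version):

```python
import timeit

d = {i: i for i in range(100)}
keys = list(range(100)) + [100]  # 100 hits, 1 miss: ~99% hit rate

def via_get():
    return [d.get(k, None) for k in keys]

def via_in():
    return [d[k] if k in d else None for k in keys]

def via_try():
    out = []
    for k in keys:
        try:
            out.append(d[k])
        except KeyError:
            out.append(None)
    return out

# All three produce the same result; only the cost differs.
for fn in (via_get, via_in, via_try):
    t = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {t:.3f}s")
```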
The rough guidelines I've found:
1) use pyrex if it really matters
2) use try/d[x]/except if you know that the keys are almost always there
(e.g. counting types)
3) use "if x in d: d[x]" if the hit rate is low
4) avoid extra attribute lookups, especially in list comprehensions
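As a concrete instance of guideline 2, a counting loop where most keys are
already present after the first few iterations (the data is illustrative):

```python
words = ["a", "b", "a", "a", "b", "c"]
counts = {}
for w in words:
    try:
        counts[w] += 1   # 1 lookup on the common hit path
    except KeyError:
        counts[w] = 1    # exception cost paid only on first sighting
```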
John
=:->
On Jun 28, 2011 12:25 PM, "Marius Kruger" <amanic at gmail.com> wrote: