[MERGE] Packs. Kthxbye.
Andrew Bennetts
andrew at canonical.com
Fri Oct 19 02:46:18 BST 2007
John Arbash Meinel wrote:
[...]
>
> ^- I used to think the idiom
> request_groups.setdefault(index, []).append((key, value, references))
> would be faster.
>
> I did some timing, though, and at least for integer keys, doing
>
> import random
>
> unique_vals = 100
> total_count = 100 * 1000
> data = [random.randint(0, unique_vals) for x in xrange(total_count)]
>
> # Method 1
> d = {}
> for x in data:
> if x not in d:
> d[x] = []
> d[x].append(x)
>
> # Method 2
> d = {}
> for x in data:
> d.setdefault(x, []).append(x)
If you change this to:
# Method 2 tweaked
d = {}
ds = d.setdefault
for x in data:
ds(x, []).append(x)
I find that the Method 2 does better than Method 1 in this test. (By about the
same margin that 1 was beating 2 before the tweak.)
[
> So it seems a hash lookup for integers is faster than allocating a temporary list.
Well, allocating a list *and* looking up an attribute.
Of course, you can apply the same trick to Method 1...
# Method 1 tweaked
# d holds 'append' bound methods. To access the original list, do d[x].__self__
# rather than just d[x].
d = {}
for x in data:
if x not in d:
d[x] = [].append
d[x](x)
Which is faster than my tweaked Method 2.
> I wonder if that holds true for strings, though.
It's easy to modify your script to test: just add a “data = map(str, data)”.
You may be interested to know that string objects cache their hash value. So
repeatedly hashing a string is basically as cheap as hashing it once.
-Andrew.
More information about the bazaar
mailing list