Performance comment (oddity with set.update())

James Henstridge james at jamesh.id.au
Wed Apr 23 06:20:19 BST 2008


On 23/04/2008, John Arbash Meinel <john at arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
>  Hash: SHA1
>
>  I was just debugging some performance issues, and I came across something
> funny.
>
>  Specifically, calling 'set.update()' with a generator that returns nothing
> is
>  slower than building it into a list and then updating.
>
>  With python 2.4.3:
>
>  $ python -m timeit -s 'x = set(range(10000))' 'x.update([])'
>  1000000 loops, best of 3: 0.296 usec per loop
>  $ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])'
>  1000000 loops, best of 3: 0.837 usec per loop
>  $ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])'
>  1000000 loops, best of 3: 0.462 usec per loop
>
>
>
>  With 2.5.1 (on a different machine)
>  $ python -m timeit -s 'x = set(range(10000))' 'x.update([])'
>  1000000 loops, best of 3: 0.265 usec per loop
>  $ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])'
>  1000000 loops, best of 3: 0.717 usec per loop
>  $ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])'
>  1000000 loops, best of 3: 0.39 usec per loop
>
>
>  Anyway, might be something to keep in mind for some of our inner functions.

This is probably function call overhead.  The generator expression
requires setting up a stack frame, and calling in to the generator to
find that it contains no items.

In comparison, the list comprehension is evaluated in the existing
stack frame, and allocating zero-length lists isn't that expensive.

That said, a generator only needs to allocate a single stack frame
which is reused each time you call into it.  Doing a few timeit runs
with list lengths ranging from 0 to 5 seems to indicate that the
generator expression incurs a constant penalty compared to the list
comprehension, rather than it being related to the length of the list.

Of course, once the list gets long enough for memory allocation to be
an issue the generator should be a win.

James.



More information about the bazaar mailing list