[PERFORMANCE] set operations with dicts _slow_

John Arbash Meinel john at arbash-meinel.com
Thu Jul 26 15:34:11 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> So I've got the repository branch I'm working on in the same ballpark as
> regular repositories (with the C extensions disabled ;)).
> 
> Interestingly the primary time cost was from this code:
> 
> def iter_entries(self, keys):
>     """Iterate over keys within the index.
> 
>     :param keys: An iterable providing the keys to be retrieved.
>     :return: An iterable as per iter_all_entries, but restricted to the
>         keys supplied. No additional keys will be returned, and every
>         key supplied that is in the index will be returned.
>     """
>     keys = set(keys)
>     if not keys:
>         return
>     if self._nodes is None:
>         self._buffer_all() 
>     keys = keys.intersection(self._nodes)
>     if self.node_ref_lists:
>         for key in keys:
>             value, node_refs = self._nodes[key]
>             yield key, value, node_refs
>     else:
>         for key in keys:
>             yield key, self._nodes[key]
> 
> Now, self._nodes is a dict. So keys.intersection(self._nodes) should be
> fast right?
> 
> Well, I had assumed that it would be. But it turns out to be
> pathologically slow. I saved 50% over the entire pull by caching the
> keys in a set at the end of _buffer_all, then doing
>     keys = keys.intersection(self._keys)
> instead. 
> 

I wonder if it doesn't just see it as .intersection(iterable) and not pay
attention to whether other is a dict/list/tuple/etc.

So it might special case against another set(), but not against any other type.

We certainly could investigate that for .difference_update(). But then again,
it would be another set() that you have to manage. And I'm guessing it wouldn't
be much faster than what we do now.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGqLDjJdeBCYSNAAMRAu7UAJwJZ8DRLToNdkRI2DT11f3AxXQoHQCeMobB
rjURDexx3LHbdiRAwSvOPDc=
=V21O
-----END PGP SIGNATURE-----



More information about the bazaar mailing list