[PERFORMANCE] set operations with dicts _slow_
John Arbash Meinel
john at arbash-meinel.com
Thu Jul 26 15:34:11 BST 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> So I've got the repository branch I'm working on in the same ballpark as
> regular repositories (with the C extensions disabled ;)).
>
> Interestingly the primary time cost was from this code:
>
> def iter_entries(self, keys):
> """Iterate over keys within the index.
>
> :param keys: An iterable providing the keys to be retrieved.
> :return: An iterable as per iter_all_entries, but restricted to the
> keys supplied. No additional keys will be returned, and every
> key supplied that is in the index will be returned.
> """
> keys = set(keys)
> if not keys:
> return
> if self._nodes is None:
> self._buffer_all()
> keys = keys.intersection(self._nodes)
> if self.node_ref_lists:
> for key in keys:
> value, node_refs = self._nodes[key]
> yield key, value, node_refs
> else:
> for key in keys:
> yield key, self._nodes[key]
>
> Now, self._nodes is a dict. So keys.intersection(self._nodes) should be
> fast right?
>
> Well, I had assumed that it would be. But it turns out to be
> pathologically slow. I saved 50% over the entire pull by caching the
> keys in a set at the end of _buffer_all, then doing
> keys = keys.intersection(self._keys)
> instead.
>
I wonder if it doesn't just see it as .intersection(iterable) and not pay
attention to whether other is a dict/list/tuple/etc.
So it might special case against another set(), but not against any other type.
We certainly could investigate that for .difference_update(). But then again,
it would be another set() that you have to manage. And I'm guessing it wouldn't
be much faster than what we do now.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGqLDjJdeBCYSNAAMRAu7UAJwJZ8DRLToNdkRI2DT11f3AxXQoHQCeMobB
rjURDexx3LHbdiRAwSvOPDc=
=V21O
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list