[PERFORMANCE] set operations with dicts _slow_
Robert Collins
robertc at robertcollins.net
Thu Jul 26 06:41:31 BST 2007
So I've got the repository branch I'm working on in the same ballpark as
regular repositories (with the C extensions disabled ;)).
Interestingly the primary time cost was from this code:
def iter_entries(self, keys):
"""Iterate over keys within the index.
:param keys: An iterable providing the keys to be retrieved.
:return: An iterable as per iter_all_entries, but restricted to the
keys supplied. No additional keys will be returned, and every
key supplied that is in the index will be returned.
"""
keys = set(keys)
if not keys:
return
if self._nodes is None:
self._buffer_all()
keys = keys.intersection(self._nodes)
if self.node_ref_lists:
for key in keys:
value, node_refs = self._nodes[key]
yield key, value, node_refs
else:
for key in keys:
yield key, self._nodes[key]
Now, self._nodes is a dict. So keys.intersection(self._nodes) should be
fast right?
Well, I had assumed that it would be. But it turns out to be
pathologically slow. I saved 50% over the entire pull by caching the
keys in a set at the end of _buffer_all, then doing
keys = keys.intersection(self._keys)
instead.
So I think the reason difference_update was slow, which John caught, was
that it was against a dict, not another set; and maintaining a parallel
set may actually be faster still if difference_update is reinstated.
-Rob
robertc at lifelesslap:~/source/baz/test-repos/experimental$
time ../../repository/bzr pull -r 2001 ../../bzr.dev
No revisions to pull.
real 0m0.962s
user 0m0.784s
sys 0m0.132s
robertc at lifelesslap:~/source/baz/test-repos/experimental$ cd ../regular/
robertc at lifelesslap:~/source/baz/test-repos/regular$
time ../../repository/bzr pull -r 2001 ../../bzr.dev
No revisions to pull.
real 0m0.957s
user 0m0.496s
sys 0m0.176s
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070726/316615ad/attachment.pgp
More information about the bazaar
mailing list