[PERFORMANCE] set operations with dicts _slow_

Robert Collins robertc at robertcollins.net
Thu Jul 26 06:41:31 BST 2007


So I've got the repository branch I'm working on in the same ballpark as
regular repositories (with the C extensions disabled ;)).

Interestingly the primary time cost was from this code:

def iter_entries(self, keys):
    """Iterate over keys within the index.

    :param keys: An iterable providing the keys to be retrieved.
    :return: An iterable as per iter_all_entries, but restricted to the
        keys supplied. No additional keys will be returned, and every
        key supplied that is in the index will be returned.
    """
    keys = set(keys)
    if not keys:
        return
    if self._nodes is None:
        self._buffer_all() 
    keys = keys.intersection(self._nodes)
    if self.node_ref_lists:
        for key in keys:
            value, node_refs = self._nodes[key]
            yield key, value, node_refs
    else:
        for key in keys:
            yield key, self._nodes[key]

Now, self._nodes is a dict. So keys.intersection(self._nodes) should be
fast right?

Well, I had assumed that it would be. But it turns out to be
pathologically slow. I saved 50% over the entire pull by caching the
keys in a set at the end of _buffer_all, then doing
    keys = keys.intersection(self._keys)
instead. 

So I think the reason difference_update was slow, which John caught, was
that it was against a dict, not another set; and maintaining a parallel
set may actually be faster still if difference_update is reinstated.

-Rob


robertc at lifelesslap:~/source/baz/test-repos/experimental$
time ../../repository/bzr pull -r 2001 ../../bzr.dev
No revisions to pull.

real    0m0.962s
user    0m0.784s
sys     0m0.132s
robertc at lifelesslap:~/source/baz/test-repos/experimental$ cd ../regular/
robertc at lifelesslap:~/source/baz/test-repos/regular$
time ../../repository/bzr pull -r 2001 ../../bzr.dev
No revisions to pull.

real    0m0.957s
user    0m0.496s
sys     0m0.176s

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070726/316615ad/attachment.pgp 


More information about the bazaar mailing list