Direction on faster iteritems

John Arbash Meinel john at arbash-meinel.com
Thu Mar 19 04:38:03 GMT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> Hi John,
> 
> I've been working on making iteritems faster following
> your email advice the other day. I'm not sure I'm on
> the right track though, e.g. perhaps I'm not returning
> the right information out of iternodes?
> 
> I started pushing my branch to lp 6-7 hours ago and it's
> *still* going so I'm sending you this patch to glance over
> when you get a chance. It's very much a Work In Progress
> but there's enough there to see how I've interpreted your
> email. :-)
> 
> Can you see anything I'm doing that's obviously wrong?
> 
> Ian C.
> 

     def __repr__(self):
- -        items_str = sorted(self._items)
+        items_str = str(sorted(self._items))
         if len(items_str) > 20:
- -            items_str = items_str[16] + '...]'
+            items_str = items_str[16:] + '...]'
         return \

^- I'm sure you want [:16] rather than [16:] :)

- -    def iteritems(self, store, key_filter=None):
+    def iteritems(self, store, key_filter=None, matched_key=None):

^- I'm not 100% sure what you are thinking to do with "matched_key". My
idea was to prune key_filter before calling LeafNode.iteritems() rather
than doing it as an extra parameter.

...

+            matched = {}
+            for node in self._iter_nodes(store, key_filter=key_filter,
+                matched_keys=matched):

^- My idea here was to change the return value for "_iter_nodes" so that
it would return:
for node, node_key_filter in self._iter_nodes(...):
  for item in node.iteritems(key_filter=node_key_filter):
    yield item


And then change the _iter_nodes matching code, so that it maps from the
'length_filter' which matched back to an original key. So you would have:

prefix_to_keys = {}
for key in key_filter:
  search_key = self._search_prefix_filter(key)
  try:
    length_filters[len(search_key)].add(length_filter)
  except KeyError:
    length_filters[len(search_key)] = set([search_key])
  try:
    prefix_to_keys[search_key].append(key)
  except KeyError:
    prefix_to_keys[search_key] = [key]

...

  node_key_filters = []
  for length, length_filter in length_filters:
    prefix_start = prefix[:length]
    if prefix_start in length_filter:
      node_key_filters.extend(prefix_to_keys[prefix_start])
  if node_key_filters:
    #queue up items that aren't nodes yet
    yield node, node_key_filters


And the other major change for LeafNode.iteritems()

narrow_key_filters = []
for key in key_filters:
  if len(key) == self._node_width:
    try:
      yield self._items[key]
    except KeyError:
      # that key doesn't exist here
      pass
  else:
    narrow_key_filters.append(key)
if narrow_key_filters:
  # This is stuff like (parent_id,) for a (parent_id, basename) map
  # Create another set of 'length' based filters, but this is based on
  # the key width, and not the length of the serialized form.
  ...

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknBzCsACgkQJdeBCYSNAANYeQCfYBzAOQUa+enfhl75ffVXM9Y8
//kAn068o6Hcj7nVHuBSvCTg47RbRmJ8
=ivpu
-----END PGP SIGNATURE-----



More information about the bazaar mailing list