Rev 38: Change the extraction ordering for 'unordered'. in http://bazaar.launchpad.net/%7Ebzr/bzr-groupcompress/trunk
John Arbash Meinel
john at arbash-meinel.com
Thu Feb 19 18:25:16 GMT 2009
At http://bazaar.launchpad.net/%7Ebzr/bzr-groupcompress/trunk
------------------------------------------------------------
revno: 38
revision-id: john at arbash-meinel.com-20090219182442-waopk1hpwcidato4
parent: john at arbash-meinel.com-20090218221455-es3oeidved1pxfhj
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: trunk
timestamp: Thu 2009-02-19 12:24:42 -0600
message:
Change the extraction ordering for 'unordered'.
Instead of using a random ordering, use the ordering defined by
the index memos. This should give us the best group-locality.
This gives a rather large performance improvement. Like 30s versus 10min.
-------------- next part --------------
=== modified file 'groupcompress.py'
--- a/groupcompress.py 2009-02-18 22:14:55 +0000
+++ b/groupcompress.py 2009-02-19 18:24:42 +0000
@@ -469,24 +469,30 @@
ordering = 'unordered'
# Cheap: iterate
locations = self._index.get_build_details(keys)
+ local_keys = frozenset(keys).intersection(set(self._unadded_refs))
+ locations.update((key, None) for key in local_keys)
if ordering == 'topological':
# would be better to not globally sort initially but instead
# start with one key, recurse to its oldest parent, then grab
# everything in the same group, etc.
parent_map = dict((key, details[2]) for key, details in
locations.iteritems())
- local = frozenset(keys).intersection(set(self._unadded_refs))
- for key in local:
+ for key in local_keys:
parent_map[key] = self._unadded_refs[key]
- locations[key] = None
present_keys = topo_sort(parent_map)
# Now group by source:
+ # TODO: Support a group-compress 'optimal' ordering
else:
- present_keys = locations.keys()
- local = frozenset(keys).intersection(set(self._unadded_refs))
- for key in local:
- present_keys.append(key)
- locations[key] = None
+ # We want to yield the keys in a semi-optimal (read-wise) ordering.
+ # Otherwise we thrash the _group_cache and destroy performance
+ def get_group(key):
+ # This is the group the bytes are stored in
+ return locations[key][0:3]
+ present_keys = sorted(locations.iterkeys(), key=get_group)
+ # We don't have an ordering for keys in the in-memory object, but
+ # lets process the in-memory ones first.
+ local = list(local_keys)
+ present_keys = list(local_keys) + present_keys
absent_keys = keys.difference(set(locations))
for key in absent_keys:
yield AbsentContentFactory(key)
More information about the bazaar-commits
mailing list