[MERGE] Make 'bzr co' more memory sensitive
Aaron Bentley
aaron at aaronbentley.com
Mon Oct 6 16:13:45 BST 2008
Vincent Ladeuil wrote (replying to John Arbash Meinel):
> On the performance implications, I'd say that this fix should go
> in *even* if it makes the operation slower on smaller cases. We
> can't reasonably pretend it's a performance regression since for
> large projects the actual implementation is not viable.
I agree that this should go in.
We should pursue alternatives that don't hit performance as badly.
I pursued something like VersionedFiles a while ago, called Storage, and
sped up iter_files_bytes. I avoided exhausting memory by splitting the
request into "build_groups": runs of adjacent file texts to be built
together, split at fulltext boundaries. The build groups were then
recombined, subject to a limit on their combined size.
http://code.aaronbentley.com/bzr/bzrrepo/fast-iter-files-bytes
It went like this:
+class Storage(object):
+    """Handle low-level repository data access"""
...
+    @classmethod
+    def split_build_groups(klass, build_map, nodes, max_group_size=100000):
+        """Split build groups up to avoid retrieving too much data at once
+
+        :param build_map: A dict mapping item_keys to their build-requirements
+        :param nodes: A dict mapping item_keys to their index data
+        :param max_group_size: The maximum size that a merged group may be.
+            Results may exceed this, but only when their minimum size
+            exceeds this.  The default was determined experimentally by
+            checking out bzr.dev
+        """
+        build_groups = klass._find_build_groups(build_map)
+        with_size = klass._groups_with_size(build_groups, nodes)
+        return klass._merge_groups_by_size(with_size, max_group_size)
+
+    @staticmethod
+    def _find_build_groups(build_map):
+        """Group build requirements into independent groups."""
+        reverse_build_map = {}
+        for item_key, dependencies in build_map.iteritems():
+            reverse_build_map.setdefault(item_key, [])
+            for dependency in dependencies:
+                reverse_build_map.setdefault(dependency, []).append(item_key)
+        build_groups = []
+        for item_key, reverse_dependencies in reverse_build_map.iteritems():
+            if len(reverse_dependencies) > 0:
+                # only start a group at keys nothing else depends on
+                continue
+            group = set()
+            pending = [item_key]
+            while len(pending) > 0:
+                cur_item_key = pending.pop()
+                if cur_item_key in group:
+                    continue
+                group.add(cur_item_key)
+                pending.extend(build_map[cur_item_key])
+            build_groups.append(group)
+        return build_groups
+
+    @staticmethod
+    def _merge_groups_by_size(group_size, max_size):
+        """Merge build groups as much as possible without exceeding max_size.
+
+        Groups that already exceed max_size will be returned unchanged.
+        """
+        pending = list(group_size)
+        pending.reverse()
+        while len(pending) > 0:
+            out_group = set()
+            out_size = 0
+            while len(pending) > 0:
+                cur_group, cur_size = pending.pop()
+                if out_size == 0 or cur_size + out_size <= max_size:
+                    out_group.update(cur_group)
+                    out_size += cur_size
+                else:
+                    pending.append((cur_group, cur_size))
+                    break
+            yield out_group
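For illustration, here is a minimal, self-contained Python 3 adaptation of the two helpers above on a toy dependency graph. The standalone function names and the toy build_map are mine, not from the patch, and the elided _groups_with_size step is replaced by a made-up sizing:

```python
def find_build_groups(build_map):
    """Group build requirements into independent groups."""
    reverse_build_map = {}
    for item_key, dependencies in build_map.items():
        reverse_build_map.setdefault(item_key, [])
        for dependency in dependencies:
            reverse_build_map.setdefault(dependency, []).append(item_key)
    build_groups = []
    for item_key, reverse_dependencies in reverse_build_map.items():
        if len(reverse_dependencies) > 0:
            # only start a group at keys nothing else depends on
            continue
        group = set()
        pending = [item_key]
        while pending:
            cur = pending.pop()
            if cur in group:
                continue
            group.add(cur)
            pending.extend(build_map[cur])
        build_groups.append(group)
    return build_groups


def merge_groups_by_size(group_size, max_size):
    """Merge (group, size) pairs without exceeding max_size where possible."""
    pending = list(group_size)
    pending.reverse()
    while pending:
        out_group = set()
        out_size = 0
        while pending:
            cur_group, cur_size = pending.pop()
            if out_size == 0 or cur_size + out_size <= max_size:
                out_group.update(cur_group)
                out_size += cur_size
            else:
                pending.append((cur_group, cur_size))
                break
        yield out_group


# Toy graph: 'b' builds on 'a', 'c' builds on 'b'; 'd' stands alone.
build_map = {'a': [], 'b': ['a'], 'c': ['b'], 'd': []}
groups = find_build_groups(build_map)
# Two independent groups: {'a', 'b', 'c'} and {'d'}

# Pretend each text is 10 bytes, then merge under a 50-byte cap.
sized = [(g, 10 * len(g)) for g in groups]
merged = list(merge_groups_by_size(sized, max_size=50))
# Both groups fit under the cap, so one merged fetch of all four texts
```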