[MERGE] Make 'bzr co' more memory sensitive

Aaron Bentley aaron at aaronbentley.com
Mon Oct 6 16:13:45 BST 2008



Vincent Ladeuil wrote:
>>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:
> On the performance implications, I'd say that this fix should go
> in *even* if it makes the operation slower on smaller cases. We
> can't reasonably pretend it's a performance regression since for
> large projects the actual implementation is not viable.

I agree that this should go in.

We should pursue alternatives that don't hit performance as badly.

I pursued something like VersionedFiles a while ago, called Storage, and
sped up iter_files_bytes.  I avoided exhausting memory by splitting the
request into "build_groups", i.e. series of adjacent file texts to be
built, split at fulltexts.  The build_groups were then merged back
together, subject to a limit on the resulting size.

http://code.aaronbentley.com/bzr/bzrrepo/fast-iter-files-bytes

It went like this:
+class Storage(object):
+    """Handle low-level repository data access"""
...
+    @classmethod
+    def split_build_groups(klass, build_map, nodes, max_group_size=100000):
+        """Split build groups up to avoid retrieving too much data at once
+
+        :param build_map: A dict mapping item_keys to their build-requirements
+        :param nodes: A dict mapping item_keys to their index data
+        :param max_group_size: The maximum size that a merged group may be.
+            Results may exceed this, but only when their minimum size
+            exceeds this.  The default was determined experimentally by
+            checking out bzr.dev
+        """
+        build_groups = klass._find_build_groups(build_map)
+        with_size = klass._groups_with_size(build_groups, nodes)
+        return klass._merge_groups_by_size(with_size, max_group_size)
+
+    @staticmethod
+    def _find_build_groups(build_map):
+        """Group build requirements into independent groups."""
+        reverse_build_map = {}
+        for item_key, dependencies in build_map.iteritems():
+            reverse_build_map.setdefault(item_key, [])
+            for dependency in dependencies:
+                reverse_build_map.setdefault(dependency, []).append(item_key)
+        build_groups = []
+        for item_key, reverse_dependencies in reverse_build_map.iteritems():
+            if len(reverse_dependencies) > 0:
+                continue
+            group = set()
+            pending = [item_key]
+            while len(pending) > 0:
+                cur_item_key = pending.pop()
+                if cur_item_key in group:
+                    continue
+                group.add(cur_item_key)
+                pending.extend(build_map[cur_item_key])
+            build_groups.append(group)
+        return build_groups
+
+    @staticmethod
+    def _merge_groups_by_size(group_size, max_size):
+        """Merge build groups as much as possible without exceeding max_size.
+
+        Groups that already exceed max_size will be returned unchanged.
+        """
+        pending = list(group_size)
+        pending.reverse()
+        while len(pending) > 0:
+            out_group = set()
+            out_size = 0
+            while len(pending) > 0:
+                cur_group, cur_size = pending.pop()
+                if out_size == 0 or cur_size + out_size <= max_size:
+                    out_group.update(cur_group)
+                    out_size += cur_size
+                else:
+                    pending.append((cur_group, cur_size))
+                    break
+            yield out_group
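For illustration, here is a self-contained sketch of the same
grouping-and-merging strategy in modern Python (dict.items() instead of
iteritems()).  The item keys, dependency chains, and sizes below are
invented for the example, not taken from the branch:

```python
# Hypothetical standalone sketch of the approach quoted above; the item
# keys and text sizes are made up for illustration.

def find_build_groups(build_map):
    """Group items into connected sets, each rooted at an item that
    nothing else depends on (cf. _find_build_groups above)."""
    reverse_build_map = {}
    for item_key, dependencies in build_map.items():
        reverse_build_map.setdefault(item_key, [])
        for dependency in dependencies:
            reverse_build_map.setdefault(dependency, []).append(item_key)
    build_groups = []
    for item_key, reverse_dependencies in reverse_build_map.items():
        if reverse_dependencies:
            continue  # not a chain tip; some other group will include it
        group, pending = set(), [item_key]
        while pending:
            cur = pending.pop()
            if cur in group:
                continue
            group.add(cur)
            pending.extend(build_map[cur])
        build_groups.append(group)
    return build_groups


def merge_groups_by_size(groups_with_size, max_size):
    """Greedily merge consecutive groups while staying under max_size;
    a single group already over the limit is yielded unchanged."""
    pending = list(reversed(list(groups_with_size)))
    while pending:
        out_group, out_size = set(), 0
        while pending:
            group, size = pending.pop()
            if out_size == 0 or out_size + size <= max_size:
                out_group.update(group)
                out_size += size
            else:
                pending.append((group, size))
                break
        yield out_group


# Two delta chains: c -> b -> a (a is a fulltext), e -> d (d is a fulltext).
build_map = {'a': [], 'b': ['a'], 'c': ['b'], 'd': [], 'e': ['d']}
sizes = {'a': 50, 'b': 10, 'c': 10, 'd': 40, 'e': 10}

groups = find_build_groups(build_map)   # [{'a', 'b', 'c'}, {'d', 'e'}]
with_size = [(g, sum(sizes[k] for k in g)) for g in groups]
merged = list(merge_groups_by_size(with_size, max_size=200))
# Both groups (70 and 50 bytes) fit under 200, so one combined request:
# [{'a', 'b', 'c', 'd', 'e'}]
```

With max_size=100 instead, the two chains would stay separate requests,
which is the memory cap doing its job.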




