Merge eats memory

Fri May 27 03:54:37 BST 2005

Actually, it also works fine on big trees that have no changes. :-)

> I've just tried merging the last day or so's git commits, which is about 900K 
> of diff, and it's not happy.
> 
> Casual observation of top showed bzr using up to 700M of memory.

That's a rather shocking 777:1 ratio.  I'd have expected much closer to
a 1:1 ratio, because it does store a changeset in memory.

There are several places where I read an entire file into memory.  If
that's the cause, it'll probably be this line:

changeset.py:1444
            if file(full_path_a, "rb").read() == \
                file(full_path_b, "rb").read():

But as you can see, no reference to the file contents is retained: both
strings can be freed as soon as the comparison finishes.  So it could only
be the cause if the heap allocator really stinks.  (Yes, that code is
sloppy, but it's clear, too.  Easy to change if it's a problem.)
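If it does turn out to be a problem, one way to change it is to compare the
two files a chunk at a time, so that only two small buffers are alive at once
instead of two whole files.  This is a minimal sketch, not what changeset.py
does; the name files_equal and the 64K chunk size are my own assumptions:

```python
import os

CHUNK_SIZE = 64 * 1024  # bytes read from each file per step; arbitrary


def files_equal(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Return True if the two files have identical contents.

    Reads both files in fixed-size chunks, so at most one chunk from
    each file is held in memory at a time.
    """
    # Files of different sizes cannot have the same contents.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files exhausted at the same point
                return True
```

Python's standard library also has filecmp.cmp(path_a, path_b, shallow=False),
which does a similar buffered byte-for-byte comparison.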

> I've heard 
> bad things about Python's heap allocator so it may not be your fault, but 
> we'll have to look into it at some point.

One quick trick is to use pathname/@ rather than pathname, so that you
get the basis tree, not the working tree.  There's some significant
optimization there, taking advantage of the fact that two files with the
same textid have the same contents.
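Roughly, the shortcut works like this: when both sides carry the same textid,
the contents are known to match and neither file is read; only when the
textids differ (or one side has none) are the bytes actually compared.  A
sketch of the idea, where Entry and contents_equal are hypothetical stand-ins
rather than bzr's real inventory classes:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record of one file in a tree."""
    path: str
    text_id: Optional[str]  # None for a working-tree file with no stored text


def contents_equal(old: Entry, new: Entry,
                   read_bytes: Callable[[str], bytes]) -> bool:
    """Decide whether two versions of a file have the same contents."""
    # Same textid means same contents, so skip reading the files entirely.
    if old.text_id is not None and old.text_id == new.text_id:
        return True
    # Different textids do not prove the contents differ; compare the bytes.
    return read_bytes(old.path) == read_bytes(new.path)
```

Working-tree files have no textid to compare, so they always fall through to
the expensive branch, which is why pathname/@ is faster.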

Similar speedups can and will be done for working trees, now that the
stat cache is in place.  But they haven't been done yet.
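The general idea of a stat cache is to remember a file's SHA-1 along with the
stat fields it was computed under, and to reread the file only when those
fields change; a working file could then be matched against the basis text by
hash instead of by content.  This is an illustrative sketch under that
assumption, not bzr's actual stat cache; the StatCache class and its choice
of fingerprint fields are hypothetical:

```python
import hashlib
import os


class StatCache:
    """Hypothetical stat cache: path -> (stat fingerprint, SHA-1 hex digest)."""

    def __init__(self):
        self._entries = {}
        self.hashed = 0  # how many times a file was actually read and hashed

    @staticmethod
    def _fingerprint(st):
        # If none of these changed, the contents are assumed unchanged.
        return (st.st_size, st.st_mtime_ns, st.st_ino, st.st_ctime_ns)

    def sha1(self, path):
        fp = self._fingerprint(os.stat(path))
        cached = self._entries.get(path)
        if cached is not None and cached[0] == fp:
            return cached[1]  # cache hit: no need to read the file
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(64 * 1024), b""):
                h.update(block)
        digest = h.hexdigest()
        self._entries[path] = (fp, digest)
        self.hashed += 1
        return digest
```

A real cache also has to cope with files modified again within the
timestamp's resolution, which this sketch ignores.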

Anyhow, it would be useful to know whether the file comparisons are the
cause of the problem.
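One way to check is to wrap the comparison of a single suspect file pair and
look at peak memory: a peak close to the two file sizes combined points at the
whole-file reads.  A sketch using the standard tracemalloc module (Python
3.4+); the peak_memory helper is my own, and note that tracemalloc counts only
memory allocated through Python's allocators, not the process's total size as
top reports it:

```python
import tracemalloc


def peak_memory(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak
```

For example, pass it a function that does the two file(...).read() calls
from changeset.py:1444 on one of the large files in the merge.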

Aaron