Merge eats memory
Aaron Bentley
aaron.bentley at utoronto.ca
Fri May 27 03:54:37 BST 2005
Michael Ellerman wrote:
> Hey Aaron,
>
> I haven't had a chance to look at your merge code yet, but it works nicely on
> small trees. A loaded compliment I know :D
Actually, it also works fine on big trees that have no changes. :-)
> I've just tried merging the last day or so's git commits, which is about 900K
> of diff, and it's not happy.
>
> Casual observation of top showed bzr using up to 700M of memory.
That's a rather shocking 777:1 ratio. I'd have expected something much
closer to 1:1, since it does hold the changeset in memory.
There are several places where I read an entire file into memory. If
that's the cause, it'll probably be this line:
changeset.py:1444
    if file(full_path_a, "rb").read() == \
       file(full_path_b, "rb").read():
But as you can see, no reference to the file contents is retained after
the comparison, so it could only be the cause if the heap allocator
really stinks. (Yes, that code is sloppy, but it's also clear, and it's
easy to change if it turns out to be a problem.)
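For illustration, here is a minimal sketch of the same comparison done in fixed-size chunks, so that neither file is ever held in memory in full. It assumes Python 3, where the `file()` builtin used above no longer exists. The helper name `files_identical` and the 64 KiB chunk size are hypothetical choices and are not taken from changeset.py.

```python
import os

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; any moderate value works


def files_identical(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Return True if the two files have byte-identical contents.

    Peak memory is bounded by roughly two chunks, not by the file sizes.
    """
    # Cheap early exit: files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files reached EOF together
                return True
```

The standard library's `filecmp.cmp(path_a, path_b, shallow=False)` does a comparable chunked comparison and could be used instead.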
> I've heard
> bad things about Python's heap allocator so it may not be your fault, but
> we'll have to look into it at some point.
One quick trick is to use pathname/@ rather than pathname, so that you
get the basis tree instead of the working tree. That path has a
significant optimization: it takes advantage of the fact that two files
with the same textid have the same contents, so those files don't need
to be read at all.
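Here is a sketch of the textid shortcut, under stated assumptions. `FileEntry`, `text_id` and `contents_equal` are hypothetical names and are not bzr's actual API. `read_bytes` stands in for however the tree retrieves file contents. Equal textids settle the question without any I/O. Unequal ones prove nothing, so the code falls back to a byte comparison.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileEntry:
    # Hypothetical stand-in for an inventory entry; text_id is None when
    # the contents are not recorded in the store (e.g. an edited working file).
    path: str
    text_id: Optional[str]


def contents_equal(a: FileEntry, b: FileEntry,
                   read_bytes: Callable[[FileEntry], bytes]) -> bool:
    # Fast path: identical text IDs imply identical contents, so no I/O.
    if a.text_id is not None and a.text_id == b.text_id:
        return True
    # Slow path: different (or unknown) text IDs prove nothing either way,
    # so fall back to comparing the actual bytes.
    return read_bytes(a) == read_bytes(b)
```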
Similar speedups can and will be made for working trees now that the
stat cache is in place, but they haven't been done yet.
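Here is one way a stat cache could enable the same kind of shortcut for working trees, sketched under assumptions. The `StatCache` class, the fingerprint fields and the per-path SHA-1 are illustrative only. They do not describe bzr's actual stat cache. A file is re-read only when its stat fingerprint changes. Two working files can then be compared by digest instead of by contents.

```python
import hashlib
import os


def _fingerprint(st: os.stat_result) -> tuple:
    # Hypothetical fingerprint: if none of these fields changed, assume the
    # file's contents didn't either.
    return (st.st_size, st.st_mtime_ns, st.st_ino, st.st_dev)


class StatCache:
    """Hypothetical cache mapping path -> (stat fingerprint, SHA-1 hex digest)."""

    def __init__(self) -> None:
        self._entries: dict = {}
        self.files_hashed = 0  # how many times a file actually had to be read

    def sha1(self, path: str) -> str:
        fp = _fingerprint(os.stat(path))
        cached = self._entries.get(path)
        if cached is not None and cached[0] == fp:
            return cached[1]  # stat unchanged: reuse the digest, skip the read
        h = hashlib.sha1()
        with open(path, "rb") as f:
            # Hash in chunks so large files are never held in memory whole.
            for chunk in iter(lambda: f.read(64 * 1024), b""):
                h.update(chunk)
        digest = h.hexdigest()
        self._entries[path] = (fp, digest)
        self.files_hashed += 1
        return digest
```

A real cache would also need to distrust fingerprints of files modified within the filesystem's timestamp granularity. A quick same-size rewrite can leave both size and mtime unchanged, and this sketch ignores that case.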
Anyhow, it would be useful to know whether the file comparisons are the
cause of the problem.
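As one way to check, here is a minimal sketch that measures the peak heap usage of the read-everything comparison against a chunked one on the same pair of files. It assumes Python 3.4 or later, where the standard `tracemalloc` module is available. `peak_bytes`, `whole_file_equal` and `chunked_equal` are hypothetical helper names.

```python
import filecmp
import tracemalloc


def peak_bytes(func, *args):
    """Run func(*args) and return the peak traced heap usage in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def whole_file_equal(path_a, path_b):
    # The read-everything pattern: both files are held in memory at once.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return fa.read() == fb.read()


def chunked_equal(path_a, path_b):
    # filecmp compares the files in small fixed-size chunks.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

tracemalloc only counts memory allocated through Python's own allocators. It will not show memory that has been freed but not returned to the operating system. Comparing its numbers with what top reports could therefore help separate a genuine leak from allocator behaviour.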
Aaron
More information about the bazaar mailing list