Memory profiling...

John Arbash Meinel john at arbash-meinel.com
Wed Jun 30 16:59:33 BST 2010


John Szakmeister wrote:
> I was told on IRC that John might know a few things I could do to get
> some memory profiling numbers from Bazaar.  Would you mind sharing?
> I've discovered in the past couple of weeks that files with moderate
> size and lots of changes really chew up memory when Bazaar decides to
> repack the repository.  I'd like to get more useful numbers on the
> actual memory consumption though.  While it's not the 6000 changes of
> the NEWS file, it may be several hundred of a single file of moderate
> size (1MB to 100MB), and it is binary data.  The answer here could be
> "don't do that with Bazaar" but I'd still like to find the limit.
> And, of course, it means we'll need to find an alternate solution for
> this too.
> 
> -John
> 
> 

Offhand I would say repack consumes on-the-order-of size(file)*6 or so
at peak. I have some thoughts about how to improve that, but never had
the time to poke at it.
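
To put rough numbers on that, here is a minimal sketch. The factor of 6 is
only the offhand figure above, not a measurement, and the helper names
(estimate_repack_peak, peak_rss_bytes) are made up for illustration. The
second helper reads the peak resident set size of the current process through
the standard-library resource module (Unix only), so it would have to be
called from inside the process that did the repack.

```python
import resource
import sys

MB = 1024 * 1024


def estimate_repack_peak(file_size, factor=6):
    """Rough peak memory for a repack, using the offhand size(file)*6 guess."""
    return file_size * factor


def peak_rss_bytes():
    """Peak resident set size of this process so far, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


for size_mb in (1, 10, 100):
    estimate_mb = estimate_repack_peak(size_mb * MB) // MB
    print("%d MB file: roughly %d MB at peak" % (size_mb, estimate_mb))

print("peak RSS of this process: %d bytes" % peak_rss_bytes())
```

Comparing the measured peak against the estimate for a few file sizes is one
way to check whether the factor holds for a given kind of data.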

Specifically, the internals of bzrlib/_groupcompress_pyx.pyx and the
diff-delta.c code.

We inherited some memory structures that were optimal for the original
use case, which I extended for ours, but they should really be
redesigned a bit. (In the original use case the structure is built once,
never mutated, and uses a single source text. My use case has many source
texts, and the structure is updated with each one.)

It is a hash-table, and it should be laid out differently in memory. I
don't know if the repacking is worthwhile (though it does simplify some
comparison loops, etc).

It needs to be a bit more cautious about repeated input. It should do
fine for, say, all 0's, but any repeating pattern will cause
more-than-expected hash collisions, and cause poor performance (it will
fill up a given hash bucket causing resizing events to occur more often
than expected, etc.)
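
The effect can be illustrated with a small sketch. This is not the
diff-delta.c code: the 16-byte block size, the use of zlib.crc32 as the block
hash, the bucket count, and the rule of skipping a block whose hash equals the
previous block's hash are all assumptions made for the illustration. Under
those assumptions, all-zero input collapses to a single entry, while a pattern
that repeats every two blocks slips past the check and piles thousands of
entries into one or two buckets.

```python
import os
import zlib
from collections import Counter

BLOCK = 16  # assumed block size for this sketch


def bucket_fill(data, num_buckets=1024):
    """Count how many block entries land in each hash bucket."""
    fill = Counter()
    previous = None
    for start in range(0, len(data) - BLOCK + 1, BLOCK):
        block_hash = zlib.crc32(data[start:start + BLOCK])
        if block_hash == previous:
            # Assumed safeguard: skip runs of identical consecutive blocks.
            continue
        previous = block_hash
        fill[block_hash % num_buckets] += 1
    return fill


size = 64 * 1024
zeros_fill = bucket_fill(bytes(size))
# 32-byte pattern: two different 16-byte blocks that alternate forever.
pattern_fill = bucket_fill(bytes(range(32)) * (size // 32))
random_fill = bucket_fill(os.urandom(size))

for name, fill in (("zeros", zeros_fill),
                   ("pattern", pattern_fill),
                   ("random", random_fill)):
    print("%-8s entries=%5d fullest bucket=%5d"
          % (name, sum(fill.values()), max(fill.values())))
```

Random input spreads its entries thinly over the buckets, so the fullest
bucket stays small. The repeating pattern stores the same number of entries
in total but concentrates them, which is the bucket-filling behaviour
described above.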

I'd be happy to give advice on how it could be changed if someone has
the time to work on it.

John
=:->


