Bazaar: Out of memory

Fri May 9 15:52:07 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
| James Henstridge wrote:
|
|> The .join() method is faster because it does fewer memory allocations
|> and copies than the for loop you list above.  It should allocate a
|> single string for the final concatenated string.  In contrast, the for
|> loop version does a string allocation and copy for every iteration (so
|> in the final iteration you'd expect it to have allocated roughly twice
|> the memory).
|
| Hmmm ...
|
| Launchpad shows the exact change I made in bzr-fastimport here:
| http://bazaar.launchpad.net/~bzr/bzr-fastimport/fastimport.dev/revision/72.
| The impact was *dramatic*: the memory shown by GNOME System Monitor
| importing a repository (http://repo.or.cz/w/AutomatorExifMover.git) with
| a large binary file (5M) dropped from 275M to 42M. Maybe recent versions
| of Python are smarter now about string concatenation in loops? Or maybe
| the change itself is a red herring and we're seeing a symptom of a
| reference counting bug, say?

Seems a bit odd to me. I'm curious what would happen if you put in some
"gc.collect()" calls.
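
Something along these lines might be a starting point for that experiment. It is only a rough sketch, not the bzr-fastimport code: it assumes a Python with the tracemalloc module (3.4 or later), and concat_loop, concat_join and peak_bytes are hypothetical helper names. It calls gc.collect() before each run and reports the peak traced memory for the loop and for .join(). The numbers will vary by interpreter version, so treat them as something to look at rather than a prediction.

```python
import gc
import tracemalloc


def concat_loop(chunks):
    # Builds the result by repeated concatenation; each step may
    # allocate a new string and copy the old contents into it.
    data = b""
    for chunk in chunks:
        data += chunk
    return data


def concat_join(chunks):
    # Builds the result with a single join over all the chunks.
    return b"".join(chunks)


def peak_bytes(func, chunks):
    # Hypothetical helper: collect garbage first, then trace the
    # allocations made while func runs and report the peak.
    gc.collect()
    tracemalloc.start()
    try:
        result = func(chunks)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(result), peak


# Roughly 5M of data in 64K chunks (sizes chosen for illustration only).
chunks = [b"x" * 65536 for _ in range(80)]

for func in (concat_loop, concat_join):
    size, peak = peak_bytes(func, chunks)
    print(func.__name__, "result bytes:", size, "peak traced bytes:", peak)
```

If the two peaks come out close to each other, that would point away from the concatenation itself and towards something else holding references.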

|
|> If this change prevents MemoryErrors then something weird is going on.
|
| In the case of bzr-fastimport, the code was calling read in a loop and
| joining that list. In the case of bzr itself, I'm pretty sure we
| arbitrarily partition huge binary files into "lines" based on where '\n'
| characters just happened to be. Given this is user data, that could be
| almost any size. (In Guido's case, I think his largest file is 40M FWIW.)
|
| Ian C.
|
|
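
For reference, the two patterns Ian describes can be sketched roughly as below. This is an illustration under assumptions, not the actual bzr or bzr-fastimport code: read_all and split_on_newlines are hypothetical names, the chunk size is arbitrary, and the sample blob is made up. It shows reading in a loop and joining the list, and how splitting binary data on '\n' can leave "lines" of almost any size.

```python
import io


def read_all(stream, chunk_size=16384):
    # Read fixed-size chunks until EOF, then join them once.
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def split_on_newlines(data):
    # Partition data wherever b"\n" happens to occur, keeping the
    # newline on each piece so that joining restores the input.
    parts = data.split(b"\n")
    lines = [part + b"\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])
    return lines


# Made-up binary content: the newlines fall wherever they fall.
blob = b"\x00" * 1000 + b"\n" + b"\xff" * 3000000 + b"\n" + b"tail"

data = read_all(io.BytesIO(blob))
lines = split_on_newlines(data)
print([len(line) for line in lines])
```

With user data like this, one "line" can hold nearly the whole file, so any per-line copy is effectively a copy of the file.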

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgkZRcACgkQJdeBCYSNAAN4xQCbBpzzRthzIxatloTrUp0Wevcc
ZsoAn19u9Wll62MhnQAxfS1WKcPgVcv2
=80V9
-----END PGP SIGNATURE-----