Bazaar: Out of memory

Fri May 9 15:48:38 BST 2008

James Henstridge wrote:

> The .join() method is faster because it does fewer memory allocations
> and copies than the for loop you list above.  It should allocate a
> single string for the final concatenated string.  In contrast, the for
> loop version does a string allocation and copy for every iteration (so
> in the final iteration you'd expect it to have allocated roughly twice
> the memory).

Hmmm ...

Launchpad shows the exact change I made in bzr-fastimport here:
http://bazaar.launchpad.net/~bzr/bzr-fastimport/fastimport.dev/revision/72.
The impact was *dramatic*: the memory shown by GNOME System Monitor while
importing a repository (http://repo.or.cz/w/AutomatorExifMover.git) with
a large binary file (5M) dropped from 275M to 42M. Maybe recent versions
of Python are smarter now about string concatenation in loops? Or maybe
the change itself is a red herring and we're seeing a symptom of a
reference counting bug, say?

> If this change prevents MemoryErrors then something weird is going on.

In the case of bzr-fastimport, the code was calling read in a loop and
joining that list. In the case of bzr itself, I'm pretty sure we
arbitrarily partition huge binary files into "lines" based on where '\n'
characters just happen to be. Given this is user data, a single "line"
could be almost any size. (In Guido's case, I think his largest file is
40M FWIW.)
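
For anyone following along, here is a rough sketch of the two patterns
under discussion. It is not the actual bzr-fastimport or bzr code: the
function names, the chunk size and the use of bytes objects are all
assumptions made for illustration. The last helper shows why splitting
binary data on '\n' can yield a "line" of almost any size.

```python
import io

CHUNK_SIZE = 16 * 1024  # hypothetical read size, not taken from bzr-fastimport


def read_by_concatenation(stream, chunk_size=CHUNK_SIZE):
    """Grow one bytes object; each += may allocate and copy a new object."""
    data = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data += chunk
    return data


def read_by_join(stream, chunk_size=CHUNK_SIZE):
    """Collect the chunks in a list and build the result once with join()."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def longest_line(data):
    """Length of the longest "line" when data is split on newline bytes."""
    # The newline separators themselves are not counted in the lengths.
    return max((len(line) for line in data.split(b"\n")), default=0)
```

Both readers return the same bytes for the same input; they differ only
in how many intermediate objects get allocated along the way. How much
that matters for peak memory depends on the Python version and the
allocator, so it is worth measuring rather than assuming.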

Ian C.