High memory consumption during push

Andrew Bennetts andrew at canonical.com
Tue Jul 1 03:27:41 BST 2008


John Arbash Meinel wrote:
> Andrew Bennetts wrote:
[...]
> | (Pdb) cruds = []
> | (Pdb) xxx = [cruds.append(['a b ' * 10 for x in range(30000)]) for y in range(10)]
[...]
>
> I'm not sure how you are measuring memory fragmentation with this.

Because if I can allocate 25MB of objects without affecting the OS's picture of
my memory consumption, then I can infer that my process already had space for
those objects.
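
For concreteness, here's roughly what that probe looks like as Python you could
paste into the suspect process (a Pdb prompt, say).  It's only a sketch, assuming
Linux, where VmRSS can be read out of /proc/self/status; the vm_rss_kb helper is
just illustrative, not anything in bzrlib:

    def vm_rss_kb():
        # Linux-only: resident set size (VmRSS, in kB) from /proc/self/status.
        f = open('/proc/self/status')
        try:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])
        finally:
            f.close()

    before = vm_rss_kb()
    # Build ~25MB of genuinely distinct 40-byte strings (multiplying a variable,
    # not a literal, so no CPython version can fold it into one shared constant).
    s = 'a b '
    cruds = [[s * 10 for x in range(30000)] for y in range(10)]
    after = vm_rss_kb()
    # If 'after' barely moves, the process already had ~25MB of free space
    # internally, i.e. memory lost to fragmentation.  In a fresh interpreter it
    # will of course just grow by the full amount.
    print('VmRSS: %d kB -> %d kB' % (before, after))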

[...]
> Now, this is on python2.4, which could certainly account for any
> differences. The big kick in the pants is that:
>
> |>> del xxx, crud, x, y
> VmPeak: 69476 kB VmSize: 65644 kB VmRSS: 61356 kB VmData: 60640 kB
>
> So no memory is reclaimed. Which surprises me because I was using
> strings. Now a 4-byte string * 10 * 30,000 * 10 =~ 11.4MB. So there does
> seem to be a considerable amount of overhead, but at a minimum there is
> 11MB of actual new strings that you are creating with your loop.
>
> (And gc.collect() returns 0).
>
> After deleting, starting over again, each ">>> xxx = ..." line adds
> about 1MB  to VmRSS, without changing VmPeak.

They are 40-byte strings, actually (note the * 10).  (I chose 'a b ' * 10 as that
seemed a likely way to ensure that CPython wouldn't cleverly allocate just one
string and reference it many times.)
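
The sort of sharing I wanted to avoid is easy to see with a short literal; one
way to be sure each element really is a separate object is to multiply a
variable at runtime (just a toy check, nothing to do with the pack code):

    s = 'a b '
    shared   = ['x' for i in range(5)]     # one interned literal, referenced 5 times
    distinct = [s * 10 for i in range(5)]  # 5 separate 40-byte string objects
    print(len(set(map(id, shared))))       # -> 1
    print(len(set(map(id, distinct))))     # -> 5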

Also, Python 2.4 will behave differently to 2.5.  From the “What's New” doc for 2.5:

    “Evan Jones's patch to obmalloc, first described in a talk at PyCon DC 2005,
    was applied. Python 2.4 allocated small objects in 256K-sized arenas, but
    never freed arenas. With this patch, Python will free arenas when they're
    empty. The net effect is that on some platforms, when you allocate many
    objects, Python's memory usage may actually drop when you delete them and
    the memory may be returned to the operating system. (Implemented by Evan
    Jones, and reworked by Tim Peters.)”

FWIW, I was testing with 2.5.
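
If anyone wants to see that difference for themselves, something like the
following (same Linux-only /proc probe as the sketch above, and only a rough
illustration, not a benchmark) shows whether memory comes back after the
objects go away:

    import gc

    def vm_rss_kb():
        # Same Linux-only probe as above: VmRSS (kB) from /proc/self/status.
        for line in open('/proc/self/status'):
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

    baseline = vm_rss_kb()
    s = 'a b '
    junk = [[s * 10 for x in range(30000)] for y in range(10)]
    peak = vm_rss_kb()
    del junk
    gc.collect()
    settled = vm_rss_kb()
    # Python 2.4 never frees empty arenas, so 'settled' stays near 'peak'; 2.5
    # and later can hand arenas back, so it may drop toward 'baseline' -- but
    # only if whole 256K arenas end up completely empty, which fragmentation
    # can prevent.
    print('baseline %d kB, peak %d kB, after del %d kB' % (baseline, peak, settled))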

Regardless, whether memory is reclaimed doesn't really matter for what I was
trying to show (happily I did see memory drop a little here and there, though
obviously not by as much as it grows).  I wasn't looking for a drop in
memory consumption when allocating lots of strings.  I was looking to see how
many strings I could allocate without increasing the memory that CPython takes
from the OS, as a way to find out how much memory was wasted due to
fragmentation.

In the situation I was looking at (at the end of _create_new_pack_from_packs) it
turns out that Python was holding onto 25MB of memory that it wasn't using for
any objects.  The proof of that was that I could allocate 25MB of strings
without causing a blip on the process memory stats reported by the OS.

[...]
>
> I think our high watermark is higher than just the size of all packs on
> disk. At least, people have been trying to branch 600MB repositories and
> failing. And most should at least have that much virtual mem.

I think so too.  I'm not sure why, though.  Perhaps I should repeat some of this
analysis on a much larger repo.

-Andrew.
