High memory consumption during push

John Arbash Meinel john at arbash-meinel.com
Tue Jul 1 15:39:11 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
| John Arbash Meinel wrote:
|> Andrew Bennetts wrote:
| [...]
|> | (Pdb) cruds = []
|> | (Pdb) xxx = [cruds.append(['a b ' * 10 for x in range(30000)]) for y in range(10)]
| [...]
|> I'm not sure how you are measuring memory fragmentation with this.
|
| Because if I can allocate 25MB of objects without affecting the OS's
| picture of my memory consumption, then I can infer that my process
| already had space for those objects.
|
| [...]
|> Now, this is on python2.4, which could certainly account for any
|> differences. The big kick in the pants is that:
|>
|> |>> del xxx, crud, x, y
|> VmPeak: 69476 kB VmSize: 65644 kB VmRSS: 61356 kB VmData: 60640 kB
|>
|> So no memory is reclaimed. Which surprises me because I was using
|> strings. Now a 4-byte string * 10 * 30,000 * 10 =~ 11.4MB. So there does
|> seem to be a considerable amount of overhead, but at a minimum there is
|> 11MB of actual new strings that you are creating with your loop.
|>
|> (And gc.collect() returns 0).
|>
|> After deleting, starting over again, each ">>> xxx = ..." line adds
|> about 1MB  to VmRSS, without changing VmPeak.
|
| They are 40 byte strings, actually (note the * 10).  (I chose
| 'a b '*10 as that seemed a likely way to ensure that CPython wouldn't
| cleverly allocate just one string and reference it many times.)

Which is why I wrote 4-byte * 10 :) and *then* * 30000 and * 10 more.
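
To spell out that arithmetic, here is a minimal sketch. The constant and function names are made up for illustration, and sys.getsizeof only exists in Python 2.6 and later, so this would not run on the 2.4/2.5 interpreters discussed in this thread. It counts only the raw character data, then asks the interpreter how much extra each string object costs, which is where the "considerable amount of overhead" would come from.

```python
import sys

# Hypothetical names; the numbers come from the pdb loop quoted above.
CHARS_PER_STRING = len('a b ' * 10)   # 4-byte literal repeated 10 times
STRINGS_PER_LIST = 30000
LISTS = 10


def payload_bytes():
    """Raw character data only, ignoring all per-object overhead."""
    return CHARS_PER_STRING * STRINGS_PER_LIST * LISTS


def per_string_overhead():
    """Bytes one such string object occupies beyond its characters."""
    return sys.getsizeof('a b ' * 10) - CHARS_PER_STRING


if __name__ == '__main__':
    total = payload_bytes()
    print('payload: %d bytes = %.1f MiB' % (total, total / (1024.0 * 1024.0)))
    print('per-string overhead: %d bytes' % per_string_overhead())
```

The overhead figure depends on the interpreter version and build, so treat it as indicative only. The list slots that point at each string are extra on top of that.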


|
| Also, Python 2.4 will behave differently to 2.5.  From the “What's New”
| doc for 2.5:
|
|     “Evan Jones's patch to obmalloc, first described in a talk at PyCon
|     DC 2005, was applied. Python 2.4 allocated small objects in
|     256K-sized arenas, but never freed arenas. With this patch, Python
|     will free arenas when they're empty. The net effect is that on some
|     platforms, when you allocate many objects, Python's memory usage may
|     actually drop when you delete them and the memory may be returned to
|     the operating system. (Implemented by Evan Jones, and reworked by
|     Tim Peters.)”
|
| FWIW, I was testing with 2.5.
|
| Regardless, whether memory is reclaimed doesn't really matter for what
| I was trying to show (although happily I did see memory drop a little
| bit here and there, but obviously not as much as it grows).  I wasn't
| looking for a drop in memory consumption when allocating lots of
| strings.  I was looking to see how many strings I could allocate
| without increasing the memory that CPython takes from the OS, as a way
| to find out how much memory was wasted due to fragmentation.
|
| In the situation I was looking at (at the end of
| _create_new_pack_from_packs) it turns out that Python was holding onto
| 25MB of memory that it wasn't using for any objects.  The proof of that
| was that I could allocate 25MB of strings without causing a blip on the
| process memory stats reported by the OS.
|
| [...]
|> I think our high watermark is higher than just the size of all packs on
|> disk. At least, people have been trying to branch 600MB repositories and
|> failing. And most should at least have that much virtual mem.
|
| I think so too.  I'm not sure why, though.  Perhaps I should repeat
| some of this analysis on a much larger repo.
|
| -Andrew.
|
|

Well, looking at the above statement, it seems that Python allocates
small objects in 1/4 MB arenas. So in the worst case, fragmentation lets
a single live tuple keep a whole 256KB arena from being freed. That
wasted space should show up as the difference between what heapy reports
and what the OS reports.
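
For the OS side of that comparison, a minimal sketch of the probe Andrew describes, assuming Linux and its /proc/self/status file. All the function names here are hypothetical and not part of bzr. It reads the same Vm* counters quoted earlier, allocates a batch of distinct 40-byte strings, and reports how much the counters moved. Little or no growth would suggest the process already held free space for them.

```python
def parse_vm_status(text):
    """Return {'VmRSS': kB, ...} from the text of /proc/<pid>/status."""
    stats = {}
    for line in text.splitlines():
        if line.startswith('Vm'):
            name, value = line.split(':', 1)
            stats[name] = int(value.split()[0])  # value looks like ' 61356 kB'
    return stats


def read_vm_status():
    """Linux only: read the current process's memory counters."""
    with open('/proc/self/status') as f:
        return parse_vm_status(f.read())


def allocate_probe(lists=10, strings_per_list=30000):
    """Allocate many distinct 40-byte strings, as in the quoted pdb loop."""
    # Multiply a variable rather than a literal so that newer interpreters
    # cannot fold the expression into one shared constant string.
    piece = 'a b '
    return [[piece * 10 for _ in range(strings_per_list)]
            for _ in range(lists)]


if __name__ == '__main__':
    before = read_vm_status()
    probe = allocate_probe()   # keep a reference so the strings stay alive
    after = read_vm_status()
    for key in ('VmRSS', 'VmData'):
        print('%s grew by %d kB' % (key, after[key] - before[key]))
```

Run it at the point of interest (e.g. from pdb) rather than in a fresh interpreter, since a fresh process has no fragmented arenas to reuse.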

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkhqQY8ACgkQJdeBCYSNAAMb+gCgpEKNfTTdxzzcYgiHhnd1XPf1
v8MAnjt7EcnMBazQ+gCeI0l0MrjmHife
=0mIj
-----END PGP SIGNATURE-----


