High memory consumption during push
John Arbash Meinel
john at arbash-meinel.com
Tue Jul 1 02:55:51 BST 2008
Andrew Bennetts wrote:
...
| (Pdb) cruds = []
| (Pdb) xxx = [cruds.append(['a b ' * 10 for x in range(30000)]) for y in range(10)]
|
| No memory increase. Do it again.
|
| (Pdb) xxx = [cruds.append(['a b ' * 10 for x in range(30000)]) for y in range(10)]
|
| 15MB increase! Do it twice more:
|
| (Pdb) xxx = [cruds.append(['a b ' * 10 for x in range(30000)]) for y in range(10)]
| (Pdb) xxx = [cruds.append(['a b ' * 10 for x in range(30000)]) for y in range(10)]
|
| 40MB increase. And each subsequent pair of these adds another 40. But the
| first pair only added 15. Thus there's at least 25MB of memory wasted,
| presumably due to fragmentation.
|
I'm not sure how you are measuring memory fragmentation with this.
When it is done, xxx == [None] * 10, len(cruds) == 40 (after the four
runs above), and len(cruds[0]) == 30000.
I'll use a different method, specifically grep "^Vm..." /proc/PID/status:
>>> cruds = []
VmPeak: 6268 kB VmSize: 6044 kB VmRSS: 2732 kB VmData: 1040 kB
>>> xxx = [cruds.append(['a b ' * 10 for x in range(30000)]) for y in range(10)]
VmPeak: 27516 kB VmSize: 27516 kB VmRSS: 23732 kB VmData: 22512 kB
So the first run adds 21MB to Peak, RSS and Data
>>> xxx = ...
VmPeak: 48476 kB VmSize: 48476 kB VmRSS: 44400 kB VmData: 43472 kB
Another 21MB
>>> xxx = ...
VmPeak: 69476 kB VmSize: 69476 kB VmRSS: 64976 kB VmData: 64472 kB
Another 21MB
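The readings above were taken by hand with grep. For anyone who wants to
collect the same fields from inside a process, here is a minimal sketch.
It assumes a Linux-style /proc filesystem and a current Python 3 rather
than the 2.4 used in this thread; parse_vm_fields and read_vm_fields are
hypothetical helper names, not part of bzrlib.

```python
def parse_vm_fields(status_text):
    """Return the Vm* fields from /proc/PID/status text as {name: kB}."""
    fields = {}
    for line in status_text.splitlines():
        if not line.startswith("Vm"):
            continue
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Lines look like "VmRSS:     2732 kB"; keep the integer part.
        if parts and parts[0].isdigit():
            fields[name] = int(parts[0])
    return fields


def read_vm_fields(pid="self"):
    """Read and parse /proc/<pid>/status (assumes Linux)."""
    with open("/proc/%s/status" % pid) as f:
        return parse_vm_fields(f.read())
```

Calling read_vm_fields() before and after each allocation step and
subtracting the VmRSS values gives the same kind of deltas as quoted
here.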
Now, this is on python2.4, which could certainly account for any
differences. The big kick in the pants is that:
>>> del xxx, cruds, x, y
VmPeak: 69476 kB VmSize: 65644 kB VmRSS: 61356 kB VmData: 60640 kB
So almost no memory is reclaimed (VmRSS drops by only about 3.5MB),
which surprises me because I was using strings. Now a 4-byte string
* 10 * 30,000 * 10 =~ 11.4MB. So there does seem to be a considerable
amount of overhead, but at a minimum there is 11MB of actual new strings
that you are creating with each run of your loop.
(And gc.collect() returns 0.)
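To make that arithmetic explicit, here is a rough sketch that separates
the raw string payload from the per-object overhead. It assumes a
current CPython 3 (sys.getsizeof does not exist in 2.4), so the overhead
figure it produces will not match the interpreter measured above; only
the payload figure carries over directly.

```python
import struct
import sys

STRINGS_PER_LIST = 30000
LISTS_PER_RUN = 10
sample = 'a b ' * 10  # one 40-character string, as built by the loop

# Raw character data created by one run of the loop.
payload_bytes = len(sample) * STRINGS_PER_LIST * LISTS_PER_RUN
payload_mib = payload_bytes / (1024.0 * 1024.0)

# Rough per-object cost on this interpreter: the string object itself
# (header plus data) and one pointer-sized slot in the containing list.
pointer_size = struct.calcsize("P")
per_string = sys.getsizeof(sample) + pointer_size
estimated_bytes = per_string * STRINGS_PER_LIST * LISTS_PER_RUN

print("payload:   %.1f MiB" % payload_mib)
print("estimated: %.1f MiB" % (estimated_bytes / (1024.0 * 1024.0)))
```

The gap between the two printed figures is object overhead alone, before
any allocator fragmentation is counted.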
After deleting and starting over again, each ">>> xxx = ..." line adds
about 1MB to VmRSS, without changing VmPeak.
| The good news then is that presumably a long-lived bzrlib process (like the
| smart server) might have high memory consumption, but it should at least
| stabilise rather than grow indefinitely... :/
|
| What to do?
| -----------
|
| First, we need to be aware that fragmentation is a serious confounding
| factor in measuring memory use. The number and size of allocated objects
| is not necessarily related to the memory consumption seen by the OS.
| Presumably allocation patterns of our code and internal details of CPython
| (which change from release to release) will have an impact on the memory
| consumption seen by the OS. So naïve benchmarking of memory highwater marks
| might point us in confusing directions. Probably whenever we write tools
| that report on the memory consumption reported by the OS, we should also
| report the corresponding value from heapy, e.g.
| "from guppy import hpy; print hpy().heap().size".
|
| We should also think about what we can do to avoid memory allocation
| patterns that cause such sparse fragmentation. Pure python unfortunately
| doesn't give us any way to allocate an object in a different memory arena.
| Perhaps this is a reason to move more code into C, such as GraphIndex?
| Another possibility *might* be to mmap pack files rather than reading them
| into strings.
|
| I think we probably also want to merge my string interning one-liner :)
|
| Other than that, I'm not sure what we can do. Perhaps we need to be trying
| harder to keep the high watermark down in the first place? Ideas are
| welcome.
|
| -Andrew.
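On the mmap idea quoted above, a minimal sketch of what reading a byte
range through a mapping could look like, using the standard library mmap
module on a current Python 3. read_range is a hypothetical helper, not
an existing bzrlib function, and whether this actually reduces
fragmentation for pack access is untested here.

```python
import mmap


def read_range(path, offset, length):
    """Return `length` bytes at `offset` without reading the whole file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mmap raises on an empty file.
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # Slicing copies only the requested bytes into a Python object.
            return m[offset:offset + length]
        finally:
            m.close()
```

Only the slice ends up as a Python-level bytes object; the rest of the
file stays in pages managed by the OS rather than in the Python heap.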
I think our high watermark is higher than just the size of all packs on
disk. At least, people have been trying to branch 600MB repositories and
failing, and most machines should have at least that much virtual memory.
John
=:->