[RFC] Reduce memory with Key and Keys (and improve performance)

John Arbash Meinel john at arbash-meinel.com
Thu Sep 10 22:01:01 BST 2009



So I've got a basic prototype of a custom object for holding our Key
representations and our Keys (list/tuple of keys). It is available from
here:
  lp:///~jameinel/bzr/2.1-memory-consumption

The basic observation was that we have a lot of space consumed by
tuples. Consider the case of the 'revisions' graph.

For every revision_id (a string approx 40-60 bytes long), you end up with:
1) A Python PyString object, costing ~24 bytes of overhead.
2) A tuple wrapping it, turning it into a revision_key, costing 32 bytes.
   The basic tuple seems to be 12 bytes + 4 per reference, but you
   have to add a GC header. I thought the GC header was only 12 bytes,
   but it may have grown (or the alignment changed) for py2.6 (it now
   seems to be 16 bytes).

When you have a parents list, that adds another:
3) Tuple wrapping it, costing 28 bytes + 4 for each reference.

We also internally have our 'ref_lists' structure, which adds one more
layer of tuples, because you can have an arbitrary number of ref lists
(though in the case of the 2a format we only have 1 ref list). So it adds:
4) Tuple wrapping it, costing 28 bytes + 4 for each reference.


So if you have a key with a single parent, the parents list ends up
costing you 32+32=64 bytes in tuples and 24+50=74 bytes in the actual
string.
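These per-object costs can be sanity-checked with sys.getsizeof (available since Python 2.6). The exact numbers vary with interpreter version and 32- vs 64-bit builds, and the revision id below is made up, so treat this as a rough illustration of the layering rather than authoritative accounting:

```python
import sys

# Hypothetical revision id, roughly the 40-60 byte strings described above.
revision_id = "john@example.com-20090910210101-abcdef0123456789"

string_cost = sys.getsizeof(revision_id)   # string header + character data
key = (revision_id,)                       # the revision_key wrapper tuple
key_cost = sys.getsizeof(key)              # tuple header + 1 reference
parents = (key,)                           # a single-parent parents list
parents_cost = sys.getsizeof(parents)
ref_lists = (parents,)                     # one more layer for ref_lists
ref_lists_cost = sys.getsizeof(ref_lists)

total = string_cost + key_cost + parents_cost + ref_lists_cost
print("string=%d key=%d parents=%d ref_lists=%d total=%d"
      % (string_cost, key_cost, parents_cost, ref_lists_cost, total))
```

The point is that three of the four allocations are pure wrapper overhead around one string.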

So I implemented:

1) Key(), basically a tuple that only supports string entries, and
   doesn't participate in the python garbage collector.

2) Keys(), a collection of keys, stored as a simple flat list of
   strings rather than a tuple of tuples (or a list of tuples).
   Note that this is still 'tuple-like', as it is immutable.

   The idea is that instead of having 1 object per key, you can have one
   object per group of keys.

   I also chose to overload a single 32-bit integer with length + width
   + flags, which makes the Keys object quite a bit lighter, at the
   expense of only supporting keys with at most 256 entries and a list
   of no more than 256 keys.

   However, we don't have any parents list longer than, what, maybe 8,
   and our max key width is 2, so I figured there was plenty of head room.
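A pure-Python sketch of the Keys idea (the class name, bit layout, and limits here are illustrative, not the actual implementation in the branch): N keys of uniform width are stored as one flat tuple of strings, with count/width/flags packed into a single 32-bit header instead of one object per key.

```python
class PackedKeys(object):
    """Illustrative stand-in for the proposed Keys object.

    Stores N keys of uniform width W as one flat tuple of strings, and
    packs (count, width, flags) into a single small integer, which is
    what limits both the key width and the number of keys.
    """

    __slots__ = ('_info', '_strings')

    def __init__(self, width, *keys):
        if not (1 <= width <= 255) or len(keys) > 255:
            raise ValueError("width/count exceed the packed-header limits")
        flat = []
        for key in keys:
            if len(key) != width:
                raise ValueError("all keys must share the same width")
            flat.extend(key)
        # One 32-bit header instead of per-key tuple objects:
        #   bits 0-7:   flags (unused in this sketch)
        #   bits 8-15:  key width
        #   bits 16-23: number of keys
        self._info = (len(keys) << 16) | (width << 8)
        self._strings = tuple(flat)   # flat sequence of strings, no inner tuples

    def __len__(self):
        return (self._info >> 16) & 0xFF

    @property
    def width(self):
        return (self._info >> 8) & 0xFF

    def __getitem__(self, index):
        # Materialize a key tuple on demand from the flat storage.
        w = self.width
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._strings[index * w:(index + 1) * w]

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]
```

Usage: `PackedKeys(2, ('file-id', 'rev-a'), ('file-id', 'rev-b'))[0]` gives back `('file-id', 'rev-a')`, so callers still see tuple-shaped keys while the storage is one object plus one flat tuple.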


Results:

This was measured by reading all of the keys from the revisions,
signatures, inventories, chk_bytes, and texts indexes in a Launchpad
repository, and then recording the peak memory consumption at the end.

bzr.dev:
  WorkingSize  292744KB   PeakWorking  325584KB   finishing
  real    0m55.271s

bzr-with-Keys:
  WorkingSize  254552KB   PeakWorking  287340KB   finishing
  real    0m34.788s

So the net difference is about 37MiB peak and current (1.13:1). What was
even more surprising for *me* was the improvement in 'real time' (1.6:1).
My guess is that by not having these 900k objects participating in the
garbage collector:
  1) We don't run as many garbage collection passes. (IIRC, gc only runs
     after a threshold of gc-tracked objects have been allocated without
     being deallocated, and our objects aren't allocated as part of the
     gc set.)
  2) Any given cycle check is now cheaper, because it doesn't have to
     walk the 900k objects that won't be a part of a cycle anyway.
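Both effects can be seen through CPython's gc module: collection of the youngest generation triggers once the net count of *tracked* allocations exceeds a threshold, and the cycle walk only visits tracked objects. Atomic objects like plain strings are never tracked, which is the behavior the Key/Keys objects opt into. A small illustration (gc.is_tracked needs CPython 2.7 or 3.x):

```python
import gc

# The three generation thresholds; gen-0 collection is triggered by the
# first value (700 by default) counted over gc-tracked objects only.
print(gc.get_threshold())

# Strings never participate in cycle detection, so allocating 900k of
# them neither triggers extra GC passes nor slows down the cycle walk.
print(gc.is_tracked("a-revision-id"))   # False

# Containers are tracked.  (Tuples start out tracked too, though CPython
# may lazily untrack tuples that hold only atomic objects.)
print(gc.is_tracked([1, 2, 3]))         # True
```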


Given the 10%-ish memory reduction, and the fairly impressive speed
improvement, I'm thinking it is probably worth investigating this further.

John
=:->

PS> The current objects aren't really optimized for anything beyond size.
For a lot of their functionality they create the corresponding tuple and
then run the operation on that (hash, repr, richcompare, etc.). Also, the
inner BTreeLeafParser.extract_keys still allocates a tuple and then passes
that into the Key constructor, etc.


