Rev 4792: Some notes about memory consumption if we wrote it all in custom objects. in http://bazaar.launchpad.net/~jameinel/bzr/chk-index
John Arbash Meinel
john at arbash-meinel.com
Wed Oct 28 16:53:55 GMT 2009
At http://bazaar.launchpad.net/~jameinel/bzr/chk-index
------------------------------------------------------------
revno: 4792
revision-id: john at arbash-meinel.com-20091028165347-px6xmyjssmblo1om
parent: john at arbash-meinel.com-20091028162442-w9nyvpz5f7kbremd
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: chk-index
timestamp: Wed 2009-10-28 11:53:47 -0500
message:
Some notes about memory consumption if we wrote it all in custom objects.
-------------- next part --------------
=== modified file 'bzrlib/chk_index.py'
--- a/bzrlib/chk_index.py 2009-10-28 16:24:42 +0000
+++ b/bzrlib/chk_index.py 2009-10-28 16:53:47 +0000
@@ -368,6 +368,16 @@
raise ValueError('we only support entries with no references')
if key_elements != 1:
raise ValueError('we only support single-valued keys.')
+ # Note: We could create a significantly more optimized mapping here.
+ # Something like a Judy Tree/trie/RB tree, etc. It would save a
+ # fair amount of memory, and allow us to walk in sorted order.
+ # Also, a given 'node' record just needs 3 ints, for 'group num',
+ # in-group start, in-group length. Groups are never longer than
+ # 2/4GB and we'll never have >2B groups. This drops memory to
+ # 12bytes per entry + mapping overhead. Down from ~48. The dict
+ # itself is 12MB for the 400+k keys from launchpad + 44*400k for
+ # all the sha strings. A Judy Tree would make the mapping
+ # overhead <20*400k.?Or 8MB down from ~30MB.
self._nodes = {}
# map from group (start, end) to the final offset for the group
# the (0, 0) group is special, it indicates the empty content group
@@ -488,7 +498,9 @@
for idx, bit_key in enumerate(sorted(self._nodes)):
# TODO: We could probably just use bit_key rather than passing it
# into 'entry_coder'. This would save us from copying those
- # 20 bytes into the new string.
+ # 20 bytes into the new string. Though at a cost of a 4 bytes
+ # pointer in our output buffer. So not necessarily an instant
+ # win
bytes = self._entry_to_bytes(bit_key, header.entry_coder)
entry_bytes.append(bytes)
if have_mini_index:
More information about the bazaar-commits
mailing list