Rev 3895: Merge brisbane-core tip, resolve differences. in http://bzr.arbash-meinel.com/branches/bzr/brisbane/refcycles
John Arbash Meinel
john at arbash-meinel.com
Mon Mar 23 03:30:05 GMT 2009
At http://bzr.arbash-meinel.com/branches/bzr/brisbane/refcycles
------------------------------------------------------------
revno: 3895
revision-id: john at arbash-meinel.com-20090323032950-lmbrocu79l90dqn5
parent: john at arbash-meinel.com-20090320150205-kcmh70biyo76p0kn
parent: john at arbash-meinel.com-20090321032222-n2wbqe0ozhhizwxm
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: refcycles
timestamp: Sun 2009-03-22 22:29:50 -0500
message:
Merge brisbane-core tip, resolve differences.
Finish making various get_record_stream() calls break reference cycles
and release memory after yielding each record (a sketch of the pattern
follows this entry).
modified:
bzrlib/groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
bzrlib/tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
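The cleanup pattern at work here: after the stream yields a record
factory, the owner clears the attributes that point back at the shared
block, breaking the cycle so the cached bytes can be collected. A
minimal sketch of the idea (the class and attribute names below are
simplified illustrations, not the exact bzrlib API):

    class _Factory(object):

        def __init__(self, key, manager):
            self.key = key
            self._manager = manager  # back-reference completing a ref cycle
            self._bytes = None       # lazily extracted from manager._block

    class _Manager(object):

        def __init__(self):
            self._factories = []

        def get_record_stream(self):
            for factory in self._factories:
                yield factory
                # The caller has consumed this record; drop the
                # back-references so the cycle is broken and the bytes
                # can be freed.  As the diff below notes, callers
                # cannot hang on to the factory and expect _bytes to
                # survive.
                factory._bytes = None
                factory._manager = None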
------------------------------------------------------------
revno: 3893.1.3
revision-id: john at arbash-meinel.com-20090321032222-n2wbqe0ozhhizwxm
parent: john at arbash-meinel.com-20090320155300-2qdojs8r4loamvmw
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: brisbane-core
timestamp: Fri 2009-03-20 22:22:22 -0500
message:
Fix a trivial typo
modified:
bzrlib/groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 3893.1.2
revision-id: john at arbash-meinel.com-20090320155300-2qdojs8r4loamvmw
parent: john at arbash-meinel.com-20090320154310-q5ye037radsy052j
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: brisbane-core
timestamp: Fri 2009-03-20 10:53:00 -0500
message:
Remove an isinstance(..., tuple) assertion.
According to lsprof the check was actually a bit expensive, and it
didn't help much anyway (a rough way to measure such a per-item cost
is sketched after this entry).
modified:
bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
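Not from the commit, just an illustration: the per-item cost of an
isinstance() assertion in a hot loop can be sanity-checked with the
standard library's timeit before trusting (or after confirming) what
lsprof reports:

    import timeit

    setup = "value = ('file-id', 'revision-id')"
    check = (
        "if not isinstance(value, tuple):\n"
        "    raise AssertionError('value is %s when a tuple'\n"
        "                         ' is expected' % (value.__class__,))")

    # Each check is cheap on its own, but the chk-map walk runs it once
    # per item, so compare a million iterations against a 'pass' baseline.
    print timeit.timeit(check, setup=setup, number=1000000)
    print timeit.timeit("pass", setup=setup, number=1000000)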
------------------------------------------------------------
revno: 3893.1.1
revision-id: john at arbash-meinel.com-20090320154310-q5ye037radsy052j
parent: john at arbash-meinel.com-20090320032107-bm9wg421rtcacy5i
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: brisbane-core
timestamp: Fri 2009-03-20 10:43:10 -0500
message:
Remove support for passing None for end in GroupCompressBlock.extract.
I decided that dropping the extra int from the wire-bytes and indices
was not worth losing the ability to _prepare_for_extract and cheaply
filter bytes during fetch. Requiring an explicit end also makes the
code simpler and easier to maintain.
Also, add support for having an 'empty content' record, which has start=end=0.
Support costs very little, and simplifies things.
And now GroupCompressBlock.extract() just returns the bytes. It doesn't try to
sha the content, nor does it return a GCBEntry. We weren't using it anyway.
And it can save ~50 seconds of sha-ing all the content during 'bzr pack' of
a launchpad branch. (The record layout extract() now parses is sketched
after this entry.)
modified:
bzrlib/groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
bzrlib/tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
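For reference, the record layout that extract() parses is: one type
byte ('f' for a fulltext, 'd' for a delta), a base128 variable-length
integer giving the content length, then the content itself. A minimal
sketch of that decoding, with a LEB128-style decode_base128_int
reimplemented here for illustration (bzrlib ships its own):

    def decode_base128_int(data):
        # Low 7 bits come first; a set high bit marks a continuation byte.
        result = 0
        shift = 0
        offset = 0
        while True:
            b = ord(data[offset])
            result |= (b & 0x7F) << shift
            offset += 1
            if not b & 0x80:
                break
            shift += 7
        return result, offset

    def extract_record(block_content, start, end):
        if start == end == 0:
            return ''                  # the 'empty content' record
        kind = block_content[start]    # 'f' = fulltext, 'd' = delta
        # Encoding 2**32 takes at most 5 base128 bytes, so bytes
        # [start+1, start+6) always cover the whole length field.
        length, len_len = decode_base128_int(block_content[start+1:start+6])
        content_start = start + 1 + len_len
        if end != content_start + length:
            raise ValueError('end != len according to field header'
                             ' %s != %s' % (end, content_start + length))
        content = block_content[content_start:end]
        # For 'd' records the real extract() then applies the delta
        # against the block to reconstruct the fulltext.
        return content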
------------------------------------------------------------
=== modified file 'bzrlib/groupcompress.py'
--- a/bzrlib/groupcompress.py 2009-03-20 15:02:05 +0000
+++ b/bzrlib/groupcompress.py 2009-03-23 03:29:50 +0000
@@ -223,7 +223,7 @@
self._content = ''
elif self._compressor_name == 'lzma':
# We don't do partial lzma decomp yet
- self._content = pylma.decompress(self._z_content)
+ self._content = pylzma.decompress(self._z_content)
else:
# Start a zlib decompressor
assert self._compressor_name == 'zlib'
@@ -340,21 +340,8 @@
:return: The bytes for the content
"""
if start == end == 0:
- return None, ''
- # Make sure we have enough bytes for this record
- # TODO: if we didn't want to track the end of this entry, we could
- # _ensure_content(start+enough_bytes_for_type_and_length), and
- # then decode the entry length, and
- # _ensure_content(start+1+length)
- # It is 2 calls to _ensure_content(), but we always buffer a bit
- # extra anyway, and it means 1 less offset stored in the index,
- # and transmitted over the wire
- if end is None:
- # it takes 5 bytes to encode 2^32, so we need 1 byte to hold the
- # 'f' or 'd' declaration, and then 5 more for the record length.
- self._ensure_content(start + 6)
- else:
- self._ensure_content(end)
+ return ''
+ self._ensure_content(end)
# The bytes are 'f' or 'd' for the type, then a variable-length
# base128 integer for the content size, then the actual content
# We know that the variable-length integer won't be longer than 5
@@ -370,23 +357,15 @@
content_len, len_len = decode_base128_int(
self._content[start + 1:start + 6])
content_start = start + 1 + len_len
- if end is None:
- end = content_start + content_len
- self._ensure_content(end)
- else:
- if end != content_start + content_len:
- raise ValueError('end != len according to field header'
- ' %s != %s' % (end, content_start + content_len))
- entry = GroupCompressBlockEntry(key, type, sha1=None,
- start=start, length=end-start)
+ if end != content_start + content_len:
+ raise ValueError('end != len according to field header'
+ ' %s != %s' % (end, content_start + content_len))
content = self._content[content_start:end]
if c == 'f':
bytes = content
elif c == 'd':
bytes = _groupcompress_pyx.apply_delta(self._content, content)
- if entry.sha1 is None:
- entry.sha1 = sha_string(bytes)
- return entry, bytes
+ return bytes
def add_entry(self, key, type, sha1, start, length):
"""Add new meta info about an entry.
@@ -521,8 +500,7 @@
# Grab the raw bytes for this entry, and break the ref-cycle
self._manager._prepare_for_extract()
block = self._manager._block
- _, bytes = block.extract(self.key, self._start, self._end)
- self._bytes = bytes
+ self._bytes = block.extract(self.key, self._start, self._end)
self._manager = None
if storage_kind == 'fulltext':
return self._bytes
@@ -556,6 +534,8 @@
"""Get a record for all keys added so far."""
for factory in self._factories:
yield factory
+ factory._bytes = None
+ factory._manager = None
# TODO: Consider setting self._factories = None after the above loop,
# as it will break the reference cycle
@@ -1325,6 +1305,11 @@
# Yield everything buffered so far
for factory in manager.get_record_stream():
yield factory
+ # Disable this record, breaks the refcycle, and
+ # saves memory. But this means clients really
+ # *cannot* hang on to objects.
+ factory._bytes = None
+ factory._manager = None
manager = None
bytes, sha1 = self._compressor.extract(key)
parents = self._unadded_refs[key]
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- a/bzrlib/repofmt/groupcompress_repo.py 2009-03-20 15:02:05 +0000
+++ b/bzrlib/repofmt/groupcompress_repo.py 2009-03-23 03:29:50 +0000
@@ -258,9 +258,6 @@
next_keys = set()
def handle_internal_node(node):
for prefix, value in node._items.iteritems():
- if not isinstance(value, tuple):
- raise AssertionError("value is %s when a tuple"
- " is expected" % (value.__class__))
# We don't want to request the same key twice, and we
# want to order it by the first time it is seen.
# Even further, we don't want to request a key which is
@@ -294,13 +291,6 @@
handle_internal_node(node)
elif parse_leaf_nodes:
handle_leaf_node(node)
- # XXX: We don't walk the chk map to determine
- # referenced (file_id, revision_id) keys.
- # We don't do it yet because you really need to
- # filter out the ones that are present in the
- # parents of the rev just before the ones you are
- # copying, otherwise the filter is grabbing too
- # many keys...
counter[0] += 1
if pb is not None:
pb.update('chk node', counter[0], total_keys)
=== modified file 'bzrlib/tests/test_groupcompress.py'
--- a/bzrlib/tests/test_groupcompress.py 2009-03-19 03:06:02 +0000
+++ b/bzrlib/tests/test_groupcompress.py 2009-03-20 15:43:10 +0000
@@ -329,21 +329,6 @@
'length:100\n'
'\n', raw_bytes)
- def test_extract_no_end(self):
- # We should be able to extract a record, even if we only know the start
- # of the bytes.
- texts = {
- ('key1',): 'text for key1\nhas bytes that are common\n',
- ('key2',): 'text for key2\nhas bytes that are common\n',
- }
- entries, block = self.make_block(texts)
- self.assertEqualDiff('text for key1\nhas bytes that are common\n',
- block.extract(('key1',), entries[('key1',)].start,
- end=None)[1])
- self.assertEqualDiff('text for key2\nhas bytes that are common\n',
- block.extract(('key2',), entries[('key2',)].start,
- end=None)[1])
-
def test_partial_decomp(self):
content_chunks = []
# We need a sufficient amount of data so that zlib.decompress has
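The hunk above (cut off in the archive) sets up test_partial_decomp,
which exercises the partial zlib decompression used by
_ensure_content(). The underlying technique is zlib.decompressobj,
which can inflate just a prefix of the compressed block; a minimal
standalone sketch:

    import zlib

    data = 'this is some text\n' * 1000
    z_data = zlib.compress(data)

    # Feed part of the compressed stream and cap the output; the
    # decompressor keeps its state, so more bytes can be fed later if
    # a larger slice of the content turns out to be needed.
    decomp = zlib.decompressobj()
    partial = decomp.decompress(z_data[:512], 100)  # at most 100 bytes out
    assert data.startswith(partial)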