Rev 3895: Merge brisbane-core tip, resolve differences. in http://bzr.arbash-meinel.com/branches/bzr/brisbane/refcycles

John Arbash Meinel john at arbash-meinel.com
Mon Mar 23 03:30:05 GMT 2009


At http://bzr.arbash-meinel.com/branches/bzr/brisbane/refcycles

------------------------------------------------------------
revno: 3895
revision-id: john at arbash-meinel.com-20090323032950-lmbrocu79l90dqn5
parent: john at arbash-meinel.com-20090320150205-kcmh70biyo76p0kn
parent: john at arbash-meinel.com-20090321032222-n2wbqe0ozhhizwxm
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: refcycles
timestamp: Sun 2009-03-22 22:29:50 -0500
message:
  Merge brisbane-core tip, resolve differences.
  Finish making the various get_record_stream() implementations break
  reference cycles and release memory after yielding each record (a
  standalone sketch of the pattern follows the file list below).
modified:
  bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
  bzrlib/tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
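
The cleanup pattern the message refers to is: yield each record factory,
then immediately null out the references that keep the decompressed block
alive. A minimal standalone sketch of that pattern (the _Record and
_Manager classes and the sizes here are illustrative stand-ins written for
this sketch, not bzrlib's actual types):

    class _Record(object):
        """Stand-in for a record factory: it holds decompressed bytes and
        a back-reference to its manager, which creates the ref-cycle."""

        def __init__(self, key, manager):
            self.key = key
            self._manager = manager        # manager also references us
            self._bytes = manager.data     # large decompressed content


    class _Manager(object):
        """Stand-in for the manager side of the factory/manager pair."""

        def __init__(self, data):
            self.data = data
            self._factories = []

        def get_record_stream(self):
            for factory in self._factories:
                yield factory
                # Break the cycle as soon as the caller resumes us; the
                # caller must copy bytes out before advancing the stream.
                factory._bytes = None
                factory._manager = None


    manager = _Manager('x' * (10 * 1024 * 1024))
    manager._factories = [_Record(('key%d' % i,), manager)
                          for i in range(3)]
    for record in manager.get_record_stream():
        # record._bytes is valid here, gone once the loop advances
        print('%s: %d bytes' % (record.key, len(record._bytes)))
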
    ------------------------------------------------------------
    revno: 3893.1.3
    revision-id: john at arbash-meinel.com-20090321032222-n2wbqe0ozhhizwxm
    parent: john at arbash-meinel.com-20090320155300-2qdojs8r4loamvmw
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: brisbane-core
    timestamp: Fri 2009-03-20 22:22:22 -0500
    message:
      Fix a trivial typo
    modified:
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 3893.1.2
    revision-id: john at arbash-meinel.com-20090320155300-2qdojs8r4loamvmw
    parent: john at arbash-meinel.com-20090320154310-q5ye037radsy052j
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: brisbane-core
    timestamp: Fri 2009-03-20 10:53:00 -0500
    message:
Remove an isinstance(..., tuple) assertion.
      According to lsprof the check was noticeably expensive, and it didn't
      help much anyway (a timing sketch follows the file list below).
    modified:
      bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
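
The overhead in question is one extra type check per value in a hot loop
over chk-map nodes. A rough way to see the per-iteration cost outside
bzrlib is with timeit; this only illustrates the kind of difference lsprof
would report, it is not the actual measurement:

    import timeit

    SETUP = "items = [('sha1:abcdef',)] * 1000"
    # The removed assertion, run against every value in the loop.
    checked = timeit.timeit(
        "for v in items:\n"
        "    if not isinstance(v, tuple):\n"
        "        raise AssertionError(v.__class__)\n",
        setup=SETUP, number=1000)
    # The same loop with the check removed.
    plain = timeit.timeit("for v in items:\n    pass\n", setup=SETUP,
                          number=1000)
    print('with check: %.3fs   without: %.3fs' % (checked, plain))
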
    ------------------------------------------------------------
    revno: 3893.1.1
    revision-id: john at arbash-meinel.com-20090320154310-q5ye037radsy052j
    parent: john at arbash-meinel.com-20090320032107-bm9wg421rtcacy5i
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: brisbane-core
    timestamp: Fri 2009-03-20 10:43:10 -0500
    message:
Remove support for passing None for end in GroupCompressBlock.extract.

      I decided that saving one extra integer in the wire bytes and indices
      was not worth giving up the ability to _prepare_for_extract and
      cheaply filter bytes during fetch. Requiring an explicit end also
      makes the code simpler and easier to maintain.

      Also, add support for an 'empty content' record, which has
      start == end == 0. Supporting it costs very little and simplifies
      things.
      GroupCompressBlock.extract() now just returns the bytes: it no longer
      shas the content or returns a GCBEntry, since we weren't using either.
      Skipping the sha-ing can save ~50 seconds during 'bzr pack' of a
      launchpad branch. (A toy model of the new contract follows the file
      list below.)
    modified:
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      bzrlib/tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
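
To make the new contract concrete: an explicit end is required,
start == end == 0 means empty content, and only the raw bytes come back.
Here is a toy model of extract() after this change; SimpleBlock and this
decode_base128_int are illustrative stand-ins written for this sketch,
not bzrlib's implementations:

    def decode_base128_int(data):
        """Decode a variable-length base-128 int; return (value, bytes_used)."""
        result = 0
        shift = 0
        for offset, char in enumerate(data):
            byte = ord(char)
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, offset + 1
            shift += 7
        raise ValueError('incomplete base128 integer')


    class SimpleBlock(object):
        """Toy model of GroupCompressBlock.extract() after this change."""

        def __init__(self, content):
            self._content = content

        def extract(self, key, start, end):
            if start == end == 0:
                return ''                  # the new empty-content record
            c = self._content[start]
            if c == 'd':
                raise NotImplementedError('delta records omitted in this toy')
            if c != 'f':
                raise ValueError('invalid record type %r' % (c,))
            content_len, len_len = decode_base128_int(
                self._content[start + 1:start + 6])
            content_start = start + 1 + len_len
            if end != content_start + content_len:
                raise ValueError('end != len according to field header'
                    ' %s != %s' % (end, content_start + content_len))
            # No GCBEntry and no sha_string(): just the content bytes.
            return self._content[content_start:end]


    text = 'text for key1\n'
    record = 'f' + chr(len(text)) + text   # lengths < 128 fit in one byte
    block = SimpleBlock(record)
    print(block.extract(('key1',), 0, len(record)))
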
-------------- next part --------------
=== modified file 'bzrlib/groupcompress.py'
--- a/bzrlib/groupcompress.py	2009-03-20 15:02:05 +0000
+++ b/bzrlib/groupcompress.py	2009-03-23 03:29:50 +0000
@@ -223,7 +223,7 @@
                 self._content = ''
             elif self._compressor_name == 'lzma':
                 # We don't do partial lzma decomp yet
-                self._content = pylma.decompress(self._z_content)
+                self._content = pylzma.decompress(self._z_content)
             else:
                 # Start a zlib decompressor
                 assert self._compressor_name == 'zlib'
@@ -340,21 +340,8 @@
         :return: The bytes for the content
         """
         if start == end == 0:
-            return None, ''
-        # Make sure we have enough bytes for this record
-        # TODO: if we didn't want to track the end of this entry, we could
-        #       _ensure_content(start+enough_bytes_for_type_and_length), and
-        #       then decode the entry length, and
-        #       _ensure_content(start+1+length)
-        #       It is 2 calls to _ensure_content(), but we always buffer a bit
-        #       extra anyway, and it means 1 less offset stored in the index,
-        #       and transmitted over the wire
-        if end is None:
-            # it takes 5 bytes to encode 2^32, so we need 1 byte to hold the
-            # 'f' or 'd' declaration, and then 5 more for the record length.
-            self._ensure_content(start + 6)
-        else:
-            self._ensure_content(end)
+            return ''
+        self._ensure_content(end)
         # The bytes are 'f' or 'd' for the type, then a variable-length
         # base128 integer for the content size, then the actual content
         # We know that the variable-length integer won't be longer than 5
@@ -370,23 +357,15 @@
         content_len, len_len = decode_base128_int(
                             self._content[start + 1:start + 6])
         content_start = start + 1 + len_len
-        if end is None:
-            end = content_start + content_len
-            self._ensure_content(end)
-        else:
-            if end != content_start + content_len:
-                raise ValueError('end != len according to field header'
-                    ' %s != %s' % (end, content_start + content_len))
-        entry = GroupCompressBlockEntry(key, type, sha1=None,
-                                        start=start, length=end-start)
+        if end != content_start + content_len:
+            raise ValueError('end != len according to field header'
+                ' %s != %s' % (end, content_start + content_len))
         content = self._content[content_start:end]
         if c == 'f':
             bytes = content
         elif c == 'd':
             bytes = _groupcompress_pyx.apply_delta(self._content, content)
-        if entry.sha1 is None:
-            entry.sha1 = sha_string(bytes)
-        return entry, bytes
+        return bytes
 
     def add_entry(self, key, type, sha1, start, length):
         """Add new meta info about an entry.
@@ -521,8 +500,7 @@
                 # Grab the raw bytes for this entry, and break the ref-cycle
                 self._manager._prepare_for_extract()
                 block = self._manager._block
-                _, bytes = block.extract(self.key, self._start, self._end)
-                self._bytes = bytes
+                self._bytes = block.extract(self.key, self._start, self._end)
                 self._manager = None
             if storage_kind == 'fulltext':
                 return self._bytes
@@ -556,6 +534,8 @@
         """Get a record for all keys added so far."""
         for factory in self._factories:
             yield factory
+            factory._bytes = None
+            factory._manager = None
         # TODO: Consider setting self._factories = None after the above loop,
         #       as it will break the reference cycle
 
@@ -1325,6 +1305,11 @@
                             # Yield everything buffered so far
                             for factory in manager.get_record_stream():
                                 yield factory
+                                # Disable this record, breaks the refcycle, and
+                                # saves memory. But this means clients really
+                                # *cannot* hang on to objects.
+                                factory._bytes = None
+                                factory._manager = None
                             manager = None
                         bytes, sha1 = self._compressor.extract(key)
                         parents = self._unadded_refs[key]
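
The comments added in these two hunks define the resulting contract: a
yielded record is only usable until the stream advances. Reusing the
_Record/_Manager sketch from earlier in this message, that contract looks
like this to a consumer:

    # Correct: copy bytes out before advancing the stream.
    manager = _Manager('some content')
    manager._factories = [_Record(('k1',), manager),
                          _Record(('k2',), manager)]
    texts = {}
    for record in manager.get_record_stream():
        texts[record.key] = record._bytes

    # Broken under the new contract: list() advances past every record,
    # nulling _bytes/_manager, so the stored factories are unusable.
    manager = _Manager('some content')
    manager._factories = [_Record(('k1',), manager),
                          _Record(('k2',), manager)]
    stale = [r._bytes for r in list(manager.get_record_stream())]
    assert stale == [None, None]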

=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- a/bzrlib/repofmt/groupcompress_repo.py	2009-03-20 15:02:05 +0000
+++ b/bzrlib/repofmt/groupcompress_repo.py	2009-03-23 03:29:50 +0000
@@ -258,9 +258,6 @@
                 next_keys = set()
                 def handle_internal_node(node):
                     for prefix, value in node._items.iteritems():
-                        if not isinstance(value, tuple):
-                            raise AssertionError("value is %s when a tuple"
-                                " is expected" % (value.__class__))
                         # We don't want to request the same key twice, and we
                         # want to order it by the first time it is seen.
                         # Even further, we don't want to request a key which is
@@ -294,13 +291,6 @@
                             handle_internal_node(node)
                         elif parse_leaf_nodes:
                             handle_leaf_node(node)
-                        # XXX: We don't walk the chk map to determine
-                        #      referenced (file_id, revision_id) keys.
-                        #      We don't do it yet because you really need to
-                        #      filter out the ones that are present in the
-                        #      parents of the rev just before the ones you are
-                        #      copying, otherwise the filter is grabbing too
-                        #      many keys...
                         counter[0] += 1
                         if pb is not None:
                             pb.update('chk node', counter[0], total_keys)

=== modified file 'bzrlib/tests/test_groupcompress.py'
--- a/bzrlib/tests/test_groupcompress.py	2009-03-19 03:06:02 +0000
+++ b/bzrlib/tests/test_groupcompress.py	2009-03-20 15:43:10 +0000
@@ -329,21 +329,6 @@
                              'length:100\n'
                              '\n', raw_bytes)
 
-    def test_extract_no_end(self):
-        # We should be able to extract a record, even if we only know the start
-        # of the bytes.
-        texts = {
-            ('key1',): 'text for key1\nhas bytes that are common\n',
-            ('key2',): 'text for key2\nhas bytes that are common\n',
-        }
-        entries, block = self.make_block(texts)
-        self.assertEqualDiff('text for key1\nhas bytes that are common\n',
-                             block.extract(('key1',), entries[('key1',)].start,
-                                           end=None)[1])
-        self.assertEqualDiff('text for key2\nhas bytes that are common\n',
-                             block.extract(('key2',), entries[('key2',)].start,
-                                           end=None)[1])
-
     def test_partial_decomp(self):
         content_chunks = []
         # We need a sufficient amount of data so that zlib.decompress has


