Rev 3887: Bring in brisbane-core 3196 in http://bzr.arbash-meinel.com/branches/bzr/brisbane/hack3

John Arbash Meinel john at arbash-meinel.com
Fri Mar 27 04:33:50 GMT 2009


At http://bzr.arbash-meinel.com/branches/bzr/brisbane/hack3

------------------------------------------------------------
revno: 3887
revision-id: john at arbash-meinel.com-20090327042407-o04qojo548gekl0c
parent: john at arbash-meinel.com-20090326195751-slpkjxd39uqlewki
parent: john at arbash-meinel.com-20090327040528-88uc1za4ep2fj6gh
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: hack3
timestamp: Thu 2009-03-26 23:24:07 -0500
message:
  Bring in brisbane-core 3196
modified:
  bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
  bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
    ------------------------------------------------------------
    revno: 3869.7.28
    revision-id: john at arbash-meinel.com-20090327040528-88uc1za4ep2fj6gh
    parent: john at arbash-meinel.com-20090327014543-9b216wm9q4olu3ib
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: brisbane-core
    timestamp: Thu 2009-03-26 23:05:28 -0500
    message:
      Set 'combine_backing_indices=False' as the default for text and chk indices.
      According to Robert, we may still want combining for something like
      commit, but we would have to be committing more than 100k new texts
      for it to matter, and more than 200k before a combine is actually
      triggered. Disabling it makes a very big difference to 'fetch'
      performance.
      
      Also, set random_id=True for 'insert_record_stream'. This is another
      big win for fetch performance, though we may need to decide whether
      it is genuinely safe.
    modified:
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
    ------------------------------------------------------------
    revno: 3869.7.27
    revision-id: john at arbash-meinel.com-20090327014543-9b216wm9q4olu3ib
    parent: john at arbash-meinel.com-20090326201840-ddb2uqof335ysvnu
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: brisbane-core
    timestamp: Thu 2009-03-26 20:45:43 -0500
    message:
      Fix a bug in iter_interesting_nodes.
      
      If one of your CHK roots is a leaf node, it can get transmitted
      twice: after a split, one of the resulting children can end up
      with the same content as that root.
      Still needs tests, though.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
    ------------------------------------------------------------
    revno: 3869.7.26
    revision-id: john at arbash-meinel.com-20090326201840-ddb2uqof335ysvnu
    parent: john at arbash-meinel.com-20090326195952-w0qea66iw597ipza
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: brisbane-core
    timestamp: Thu 2009-03-26 15:18:40 -0500
    message:
      max() shows up under lsprof as more expensive than creating an object.
      timeit also shows that a plain 'if x < y' comparison is faster than
      'y = max(x, y)'. Small win, but I'll take it.
    modified:
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
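
As a point of reference for the claim in revno 3869.7.26, the difference
is easy to reproduce with a standalone timeit snippet like the one below.
The names and numbers are illustrative only, not part of the commit, and
absolute timings will vary by machine and Python version:

    # Sketch: compare 'y = max(x, y)' against an inline guard, the
    # micro-optimization applied to groupcompress in this merge.
    import timeit

    setup = "x = 10; y = 20"
    t_max = timeit.timeit("y = max(x, y)", setup=setup, number=1000000)
    t_cmp = timeit.timeit("if x > y: y = x", setup=setup, number=1000000)
    print("max():      %.3fs" % t_max)
    print("comparison: %.3fs" % t_cmp)
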
-------------- next part --------------
=== modified file 'bzrlib/chk_map.py'
--- a/bzrlib/chk_map.py	2009-03-26 19:13:04 +0000
+++ b/bzrlib/chk_map.py	2009-03-27 01:45:43 +0000
@@ -1418,6 +1418,14 @@
     if records or interesting_items:
         yield records, interesting_items
     interesting_keys.difference_update(all_uninteresting_chks)
+    # TODO: We need a test for this
+    #       This handles the case where, after a split, one of the child
+    #       trees is identical to one of the interesting root keys. For
+    #       example, a leaf node holding "aa" and "ab" that overflows
+    #       when "bb" is added: you get a new internal node with one
+    #       leaf holding ("aa", "ab") and another holding "bb", and you
+    #       don't want to re-transmit the ("aa", "ab") node.
+    all_uninteresting_chks.update(interesting_root_keys)
 
     chks_to_read = interesting_keys
     counter = 0
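
The case the new TODO block describes can be modelled outside of bzrlib.
A minimal sketch, with made-up key names standing in for real CHK
pointers:

    # Simplified model of the duplicate-transmission bug fixed above.
    # One interesting root is itself a leaf holding ("aa", "ab"); adding
    # "bb" splits it into an internal node whose children are the old
    # ("aa", "ab") leaf plus a new "bb" leaf.
    interesting_root_keys = set([('sha1:leaf-aa-ab',)])
    child_keys_after_split = set([('sha1:leaf-aa-ab',),
                                  ('sha1:leaf-bb',)])

    all_uninteresting_chks = set()  # nothing on the basis side here

    # The fix: once the roots have been processed, treat them as
    # uninteresting so identical post-split children are not re-sent.
    all_uninteresting_chks.update(interesting_root_keys)

    to_send = child_keys_after_split - all_uninteresting_chks
    assert to_send == set([('sha1:leaf-bb',)])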

=== modified file 'bzrlib/groupcompress.py'
--- a/bzrlib/groupcompress.py	2009-03-26 19:57:51 +0000
+++ b/bzrlib/groupcompress.py	2009-03-27 04:24:07 +0000
@@ -538,7 +538,11 @@
         # Note that this creates a reference cycle....
         factory = _LazyGroupCompressFactory(key, parents, self,
             start, end, first=first)
-        self._last_byte = max(end, self._last_byte)
+        # max() works here, but as a function call it is noticeably
+        # slower than a plain comparison: timeit says 250ms for max()
+        # vs 100ms for the comparison.
+        if end > self._last_byte:
+            self._last_byte = end
         self._factories.append(factory)
 
     def get_record_stream(self):
@@ -1399,7 +1403,7 @@
         :return: None
         :seealso VersionedFiles.get_record_stream:
         """
-        for _ in self._insert_record_stream(stream):
+        for _ in self._insert_record_stream(stream, random_id=True):
             pass
 
     def _insert_record_stream(self, stream, random_id=False, nostore_sha=None,
@@ -1456,10 +1460,18 @@
         insert_manager = None
         block_start = None
         block_length = None
+        # XXX: TODO: remove this, it is just for safety checking for now
+        inserted_keys = set()
         for record in stream:
             # Raise an error when a record is missing.
             if record.storage_kind == 'absent':
                 raise errors.RevisionNotPresent(record.key, self)
+            if random_id:
+                if record.key in inserted_keys:
+                    trace.note('Insert claimed random_id=True, but then inserted'
+                               ' %r two times', record.key)
+                    continue
+                inserted_keys.add(record.key)
             if reuse_blocks:
                 # If the reuse_blocks flag is set, check to see if we can just
                 # copy a groupcompress block as-is.
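
The inserted_keys safety net above amounts to a small dedup filter over
the record stream. A self-contained sketch of the same idea follows;
dedup_stream and its arguments are illustrative, not the bzrlib API:

    # Sketch: when a caller promises unique keys (random_id=True), note
    # and skip any key that shows up twice instead of trusting the claim.
    def dedup_stream(records, random_id=False, note=None):
        seen = set()
        for key, record in records:
            if random_id:
                if key in seen:
                    if note is not None:
                        note('random_id=True, but %r seen twice' % (key,))
                    continue
                seen.add(key)
            yield key, record

    stream = [(('k1',), 'a'), (('k2',), 'b'), (('k1',), 'a')]
    kept = [k for k, _ in dedup_stream(stream, random_id=True)]
    assert kept == [('k1',), ('k2',)]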

=== modified file 'bzrlib/repofmt/pack_repo.py'
--- a/bzrlib/repofmt/pack_repo.py	2009-03-26 19:59:52 +0000
+++ b/bzrlib/repofmt/pack_repo.py	2009-03-27 04:05:28 +0000
@@ -2000,12 +2000,14 @@
             self._new_pack)
         self.text_index.add_writable_index(self._new_pack.text_index,
             self._new_pack)
+        self._new_pack.text_index.set_optimize(combine_backing_indices=False)
         self.signature_index.add_writable_index(self._new_pack.signature_index,
             self._new_pack)
         if self.chk_index is not None:
             self.chk_index.add_writable_index(self._new_pack.chk_index,
                 self._new_pack)
             self.repo.chk_bytes._index._add_callback = self.chk_index.add_callback
+            self._new_pack.chk_index.set_optimize(combine_backing_indices=False)
 
         self.repo.inventories._index._add_callback = self.inventory_index.add_callback
         self.repo.revisions._index._add_callback = self.revision_index.add_callback
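
For context on why combine_backing_indices=False speeds up fetch: a
builder that spills sorted runs to disk can either keep merging those
runs as they accumulate (cheaper lookups later, but the same keys get
re-written over and over) or simply collect them (much cheaper bulk
insertion). The toy model below illustrates that trade-off; it is a
sketch of the idea only, not the bzrlib BTreeBuilder implementation:

    # Toy spill-and-optionally-combine builder. Real backing indices
    # are on-disk B-trees; plain sorted lists stand in for them here.
    class SpillingBuilder(object):

        def __init__(self, spill_at=3, combine=True):
            self.spill_at = spill_at
            self.combine = combine
            self.pending = []   # in-memory keys
            self.backing = []   # spilled sorted runs

        def add(self, key):
            self.pending.append(key)
            if len(self.pending) >= self.spill_at:
                run = sorted(self.pending)
                self.pending = []
                if self.combine:
                    # Re-merge everything on every spill: O(n log n)
                    # repeated work, which is what hurts a big fetch.
                    old = [k for r in self.backing for k in r]
                    self.backing = [sorted(old + run)]
                else:
                    # Just accumulate runs; lookups touch more runs,
                    # but inserting 100k+ keys is far cheaper.
                    self.backing.append(run)

    b = SpillingBuilder(combine=False)
    for i in range(10):
        b.add('key-%02d' % i)
    assert len(b.backing) == 3 and b.pending == ['key-09']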


