Rev 3878: Bring in brisbane-core 3895 in http://bzr.arbash-meinel.com/branches/bzr/brisbane/hack3
John Arbash Meinel
john at arbash-meinel.com
Fri Mar 20 15:56:15 GMT 2009
At http://bzr.arbash-meinel.com/branches/bzr/brisbane/hack3
------------------------------------------------------------
revno: 3878
revision-id: john at arbash-meinel.com-20090320154811-znms4757w29gmc4b
parent: john at arbash-meinel.com-20090319194720-4esxj7gnrmfaykww
parent: john at arbash-meinel.com-20090320155300-2qdojs8r4loamvmw
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: hack3
timestamp: Fri 2009-03-20 10:48:11 -0500
message:
Bring in brisbane-core 3895
modified:
bzrlib/builtins.py builtins.py-20050830033751-fc01482b9ca23183
bzrlib/chk_map.py chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
bzrlib/diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
bzrlib/groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
bzrlib/inventory.py inventory.py-20050309040759-6648b84ca2005b37
bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
bzrlib/repofmt/pack_repo.py pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
bzrlib/tests/inventory_implementations/basics.py basics.py-20070903044446-kdjwbiu1p1zi9phs-1
bzrlib/tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 3869.7.7
revision-id: john at arbash-meinel.com-20090320155300-2qdojs8r4loamvmw
parent: john at arbash-meinel.com-20090320154310-q5ye037radsy052j
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: brisbane-core
timestamp: Fri 2009-03-20 10:53:00 -0500
message:
Remove an isinstance(..., tuple) assertion.
According to lsprof it was actually a bit expensive, and didn't help much anyway.
modified:
bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 3869.7.6
revision-id: john at arbash-meinel.com-20090320154310-q5ye037radsy052j
parent: john at arbash-meinel.com-20090320032107-bm9wg421rtcacy5i
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: brisbane-core
timestamp: Fri 2009-03-20 10:43:10 -0500
message:
Remove support for passing None for end in GroupCompressBlock.extract.
I decided that removing the extra int from the wire-bytes and indices was not a
worthwhile trade-off versus the ability to _prepare_for_extract and cheaply
filter bytes during fetch. And it makes the code simpler/easier to maintain.
Also, add support for an 'empty content' record, which has start=end=0.
Supporting it costs very little, and simplifies things.
And now GroupCompressBlock.extract() just returns the bytes. It doesn't try to
sha the content, nor does it return a GCBEntry. We weren't using it anyway.
And it can save ~50 seconds of sha-ing all the content during 'bzr pack' of
a launchpad branch.
modified:
bzrlib/groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
bzrlib/tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
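The conventions this message describes (a mandatory end offset, and start=end=0
meaning empty content) can be sketched in Python. extract_bytes below is a
hypothetical helper illustrating the convention, not bzrlib's actual API:

```python
# Hypothetical sketch of the extract() conventions above: callers must
# always pass a concrete end offset, and start == end == 0 denotes the
# 'empty content' record.  Not bzrlib's real API.
def extract_bytes(content, start, end):
    """Return the raw bytes for a record spanning content[start:end)."""
    if start == end == 0:
        return b''  # the 'empty content' record
    if end is None:
        raise ValueError('passing end=None is no longer supported')
    return content[start:end]
```

Returning plain bytes (rather than an (entry, bytes) pair) matches the
simplification in this revision, which also drops the per-record sha1
computation.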
------------------------------------------------------------
revno: 3869.7.5
revision-id: john at arbash-meinel.com-20090320032107-bm9wg421rtcacy5i
parent: john at arbash-meinel.com-20090320031652-jjy97n2zsjq1ouxp
parent: john at arbash-meinel.com-20090319233050-tf8ah6zasmeaetr0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: brisbane-core
timestamp: Thu 2009-03-19 22:21:07 -0500
message:
Merge the updates to the groupcompress DeltaIndex.
modified:
bzrlib/delta.h delta.h-20090227173129-qsu3u43vowf1q3ay-1
bzrlib/diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
bzrlib/tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
------------------------------------------------------------
revno: 3869.8.1
revision-id: john at arbash-meinel.com-20090319233050-tf8ah6zasmeaetr0
parent: john at arbash-meinel.com-20090319145132-e7eu3p75btuidhu2
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: gc_delta_index_room
timestamp: Thu 2009-03-19 18:30:50 -0500
message:
*grow* the local hmask if it is smaller than expected, don't *shrink* it.
modified:
bzrlib/diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 3869.7.4
revision-id: john at arbash-meinel.com-20090320031652-jjy97n2zsjq1ouxp
parent: ian.clatworthy at canonical.com-20090320015656-xrypfxtcwk0poi4z
parent: john at arbash-meinel.com-20090319203157-h1b6rtdqm3wjjgli
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: brisbane-core
timestamp: Thu 2009-03-19 22:16:52 -0500
message:
Merge the _LazyGroupContentManager, et al.
This allows us to stream GroupCompressBlocks in their compressed form, and unpack them
during insert, rather than during get().
modified:
bzrlib/groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
bzrlib/repofmt/pack_repo.py pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
bzrlib/tests/__init__.py selftest.py-20050531073622-8d0e3c8845c97a64
bzrlib/tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
bzrlib/versionedfile.py versionedfile.py-20060222045106-5039c71ee3b65490
------------------------------------------------------------
revno: 3869.6.28
revision-id: john at arbash-meinel.com-20090319203157-h1b6rtdqm3wjjgli
parent: john at arbash-meinel.com-20090319030602-stjxub1g3yhq0u32
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lazy_gc_stream
timestamp: Thu 2009-03-19 15:31:57 -0500
message:
We can use 'random_id=True' when copying the streams.
This is because the 'get_stream' code is responsible for ensuring
the keys are truly non-overlapping, and we know we are creating a
new pack file.
It might mean that we have some overlap with yet-another existing
pack file, but only if some other operation inserted it accidentally,
and that doesn't hurt anything. When we autopack or fetch, we will
skip one of those records anyway.
This saves quite a bit of time, since we don't have to look up
texts in the index we are writing. It matters mostly for large
projects, where we have already spilled some of the index nodes
to disk.
modified:
bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 3869.7.3
revision-id: ian.clatworthy at canonical.com-20090320015656-xrypfxtcwk0poi4z
parent: ian.clatworthy at canonical.com-20090319193106-4bwt29ovr1b710ky
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: brisbane-core
timestamp: Fri 2009-03-20 11:56:56 +1000
message:
Inventory.iter_just_entries() API & test
modified:
bzrlib/inventory.py inventory.py-20050309040759-6648b84ca2005b37
bzrlib/repofmt/pack_repo.py pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
bzrlib/tests/inventory_implementations/basics.py basics.py-20070903044446-kdjwbiu1p1zi9phs-1
------------------------------------------------------------
revno: 3869.7.2
revision-id: ian.clatworthy at canonical.com-20090319193106-4bwt29ovr1b710ky
parent: ian.clatworthy at canonical.com-20090318095149-y903o2ecqqcslikf
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: brisbane-core
timestamp: Fri 2009-03-20 05:31:06 +1000
message:
fix chk_map Node %r formatting
modified:
bzrlib/chk_map.py chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
------------------------------------------------------------
revno: 3869.7.1
revision-id: ian.clatworthy at canonical.com-20090318095149-y903o2ecqqcslikf
parent: john at arbash-meinel.com-20090317201340-amjnj1wl78iwcxae
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: brisbane-core
timestamp: Wed 2009-03-18 19:51:49 +1000
message:
fix add's interaction with filtered views
modified:
bzrlib/builtins.py builtins.py-20050830033751-fc01482b9ca23183
-------------- next part --------------
=== modified file 'bzrlib/builtins.py'
--- a/bzrlib/builtins.py 2009-03-17 20:13:40 +0000
+++ b/bzrlib/builtins.py 2009-03-18 09:51:49 +0000
@@ -83,9 +83,10 @@
tree = WorkingTree.open_containing(file_list[0])[0]
if tree.supports_views():
view_files = tree.views.lookup_view()
- for filename in file_list:
- if not osutils.is_inside_any(view_files, filename):
- raise errors.FileOutsideView(filename, view_files)
+ if view_files:
+ for filename in file_list:
+ if not osutils.is_inside_any(view_files, filename):
+ raise errors.FileOutsideView(filename, view_files)
else:
tree = WorkingTree.open_containing(u'.')[0]
if tree.supports_views():
=== modified file 'bzrlib/chk_map.py'
--- a/bzrlib/chk_map.py 2009-03-12 07:03:10 +0000
+++ b/bzrlib/chk_map.py 2009-03-19 19:31:06 +0000
@@ -523,7 +523,7 @@
def __repr__(self):
items_str = str(sorted(self._items))
if len(items_str) > 20:
- items_str = items_str[16] + '...]'
+ items_str = items_str[:16] + '...]'
return '%s(key:%s len:%s size:%s max:%s prefix:%s items:%s)' % (
self.__class__.__name__, self._key, self._len, self._raw_size,
self._maximum_size, self._search_prefix, items_str)
@@ -607,9 +607,9 @@
self._search_key_func = search_key_func
def __repr__(self):
- items_str = sorted(self._items)
+ items_str = str(sorted(self._items))
if len(items_str) > 20:
- items_str = items_str[16] + '...]'
+ items_str = items_str[:16] + '...]'
return \
'%s(key:%s len:%s size:%s max:%s prefix:%s keywidth:%s items:%s)' \
% (self.__class__.__name__, self._key, self._len, self._raw_size,
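The chk_map fix above is a one-character change from indexing to slicing; a
minimal illustration of why it matters:

```python
# str[16] is a single character, while str[:16] is the first 16
# characters -- the old code glued one stray character onto '...]'.
items_str = "[('a', 1), ('b', 2), ('c', 3)]"
truncated = items_str[:16] + '...]'  # fixed behaviour: real truncation
wrong = items_str[16] + '...]'       # old bug: one character + '...]'
assert truncated == "[('a', 1), ('b',...]"
assert wrong == " ...]"
```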
=== modified file 'bzrlib/diff-delta.c'
--- a/bzrlib/diff-delta.c 2009-03-19 14:51:32 +0000
+++ b/bzrlib/diff-delta.c 2009-03-19 23:30:50 +0000
@@ -393,7 +393,7 @@
for (i = 4; (1u << i) < hsize && i < 31; i++);
hsize = 1 << i;
hmask = hsize - 1;
- if (old && old->hash_mask < hmask) {
+ if (old && old->hash_mask > hmask) {
hmask = old->hash_mask;
hsize = hmask + 1;
}
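The diff-delta.c hunk above flips a comparison so that a hash mask inherited
from an old index is only ever grown. A rough Python translation of that
sizing logic (a sketch of the C code's behaviour, not an exact port):

```python
def pick_hash_mask(hsize, old_mask=None):
    # Find the smallest power of two >= hsize (capped at 2**31), as the
    # C for-loop does, then derive the mask from it.
    i = 4
    while (1 << i) < hsize and i < 31:
        i += 1
    hsize = 1 << i
    hmask = hsize - 1
    # The fix: if the old index used a *larger* mask, grow to match it;
    # the previous '<' comparison wrongly shrank it instead.
    if old_mask is not None and old_mask > hmask:
        hmask = old_mask
    return hmask
```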
=== modified file 'bzrlib/groupcompress.py'
--- a/bzrlib/groupcompress.py 2009-03-19 18:38:49 +0000
+++ b/bzrlib/groupcompress.py 2009-03-20 15:48:11 +0000
@@ -339,20 +339,11 @@
:param sha1: TODO (should we validate only when sha1 is supplied?)
:return: The bytes for the content
"""
- # Make sure we have enough bytes for this record
- # TODO: if we didn't want to track the end of this entry, we could
- # _ensure_content(start+enough_bytes_for_type_and_length), and
- # then decode the entry length, and
- # _ensure_content(start+1+length)
- # It is 2 calls to _ensure_content(), but we always buffer a bit
- # extra anyway, and it means 1 less offset stored in the index,
- # and transmitted over the wire
- if end is None:
- # it takes 5 bytes to encode 2^32, so we need 1 byte to hold the
- # 'f' or 'd' declaration, and then 5 more for the record length.
- self._ensure_content(start + 6)
- else:
- self._ensure_content(end)
+ # Handle the 'Empty Content' record, even if we don't always write it
+ # yet.
+ if start == end == 0:
+ return ''
+ self._ensure_content(end)
# The bytes are 'f' or 'd' for the type, then a variable-length
# base128 integer for the content size, then the actual content
# We know that the variable-length integer won't be longer than 5
@@ -368,23 +359,15 @@
content_len, len_len = decode_base128_int(
self._content[start + 1:start + 6])
content_start = start + 1 + len_len
- if end is None:
- end = content_start + content_len
- self._ensure_content(end)
- else:
- if end != content_start + content_len:
- raise ValueError('end != len according to field header'
- ' %s != %s' % (end, content_start + content_len))
- entry = GroupCompressBlockEntry(key, type, sha1=None,
- start=start, length=end-start)
+ if end != content_start + content_len:
+ raise ValueError('end != len according to field header'
+ ' %s != %s' % (end, content_start + content_len))
content = self._content[content_start:end]
if c == 'f':
bytes = content
elif c == 'd':
bytes = _groupcompress_pyx.apply_delta(self._content, content)
- # if entry.sha1 is None:
- # entry.sha1 = sha_string(bytes)
- return entry, bytes
+ return bytes
def add_entry(self, key, type, sha1, start, length):
"""Add new meta info about an entry.
@@ -515,7 +498,7 @@
if storage_kind in ('fulltext', 'chunked'):
self._manager._prepare_for_extract()
block = self._manager._block
- _, bytes = block.extract(self.key, self._start, self._end)
+ bytes = block.extract(self.key, self._start, self._end)
if storage_kind == 'fulltext':
return bytes
else:
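The extract() hunk above relies on the record layout: a type byte ('f' for
fulltext, 'd' for delta), then a base128 length, then the content. Below is a
hedged sketch of parsing that header; decode_base128_int here is my stand-in
for bzrlib's helper, assuming the usual little-endian varint with a
continuation high bit:

```python
def decode_base128_int(data):
    # Little-endian base-128: 7 payload bits per byte, high bit set on
    # all but the final byte.  Returns (value, bytes_consumed).
    result = shift = offset = 0
    while True:
        byte = ord(data[offset:offset + 1])
        result |= (byte & 0x7F) << shift
        offset += 1
        if not byte & 0x80:
            return result, offset
        shift += 7

def parse_record_header(block, start):
    # A base128 int for 2**32 needs at most 5 bytes, hence start + 6.
    kind = block[start:start + 1]  # b'f' (fulltext) or b'd' (delta)
    length, len_len = decode_base128_int(block[start + 1:start + 6])
    content_start = start + 1 + len_len
    return kind, content_start, content_start + length
```

With this layout, the "end != len according to field header" check in the
diff amounts to requiring end == content_start + length.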
=== modified file 'bzrlib/inventory.py'
--- a/bzrlib/inventory.py 2009-03-17 20:13:40 +0000
+++ b/bzrlib/inventory.py 2009-03-20 01:56:56 +0000
@@ -1143,6 +1143,19 @@
"""Iterate over all file-ids."""
return iter(self._byid)
+ def iter_just_entries(self):
+ """Iterate over all entries.
+
+ Unlike iter_entries(), just the entries are returned (not (path, ie))
+ and the order of entries is undefined.
+
+ XXX: We may not want to merge this into bzr.dev.
+ """
+ if self.root is None:
+ return
+ for _, ie in self._byid.iteritems():
+ yield ie
+
def __len__(self):
"""Returns number of entries."""
return len(self._byid)
@@ -1722,6 +1735,22 @@
for key, _ in self.id_to_entry.iteritems():
yield key[-1]
+ def iter_just_entries(self):
+ """Iterate over all entries.
+
+ Unlike iter_entries(), just the entries are returned (not (path, ie))
+ and the order of entries is undefined.
+
+ XXX: We may not want to merge this into bzr.dev.
+ """
+ for key, entry in self.id_to_entry.iteritems():
+ file_id = key[0]
+ ie = self._entry_cache.get(file_id, None)
+ if ie is None:
+ ie = self._bytes_to_entry(entry)
+ self._entry_cache[file_id] = ie
+ yield ie
+
def iter_changes(self, basis):
"""Generate a Tree.iter_changes change list between this and basis.
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- a/bzrlib/repofmt/groupcompress_repo.py 2009-03-19 19:47:20 +0000
+++ b/bzrlib/repofmt/groupcompress_repo.py 2009-03-20 15:48:11 +0000
@@ -254,9 +254,6 @@
next_keys = set()
def handle_internal_node(node):
for prefix, value in node._items.iteritems():
- if not isinstance(value, tuple):
- raise AssertionError("value is %s when a tuple"
- " is expected" % (value.__class__))
# We don't want to request the same key twice, and we
# want to order it by the first time it is seen.
# Even further, we don't want to request a key which is
@@ -290,13 +287,6 @@
handle_internal_node(node)
elif parse_leaf_nodes:
handle_leaf_node(node)
- # XXX: We don't walk the chk map to determine
- # referenced (file_id, revision_id) keys.
- # We don't do it yet because you really need to
- # filter out the ones that are present in the
- # parents of the rev just before the ones you are
- # copying, otherwise the filter is grabbing too
- # many keys...
counter[0] += 1
if pb is not None:
pb.update('chk node', counter[0], total_keys)
=== modified file 'bzrlib/repofmt/pack_repo.py'
--- a/bzrlib/repofmt/pack_repo.py 2009-03-17 20:33:54 +0000
+++ b/bzrlib/repofmt/pack_repo.py 2009-03-20 03:16:52 +0000
@@ -2498,7 +2498,7 @@
total = len(revision_ids)
for pos, inv in enumerate(self.iter_inventories(revision_ids)):
pb.update("Finding text references", pos, total)
- for _, entry in inv.iter_entries():
+ for entry in inv.iter_just_entries():
if entry.revision != inv.revision_id:
continue
if not rich_roots and entry.file_id == inv.root_id:
=== modified file 'bzrlib/tests/inventory_implementations/basics.py'
--- a/bzrlib/tests/inventory_implementations/basics.py 2009-03-12 08:12:18 +0000
+++ b/bzrlib/tests/inventory_implementations/basics.py 2009-03-20 01:56:56 +0000
@@ -208,6 +208,23 @@
('src/hello.c', 'hello-id'),
], [(path, ie.file_id) for path, ie in inv.iter_entries()])
+ def test_iter_just_entries(self):
+ inv = self.make_inventory('tree-root')
+ for args in [('src', 'directory', 'src-id'),
+ ('doc', 'directory', 'doc-id'),
+ ('src/hello.c', 'file', 'hello-id'),
+ ('src/bye.c', 'file', 'bye-id'),
+ ('Makefile', 'file', 'makefile-id')]:
+ inv.add_path(*args)
+ self.assertEqual([
+ 'bye-id',
+ 'doc-id',
+ 'hello-id',
+ 'makefile-id',
+ 'src-id',
+ 'tree-root',
+ ], sorted([ie.file_id for ie in inv.iter_just_entries()]))
+
def test_iter_entries_by_dir(self):
inv = self.make_inventory('tree-root')
for args in [('src', 'directory', 'src-id'),
=== modified file 'bzrlib/tests/test_groupcompress.py'
--- a/bzrlib/tests/test_groupcompress.py 2009-03-19 03:06:02 +0000
+++ b/bzrlib/tests/test_groupcompress.py 2009-03-20 15:43:10 +0000
@@ -329,21 +329,6 @@
'length:100\n'
'\n', raw_bytes)
- def test_extract_no_end(self):
- # We should be able to extract a record, even if we only know the start
- # of the bytes.
- texts = {
- ('key1',): 'text for key1\nhas bytes that are common\n',
- ('key2',): 'text for key2\nhas bytes that are common\n',
- }
- entries, block = self.make_block(texts)
- self.assertEqualDiff('text for key1\nhas bytes that are common\n',
- block.extract(('key1',), entries[('key1',)].start,
- end=None)[1])
- self.assertEqualDiff('text for key2\nhas bytes that are common\n',
- block.extract(('key2',), entries[('key2',)].start,
- end=None)[1])
-
def test_partial_decomp(self):
content_chunks = []
# We need a sufficient amount of data so that zlib.decompress has
More information about the bazaar-commits mailing list