Rev 4470: (robertc) Pack 2a repositories after fetching from a different format in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Tue Jun 23 01:35:23 BST 2009


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 4470
revision-id: pqm at pqm.ubuntu.com-20090623003517-lrjel82rf7q6qjlc
parent: pqm at pqm.ubuntu.com-20090622171120-fuxez9ylfqpxynqn
parent: robertc at robertcollins.net-20090622232400-3v66jsa4bdorxcn6
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2009-06-23 01:35:17 +0100
message:
  (robertc) Pack 2a repositories after fetching from a different format
  	(bug 376748) and fix problems with autopacking 2a repositories
  	(bug 365615). (Robert Collins)
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/remote.py               remote.py-20060720103555-yeeg2x51vn0rbtdp-1
  bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
  bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
  bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
  bzrlib/tests/per_repository/test_pack.py test_pack.py-20070712120702-0c7585lh56p894mo-2
  bzrlib/tests/per_repository/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
  bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
  bzrlib/tests/test_pack_repository.py test_pack_repository-20080801043947-eaw0e6h2gu75kwmy-1
  bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
    ------------------------------------------------------------
    revno: 4462.2.10
    revision-id: robertc at robertcollins.net-20090622232400-3v66jsa4bdorxcn6
    parent: robertc at robertcollins.net-20090622215537-f7kxi0tui92ysiec
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: autopack-cross-format-fetch
    timestamp: Tue 2009-06-23 09:24:00 +1000
    message:
      Add explicit test for autopack of CHK repositories when CHK pages are not in the source packs.
    modified:
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
    ------------------------------------------------------------
    revno: 4462.2.9
    revision-id: robertc at robertcollins.net-20090622215537-f7kxi0tui92ysiec
    parent: robertc at robertcollins.net-20090622061541-mri46zc9w30imk3l
    parent: pqm at pqm.ubuntu.com-20090622171120-fuxez9ylfqpxynqn
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: autopack-cross-format-fetch
    timestamp: Tue 2009-06-23 07:55:37 +1000
    message:
      Resolve NEWS.
    renamed:
      generate_docs.py => tools/generate_docs.py bzrinfogen.py-20051211224525-78e7c14f2c955e55
      tools/doc_generate => bzrlib/doc_generate bzrinfogen-20051211214907-45ff5f0af3a80b32
    modified:
      Makefile                       Makefile-20050805140406-d96e3498bb61c5bb
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/_known_graph_py.py      _known_graph_py.py-20090610185421-vw8vfda2cgnckgb1-1
      bzrlib/_known_graph_pyx.pyx    _known_graph_pyx.pyx-20090610194911-yjk73td9hpjilas0-1
      bzrlib/bugtracker.py           bugtracker.py-20070410073305-vu1vu1qosjurg8kb-1
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
      bzrlib/bzrdir.py               bzrdir.py-20060131065624-156dfea39c4387cb
      bzrlib/commands.py             bzr.py-20050309040720-d10f4714595cf8c3
      bzrlib/doc_generate/__init__.py __init__.py-20051211214907-df9e0e6b493553f1
      bzrlib/doc_generate/autodoc_bash_completion.py big_bash_completion.py-20051211223059-00ecfbfcc8056b78
      bzrlib/doc_generate/autodoc_man.py bzrman.py-20050601153041-0ff7f74de456d15e
      bzrlib/doc_generate/autodoc_rstx.py autodoc_rstx.py-20060420024836-3e0d4a526452193c
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      bzrlib/help.py                 help.py-20050505025907-4dd7a6d63912f894
      bzrlib/help_topics/__init__.py help_topics.py-20060920210027-rnim90q9e0bwxvy4-1
      bzrlib/hooks.py                hooks.py-20070325015548-ix4np2q0kd8452au-1
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/pack.py                 container.py-20070607160755-tr8zc26q18rn0jnb-1
      bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/revision.py             revision.py-20050309040759-e77802c08f3999d5
      bzrlib/tests/blackbox/test_push.py test_push.py-20060329002750-929af230d5d22663
      bzrlib/tests/test__known_graph.py test__known_graph.py-20090610185421-vw8vfda2cgnckgb1-2
      bzrlib/tests/test_generate_docs.py test_generate_docs.p-20070102123151-cqctnsrlqwmiljd7-1
      bzrlib/tests/test_pack.py      test_container.py-20070607160755-tr8zc26q18rn0jnb-2
      bzrlib/tests/test_tuned_gzip.py test_tuned_gzip.py-20060418042056-c576dfc708984968
      bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
      bzrlib/tuned_gzip.py           tuned_gzip.py-20060407014720-5aadc518e928e8d2
      bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
      setup.py                       setup.py-20050314065409-02f8a0a6e3f9bc70
      tools/time_graph.py            time_graph.py-20090608210127-6g0epojxnqjo0f0s-1
      tools/generate_docs.py         bzrinfogen.py-20051211224525-78e7c14f2c955e55
    ------------------------------------------------------------
    revno: 4462.2.8
    revision-id: robertc at robertcollins.net-20090622061541-mri46zc9w30imk3l
    parent: robertc at robertcollins.net-20090622061438-3v9hl1pe2ph72ik4
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: autopack-cross-format-fetch
    timestamp: Mon 2009-06-22 16:15:41 +1000
    message:
      Review corrections.
    modified:
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
    ------------------------------------------------------------
    revno: 4462.2.7
    revision-id: robertc at robertcollins.net-20090622061438-3v9hl1pe2ph72ik4
    parent: robertc at robertcollins.net-20090622052704-32rm1mbm9mgfk1v3
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: autopack-cross-format-fetch
    timestamp: Mon 2009-06-22 16:14:38 +1000
    message:
      Both StreamSink and InterDifferingSerialiser now pack after fetching when it is beneficial
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
    ------------------------------------------------------------
    revno: 4462.2.6
    revision-id: robertc at robertcollins.net-20090622052704-32rm1mbm9mgfk1v3
    parent: robertc at robertcollins.net-20090622045621-plce53iif067uod1
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: autopack-cross-format-fetch
    timestamp: Mon 2009-06-22 15:27:04 +1000
    message:
      Cause StreamSink to partially pack repositories after cross format fetches when beneficial.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
    ------------------------------------------------------------
    revno: 4462.2.5
    revision-id: robertc at robertcollins.net-20090622045621-plce53iif067uod1
    parent: robertc at robertcollins.net-20090622022509-qn2rjozy7g1hsmpv
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: autopack-cross-format-fetch
    timestamp: Mon 2009-06-22 14:56:21 +1000
    message:
      Teach groupcompress repositories to honour pack hints, and also not error when a CHK page is not in the packs being repacked by partial pack operations.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
    ------------------------------------------------------------
    revno: 4462.2.4
    revision-id: robertc at robertcollins.net-20090622022509-qn2rjozy7g1hsmpv
    parent: robertc at robertcollins.net-20090621235117-zvjywxin20usblpn
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: autopack-cross-format-fetch
    timestamp: Mon 2009-06-22 12:25:09 +1000
    message:
      Teach commit_write_group to return hint information for pack().
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
      bzrlib/tests/test_pack_repository.py test_pack_repository-20080801043947-eaw0e6h2gu75kwmy-1
    ------------------------------------------------------------
    revno: 4462.2.3
    revision-id: robertc at robertcollins.net-20090621235117-zvjywxin20usblpn
    parent: robertc at robertcollins.net-20090619042602-dicz171b8vhj1s71
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: autopack-cross-format-fetch
    timestamp: Mon 2009-06-22 09:51:17 +1000
    message:
      Add a hint parameter to Repository.pack.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/remote.py               remote.py-20060720103555-yeeg2x51vn0rbtdp-1
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/per_repository/test_pack.py test_pack.py-20070712120702-0c7585lh56p894mo-2
    ------------------------------------------------------------
    revno: 4462.2.2
    revision-id: robertc at robertcollins.net-20090619042602-dicz171b8vhj1s71
    parent: robertc at robertcollins.net-20090619041922-acr6p23jah4z2gc8
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: autopack-cross-format-fetch
    timestamp: Fri 2009-06-19 14:26:02 +1000
    message:
      Change CHK already-packed check to be generic using the pack_compresses flag.
    modified:
      bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
    ------------------------------------------------------------
    revno: 4462.2.1
    revision-id: robertc at robertcollins.net-20090619041922-acr6p23jah4z2gc8
    parent: pqm at pqm.ubuntu.com-20090618213920-8d1p9f28uomzfkvl
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: autopack-cross-format-fetch
    timestamp: Fri 2009-06-19 14:19:22 +1000
    message:
      Add new attribute to RepositoryFormat pack_compresses, hinting when pack can be useful.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/remote.py               remote.py-20060720103555-yeeg2x51vn0rbtdp-1
      bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/per_repository/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
=== modified file 'NEWS'
--- a/NEWS	2009-06-22 17:11:20 +0000
+++ b/NEWS	2009-06-22 21:55:37 +0000
@@ -47,6 +47,10 @@
   ``--2a`` formats should be down to exactly 2x the size. Related to bug
   #109114. (John Arbash Meinel)
 
+* Repositories using CHK pages (which includes the new 2a format) will no
+  longer error during commit or push operations when an autopack operation
+  is triggered. (Robert Collins, #365615)
+
 * Unshelve works correctly when multiple zero-length files are present on
   the shelf. (Aaron Bentley, #363444)
 
@@ -66,16 +70,37 @@
   for files with long ancestry and 'cherrypicked' changes.)
   (John Arbash Meinel, Vincent Ladeuil)
 
+* ``GroupCompress`` repositories now take advantage of the pack hints
+  parameter to permit cross-format fetching to incrementally pack the
+  converted data. (Robert Collins)
+
 * pack <=> pack fetching is now done via a ``PackStreamSource`` rather
   than the ``Packer`` code. The user visible change is that we now
   properly fetch the minimum number of texts for non-smart fetching.
   (John Arbash Meinel)
 
+* ``Repository.commit_write_group`` now returns opaque data about what
+  was committed, for passing to ``Repository.pack``. Repositories
+  without atomic commits will still return None. (Robert Collins)
+
+* ``Repository.pack`` now takes an optional ``hint`` parameter
+  which will support doing partial packs for repositories that can do
+  that. (Robert Collins)
+
+* RepositoryFormat has a new attribute 'pack_compresses' which is True
+  when doing a pack operation changes the compression of content in the
+  repository. (Robert Collins)
+
+* ``StreamSink`` and ``InterDifferingSerialiser`` will call
+  ``Repository.pack`` with the hint returned by
+  ``Repository.commit_write_group`` if the formats were different and the
+  repository can increase compression by doing a pack operation.
+  (Robert Collins, #376748)
+
 * ``VersionedFiles._add_text`` is a new api that lets us insert text into
   the repository as a single string, rather than a list of lines. This can
   improve memory overhead and performance of committing large files.
   (Currently a private api, used only by commit). (John Arbash Meinel)
-  
 
 
 Improvements
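
The NEWS entries above describe a three-part flow: ``commit_write_group``
returns a hint, ``pack`` accepts one, and ``pack_compresses`` decides whether
a pack after a cross-format fetch is worthwhile. A minimal sketch of that
flow follows. ``ToyRepository``, ``fetch_batches`` and the ``pack-N`` naming
are hypothetical stand-ins written for illustration; they are not bzrlib
classes and do not reproduce its storage layer.

```python
class ToyRepository:
    """Hypothetical stand-in for a pack-based repository (not bzrlib code)."""

    def __init__(self, pack_compresses):
        self.pack_compresses = pack_compresses
        self.packs = {}        # pack name -> list of revision ids
        self._pending = None
        self._counter = 0

    def _new_name(self):
        self._counter += 1
        return "pack-%d" % self._counter

    def start_write_group(self):
        self._pending = []

    def add_revision(self, revision_id):
        self._pending.append(revision_id)

    def commit_write_group(self):
        """Store pending data as one new pack and return the new pack names."""
        name = self._new_name()
        self.packs[name] = self._pending
        self._pending = None
        return [name]

    def pack(self, hint=None):
        """Combine packs; with a hint, combine only the packs it names."""
        chosen = [n for n in sorted(self.packs) if not hint or n in hint]
        if not chosen:
            return  # stale or unknown hints are simply ignored
        combined = []
        for name in chosen:
            combined.extend(self.packs.pop(name))
        self.packs[self._new_name()] = combined


def fetch_batches(target, batches, formats_differ):
    """Insert batches, collecting hints, then pack only what was fetched."""
    hints = []
    for batch in batches:
        target.start_write_group()
        for revision_id in batch:
            target.add_revision(revision_id)
        hint = target.commit_write_group()
        if hint:
            hints.extend(hint)
    if hints and formats_differ and target.pack_compresses:
        target.pack(hint=hints)
```

In this sketch a pack that existed before the fetch is left untouched, which
is the point of passing the hint rather than repacking the whole repository.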

=== modified file 'bzrlib/remote.py'
--- a/bzrlib/remote.py	2009-06-17 03:53:51 +0000
+++ b/bzrlib/remote.py	2009-06-21 23:51:17 +0000
@@ -566,6 +566,11 @@
         return self._creating_repo._real_repository._format.network_name()
 
     @property
+    def pack_compresses(self):
+        self._ensure_real()
+        return self._custom_format.pack_compresses
+
+    @property
     def _serializer(self):
         self._ensure_real()
         return self._custom_format._serializer
@@ -1491,13 +1496,13 @@
         return self._real_repository.inventories
 
     @needs_write_lock
-    def pack(self):
+    def pack(self, hint=None):
         """Compress the data within the repository.
 
         This is not currently implemented within the smart server.
         """
         self._ensure_real()
-        return self._real_repository.pack()
+        return self._real_repository.pack(hint=hint)
 
     @property
     def revisions(self):

=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- a/bzrlib/repofmt/groupcompress_repo.py	2009-06-22 15:13:45 +0000
+++ b/bzrlib/repofmt/groupcompress_repo.py	2009-06-22 21:55:37 +0000
@@ -218,6 +218,7 @@
             p_id_roots_set = set()
             stream = source_vf.get_record_stream(keys, 'groupcompress', True)
             for idx, record in enumerate(stream):
+                # Inventories should always be with revisions; assume success.
                 bytes = record.get_bytes_as('fulltext')
                 chk_inv = inventory.CHKInventory.deserialise(None, bytes,
                                                              record.key)
@@ -294,6 +295,11 @@
                     stream = source_vf.get_record_stream(cur_keys,
                                                          'as-requested', True)
                     for record in stream:
+                        if record.storage_kind == 'absent':
+                            # An absent CHK record: we assume that the missing
+                            # record is in a different pack - e.g. a page not
+                            # altered by the commit we're packing.
+                            continue
                         bytes = record.get_bytes_as('fulltext')
                         # We don't care about search_key_func for this code,
                         # because we only care about external references.
@@ -558,11 +564,6 @@
     pack_factory = GCPack
     resumed_pack_factory = ResumedGCPack
 
-    def _already_packed(self):
-        """Is the collection already packed?"""
-        # Always repack GC repositories for now
-        return False
-
     def _execute_pack_operations(self, pack_operations,
                                  _packer_class=GCCHKPacker,
                                  reload_func=None):
@@ -1048,6 +1049,7 @@
     _fetch_order = 'unordered'
     _fetch_uses_deltas = False # essentially ignored by the groupcompress code.
     fast_deltas = True
+    pack_compresses = True
 
     def _get_matching_bzrdir(self):
         return bzrdir.format_registry.make_bzrdir('development6-rich-root')

=== modified file 'bzrlib/repofmt/pack_repo.py'
--- a/bzrlib/repofmt/pack_repo.py	2009-06-17 17:57:15 +0000
+++ b/bzrlib/repofmt/pack_repo.py	2009-06-22 04:56:21 +0000
@@ -1459,12 +1459,12 @@
         in synchronisation with certain steps. Otherwise the names collection
         is not flushed.
 
-        :return: True if packing took place.
+        :return: Something evaluating true if packing took place.
         """
         while True:
             try:
                 return self._do_autopack()
-            except errors.RetryAutopack, e:
+            except errors.RetryAutopack:
                 # If we get a RetryAutopack exception, we should abort the
                 # current action, and retry.
                 pass
@@ -1474,7 +1474,7 @@
         total_revisions = self.revision_index.combined_index.key_count()
         total_packs = len(self._names)
         if self._max_pack_count(total_revisions) >= total_packs:
-            return False
+            return None
         # determine which packs need changing
         pack_distribution = self.pack_distribution(total_revisions)
         existing_packs = []
@@ -1502,10 +1502,10 @@
             'containing %d revisions. Packing %d files into %d affecting %d'
             ' revisions', self, total_packs, total_revisions, num_old_packs,
             num_new_packs, num_revs_affected)
-        self._execute_pack_operations(pack_operations,
+        result = self._execute_pack_operations(pack_operations,
                                       reload_func=self._restart_autopack)
         mutter('Auto-packing repository %s completed', self)
-        return True
+        return result
 
     def _execute_pack_operations(self, pack_operations, _packer_class=Packer,
                                  reload_func=None):
@@ -1513,7 +1513,7 @@
 
         :param pack_operations: A list of [revision_count, packs_to_combine].
         :param _packer_class: The class of packer to use (default: Packer).
-        :return: None.
+        :return: The new pack names.
         """
         for revision_count, packs in pack_operations:
             # we may have no-ops from the setup logic
@@ -1535,10 +1535,11 @@
                 self._remove_pack_from_memory(pack)
         # record the newly available packs and stop advertising the old
         # packs
-        self._save_pack_names(clear_obsolete_packs=True)
+        result = self._save_pack_names(clear_obsolete_packs=True)
         # Move the old packs out of the way now they are no longer referenced.
         for revision_count, packs in pack_operations:
             self._obsolete_packs(packs)
+        return result
 
     def _flush_new_pack(self):
         if self._new_pack is not None:
@@ -1554,29 +1555,26 @@
 
     def _already_packed(self):
         """Is the collection already packed?"""
-        return len(self._names) < 2
+        return not (self.repo._format.pack_compresses or (len(self._names) > 1))
 
-    def pack(self):
+    def pack(self, hint=None):
         """Pack the pack collection totally."""
         self.ensure_loaded()
         total_packs = len(self._names)
         if self._already_packed():
-            # This is arguably wrong because we might not be optimal, but for
-            # now lets leave it in. (e.g. reconcile -> one pack. But not
-            # optimal.
             return
         total_revisions = self.revision_index.combined_index.key_count()
         # XXX: the following may want to be a class, to pack with a given
         # policy.
         mutter('Packing repository %s, which has %d pack files, '
-            'containing %d revisions into 1 packs.', self, total_packs,
-            total_revisions)
+            'containing %d revisions with hint %r.', self, total_packs,
+            total_revisions, hint)
         # determine which packs need changing
-        pack_distribution = [1]
         pack_operations = [[0, []]]
         for pack in self.all_packs():
-            pack_operations[-1][0] += pack.get_revision_count()
-            pack_operations[-1][1].append(pack)
+            if not hint or pack.name in hint:
+                pack_operations[-1][0] += pack.get_revision_count()
+                pack_operations[-1][1].append(pack)
         self._execute_pack_operations(pack_operations, OptimisingPacker)
 
     def plan_autopack_combinations(self, existing_packs, pack_distribution):
@@ -1938,6 +1936,7 @@
 
         :param clear_obsolete_packs: If True, clear out the contents of the
             obsolete_packs directory.
+        :return: A list of the names saved that were not previously on disk.
         """
         self.lock_names()
         try:
@@ -1958,6 +1957,7 @@
             self._unlock_names()
         # synchronise the memory packs list with what we just wrote:
         self._syncronize_pack_names_from_disk_nodes(disk_nodes)
+        return [new_node[0][0] for new_node in new_nodes]
 
     def reload_pack_names(self):
         """Sync our pack listing with what is present in the repository.
@@ -2097,7 +2097,7 @@
             if not self.autopack():
                 # when autopack takes no steps, the names list is still
                 # unsaved.
-                self._save_pack_names()
+                return self._save_pack_names()
 
     def _suspend_write_group(self):
         tokens = [pack.name for pack in self._resumed_packs]
@@ -2348,13 +2348,13 @@
         raise NotImplementedError(self.dont_leave_lock_in_place)
 
     @needs_write_lock
-    def pack(self):
+    def pack(self, hint=None):
         """Compress the data within the repository.
 
         This will pack all the data to a single pack. In future it may
         recompress deltas or do other such expensive operations.
         """
-        self._pack_collection.pack()
+        self._pack_collection.pack(hint=hint)
 
     @needs_write_lock
     def reconcile(self, other=None, thorough=False):
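
The two decisions changed in this file, whether a collection counts as
already packed and which packs a hint selects, can be restated as standalone
functions. This is only a sketch of the logic visible in the hunks above:
``already_packed`` and ``plan_pack`` are hypothetical free functions, and
packs are modelled as plain ``(name, revision_count)`` tuples rather than
pack objects.

```python
def already_packed(pack_names, pack_compresses):
    """Mirror the new _already_packed test.

    A collection is treated as packed only when packing cannot improve
    compression and there is at most one pack.
    """
    return not (pack_compresses or len(pack_names) > 1)


def plan_pack(packs, hint=None):
    """Build a single [revision_count, names] operation.

    packs is a list of (name, revision_count) tuples. With no hint every
    pack is selected; with a hint only the named packs are selected, so
    names in the hint that no longer exist have no effect.
    """
    operation = [0, []]
    for name, revision_count in packs:
        if not hint or name in hint:
            operation[0] += revision_count
            operation[1].append(name)
    return operation
```

For formats where ``pack_compresses`` is False the first function gives the
same answer as the old ``len(self._names) < 2`` check; for compressing
formats it always reports that packing may still help.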

=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py	2009-06-22 15:47:25 +0000
+++ b/bzrlib/repository.py	2009-06-22 21:55:37 +0000
@@ -1404,8 +1404,9 @@
             raise errors.BzrError('mismatched lock context %r and '
                 'write group %r.' %
                 (self.get_transaction(), self._write_group))
-        self._commit_write_group()
+        result = self._commit_write_group()
         self._write_group = None
+        return result
 
     def _commit_write_group(self):
         """Template method for per-repository write group cleanup.
@@ -2418,7 +2419,7 @@
             keys = tsort.topo_sort(parent_map)
         return [None] + list(keys)
 
-    def pack(self):
+    def pack(self, hint=None):
         """Compress the data within the repository.
 
         This operation only makes sense for some repository types. For other
@@ -2427,6 +2428,13 @@
         This stub method does not require a lock, but subclasses should use
         @needs_write_lock as this is a long running call its reasonable to
         implicitly lock for the user.
+
+        :param hint: If not supplied, the whole repository is packed.
+            If supplied, the repository may use the hint parameter as a
+            hint for the parts of the repository to pack. A hint can be
+            obtained from the result of commit_write_group(). Out of
+            date hints are simply ignored, because concurrent operations
+            can obsolete them rapidly.
         """
 
     def get_transaction(self):
@@ -2835,6 +2843,11 @@
     # Does this format have < O(tree_size) delta generation. Used to hint what
     # code path for commit, amongst other things.
     fast_deltas = None
+    # Does doing a pack operation compress data? Useful for the pack UI command
+    # (so if there is one pack, the operation can still proceed because it may
+    # help), and for fetching when data won't have come from the same
+    # compressor.
+    pack_compresses = False
 
     def __str__(self):
         return "<%s>" % self.__class__.__name__
@@ -3666,6 +3679,7 @@
         cache = lru_cache.LRUCache(100)
         cache[basis_id] = basis_tree
         del basis_tree # We don't want to hang on to it here
+        hints = []
         for offset in range(0, len(revision_ids), batch_size):
             self.target.start_write_group()
             try:
@@ -3677,7 +3691,11 @@
                 self.target.abort_write_group()
                 raise
             else:
-                self.target.commit_write_group()
+                hint = self.target.commit_write_group()
+                if hint:
+                    hints.extend(hint)
+        if hints and self.target._format.pack_compresses:
+            self.target.pack(hint=hints)
         pb.update('Transferring revisions', len(revision_ids),
                   len(revision_ids))
 
@@ -4025,7 +4043,10 @@
                 # missing keys can handle suspending a write group).
                 write_group_tokens = self.target_repo.suspend_write_group()
                 return write_group_tokens, missing_keys
-        self.target_repo.commit_write_group()
+        hint = self.target_repo.commit_write_group()
+        if (to_serializer != src_serializer and
+            self.target_repo._format.pack_compresses):
+            self.target_repo.pack(hint=hint)
         return [], set()
 
     def _extract_and_insert_inventories(self, substream, serializer):

=== modified file 'bzrlib/tests/per_repository/test_pack.py'
--- a/bzrlib/tests/per_repository/test_pack.py	2009-03-23 14:59:43 +0000
+++ b/bzrlib/tests/per_repository/test_pack.py	2009-06-21 23:51:17 +0000
@@ -24,3 +24,14 @@
     def test_pack_empty_does_not_error(self):
         repo = self.make_repository('.')
         repo.pack()
+
+    def test_pack_accepts_opaque_hint(self):
+        # For requesting packs of a repository where some data is known to be
+        # suboptimal we permit packing just some data via a hint. If the hint
+        # is not understood it is ignored.
+        tree = self.make_branch_and_tree('tree')
+        rev1 = tree.commit('1')
+        rev2 = tree.commit('2')
+        rev3 = tree.commit('3')
+        rev4 = tree.commit('4')
+        tree.branch.repository.pack(hint=[rev3, rev4])

=== modified file 'bzrlib/tests/per_repository/test_repository.py'
--- a/bzrlib/tests/per_repository/test_repository.py	2009-06-17 21:33:03 +0000
+++ b/bzrlib/tests/per_repository/test_repository.py	2009-06-19 04:19:22 +0000
@@ -66,29 +66,29 @@
 
 class TestRepository(TestCaseWithRepository):
 
+    def assertFormatAttribute(self, attribute, allowed_values):
+        """Assert that the format has an attribute 'attribute'."""
+        repo = self.make_repository('repo')
+        self.assertSubset([getattr(repo._format, attribute)], allowed_values)
+
     def test_attribute__fetch_order(self):
         """Test the the _fetch_order attribute."""
-        tree = self.make_branch_and_tree('tree')
-        repo = tree.branch.repository
-        self.assertTrue(repo._format._fetch_order in ('topological', 'unordered'))
+        self.assertFormatAttribute('_fetch_order', ('topological', 'unordered'))
 
     def test_attribute__fetch_uses_deltas(self):
         """Test the the _fetch_uses_deltas attribute."""
-        tree = self.make_branch_and_tree('tree')
-        repo = tree.branch.repository
-        self.assertTrue(repo._format._fetch_uses_deltas in (True, False))
+        self.assertFormatAttribute('_fetch_uses_deltas', (True, False))
 
     def test_attribute_fast_deltas(self):
         """Test the format.fast_deltas attribute."""
-        tree = self.make_branch_and_tree('tree')
-        repo = tree.branch.repository
-        self.assertTrue(repo._format.fast_deltas in (True, False))
+        self.assertFormatAttribute('fast_deltas', (True, False))
 
     def test_attribute__fetch_reconcile(self):
         """Test the the _fetch_reconcile attribute."""
-        tree = self.make_branch_and_tree('tree')
-        repo = tree.branch.repository
-        self.assertTrue(repo._format._fetch_reconcile in (True, False))
+        self.assertFormatAttribute('_fetch_reconcile', (True, False))
+
+    def test_attribute_format_pack_compresses(self):
+        self.assertFormatAttribute('pack_compresses', (True, False))
 
     def test_attribute_inventories_store(self):
         """Test the existence of the inventories attribute."""

=== modified file 'bzrlib/tests/per_repository/test_write_group.py'
--- a/bzrlib/tests/per_repository/test_write_group.py	2009-06-10 03:56:49 +0000
+++ b/bzrlib/tests/per_repository/test_write_group.py	2009-06-22 02:25:09 +0000
@@ -68,11 +68,14 @@
             repo.commit_write_group()
             repo.unlock()
 
-    def test_commit_write_group_gets_None(self):
+    def test_commit_write_group_does_not_error(self):
         repo = self.make_repository('.')
         repo.lock_write()
         repo.start_write_group()
-        self.assertEqual(None, repo.commit_write_group())
+        # commit_write_group can either return None (for repositories without
+        # isolated transactions) or a hint for pack(). So we only check it
+        # works in this interface test, because all repositories are exercised.
+        repo.commit_write_group()
         repo.unlock()
 
     def test_unlock_in_write_group(self):

=== modified file 'bzrlib/tests/test_pack_repository.py'
--- a/bzrlib/tests/test_pack_repository.py	2009-06-17 17:57:15 +0000
+++ b/bzrlib/tests/test_pack_repository.py	2009-06-22 02:25:09 +0000
@@ -238,6 +238,35 @@
         pack_names = [node[1][0] for node in index.iter_all_entries()]
         self.assertTrue(large_pack_name in pack_names)
 
+    def test_commit_write_group_returns_new_pack_names(self):
+        format = self.get_format()
+        tree = self.make_branch_and_tree('foo', format=format)
+        tree.commit('first post')
+        repo = tree.branch.repository
+        repo.lock_write()
+        try:
+            repo.start_write_group()
+            try:
+                inv = inventory.Inventory(revision_id="A")
+                inv.root.revision = "A"
+                repo.texts.add_lines((inv.root.file_id, "A"), [], [])
+                rev = _mod_revision.Revision(timestamp=0, timezone=None,
+                    committer="Foo Bar <foo at example.com>", message="Message",
+                    revision_id="A")
+                rev.parent_ids = ()
+                repo.add_revision("A", rev, inv=inv)
+            except:
+                repo.abort_write_group()
+                raise
+            else:
+                old_names = repo._pack_collection._names.keys()
+                result = repo.commit_write_group()
+                cur_names = repo._pack_collection._names.keys()
+                new_names = list(set(cur_names) - set(old_names))
+                self.assertEqual(new_names, result)
+        finally:
+            repo.unlock()
+
     def test_fail_obsolete_deletion(self):
         # failing to delete obsolete packs is not fatal
         format = self.get_format()

=== modified file 'bzrlib/tests/test_repository.py'
--- a/bzrlib/tests/test_repository.py	2009-06-18 18:00:01 +0000
+++ b/bzrlib/tests/test_repository.py	2009-06-22 23:24:00 +0000
@@ -673,10 +673,14 @@
         self.assertFalse(repo._format.supports_external_lookups)
 
 
-class TestDevelopment6(TestCaseWithTransport):
+class Test2a(TestCaseWithTransport):
+
+    def test_format_pack_compresses_True(self):
+        repo = self.make_repository('repo', format='2a')
+        self.assertTrue(repo._format.pack_compresses)
 
     def test_inventories_use_chk_map_with_parent_base_dict(self):
-        tree = self.make_branch_and_tree('repo', format="development6-rich-root")
+        tree = self.make_branch_and_tree('repo', format="2a")
         revid = tree.commit("foo")
         tree.lock_read()
         self.addCleanup(tree.unlock)
@@ -688,14 +692,41 @@
         self.assertEqual(65536,
             inv.parent_id_basename_to_file_id._root_node.maximum_size)
 
+    def test_autopack_unchanged_chk_nodes(self):
+        # After 20 commits that leave the tree unchanged, the chk pages being
+        # packed are split into two groups, so the new pack being made doesn't
+        # have all its pages in the source packs (though they are in the repository).
+        tree = self.make_branch_and_tree('tree', format='2a')
+        for pos in range(20):
+            tree.commit(str(pos))
+
+    def test_pack_with_hint(self):
+        tree = self.make_branch_and_tree('tree', format='2a')
+        # 1 commit to leave untouched
+        tree.commit('1')
+        to_keep = tree.branch.repository._pack_collection.names()
+        # 2 to combine
+        tree.commit('2')
+        tree.commit('3')
+        all = tree.branch.repository._pack_collection.names()
+        combine = list(set(all) - set(to_keep))
+        self.assertLength(3, all)
+        self.assertLength(2, combine)
+        tree.branch.repository.pack(hint=combine)
+        final = tree.branch.repository._pack_collection.names()
+        self.assertLength(2, final)
+        self.assertFalse(combine[0] in final)
+        self.assertFalse(combine[1] in final)
+        self.assertSubset(to_keep, final)
+
     def test_stream_source_to_gc(self):
-        source = self.make_repository('source', format='development6-rich-root')
-        target = self.make_repository('target', format='development6-rich-root')
+        source = self.make_repository('source', format='2a')
+        target = self.make_repository('target', format='2a')
         stream = source._get_source(target._format)
         self.assertIsInstance(stream, groupcompress_repo.GroupCHKStreamSource)
 
     def test_stream_source_to_non_gc(self):
-        source = self.make_repository('source', format='development6-rich-root')
+        source = self.make_repository('source', format='2a')
         target = self.make_repository('target', format='rich-root-pack')
         stream = source._get_source(target._format)
         # We don't want the child GroupCHKStreamSource
@@ -703,7 +734,7 @@
 
     def test_get_stream_for_missing_keys_includes_all_chk_refs(self):
         source_builder = self.make_branch_builder('source',
-                            format='development6-rich-root')
+                            format='2a')
         # We have to build a fairly large tree, so that we are sure the chk
         # pages will have split into multiple pages.
         entries = [('add', ('', 'a-root-id', 'directory', None))]
@@ -726,7 +757,7 @@
         source_branch = source_builder.get_branch()
         source_branch.lock_read()
         self.addCleanup(source_branch.unlock)
-        target = self.make_repository('target', format='development6-rich-root')
+        target = self.make_repository('target', format='2a')
         source = source_branch.repository._get_source(target._format)
         self.assertIsInstance(source, groupcompress_repo.GroupCHKStreamSource)
 
@@ -1354,3 +1385,83 @@
         self.assertTrue(new_pack.inventory_index._optimize_for_size)
         self.assertTrue(new_pack.text_index._optimize_for_size)
         self.assertTrue(new_pack.signature_index._optimize_for_size)
+
+
+class TestCrossFormatPacks(TestCaseWithTransport):
+
+    def log_pack(self, hint=None):
+        self.calls.append(('pack', hint))
+        self.orig_pack(hint=hint)
+        if self.expect_hint:
+            self.assertTrue(hint)
+
+    def run_stream(self, src_fmt, target_fmt, expect_pack_called):
+        self.expect_hint = expect_pack_called
+        self.calls = []
+        source_tree = self.make_branch_and_tree('src', format=src_fmt)
+        source_tree.lock_write()
+        self.addCleanup(source_tree.unlock)
+        tip = source_tree.commit('foo')
+        target = self.make_repository('target', format=target_fmt)
+        target.lock_write()
+        self.addCleanup(target.unlock)
+        source = source_tree.branch.repository._get_source(target._format)
+        self.orig_pack = target.pack
+        target.pack = self.log_pack
+        search = target.search_missing_revision_ids(
+            source_tree.branch.repository, tip)
+        stream = source.get_stream(search)
+        from_format = source_tree.branch.repository._format
+        sink = target._get_sink()
+        sink.insert_stream(stream, from_format, [])
+        if expect_pack_called:
+            self.assertLength(1, self.calls)
+        else:
+            self.assertLength(0, self.calls)
+
+    def run_fetch(self, src_fmt, target_fmt, expect_pack_called):
+        self.expect_hint = expect_pack_called
+        self.calls = []
+        source_tree = self.make_branch_and_tree('src', format=src_fmt)
+        source_tree.lock_write()
+        self.addCleanup(source_tree.unlock)
+        tip = source_tree.commit('foo')
+        target = self.make_repository('target', format=target_fmt)
+        target.lock_write()
+        self.addCleanup(target.unlock)
+        source = source_tree.branch.repository
+        self.orig_pack = target.pack
+        target.pack = self.log_pack
+        target.fetch(source)
+        if expect_pack_called:
+            self.assertLength(1, self.calls)
+        else:
+            self.assertLength(0, self.calls)
+
+    def test_sink_format_hint_no(self):
+        # When the target format says packing makes no difference, pack is not
+        # called.
+        self.run_stream('1.9', 'rich-root-pack', False)
+
+    def test_sink_format_hint_yes(self):
+        # When the target format says packing makes a difference, pack is
+        # called.
+        self.run_stream('1.9', '2a', True)
+
+    def test_sink_format_same_no(self):
+        # When the formats are the same, pack is not called.
+        self.run_stream('2a', '2a', False)
+
+    def test_IDS_format_hint_no(self):
+        # When the target format says packing makes no difference, pack is not
+        # called.
+        self.run_fetch('1.9', 'rich-root-pack', False)
+
+    def test_IDS_format_hint_yes(self):
+        # When the target format says packing makes a difference, pack is
+        # called.
+        self.run_fetch('1.9', '2a', True)
+
+    def test_IDS_format_same_no(self):
+        # When the formats are the same, pack is not called.
+        self.run_fetch('2a', '2a', False)
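
Illustrative note (not part of the patch): the tests above rely on two behaviours, namely that commit_write_group() may return the names of the packs it created, and that pack(hint=...) combines only the named packs. The following is a minimal toy model of that contract, loosely mirroring test_pack_with_hint. ToyPackCollection, its methods and the 'pack-N' naming are all hypothetical stand-ins invented for this sketch; none of it is bzrlib's real implementation.

```python
class ToyPackCollection(object):
    """Hypothetical stand-in for a pack collection; not bzrlib code."""

    def __init__(self):
        self._packs = {}    # pack name -> list of revision ids it holds
        self._pending = []  # revisions added in the current write group
        self._counter = 0

    def _new_name(self):
        self._counter += 1
        return 'pack-%d' % self._counter

    def names(self):
        return sorted(self._packs)

    def add_revision(self, revision_id):
        self._pending.append(revision_id)

    def commit_write_group(self):
        # Flush pending revisions into one new pack and return its name
        # as a hint, or None when nothing was written.
        if not self._pending:
            return None
        name = self._new_name()
        self._packs[name] = self._pending
        self._pending = []
        return [name]

    def pack(self, hint=None):
        # Without a hint every pack is combined; with a hint only the
        # named packs are, and all other packs are left untouched.
        if hint is None:
            to_combine = self.names()
        else:
            to_combine = [n for n in hint if n in self._packs]
        if len(to_combine) < 2:
            return
        combined = []
        for name in to_combine:
            combined.extend(self._packs.pop(name))
        self._packs[self._new_name()] = combined


def demo():
    packs = ToyPackCollection()
    # One commit whose pack should be left untouched.
    packs.add_revision('1')
    packs.commit_write_group()
    to_keep = packs.names()
    # Two more commits; collect the hints they return.
    hint = []
    for revision_id in ('2', '3'):
        packs.add_revision(revision_id)
        hint.extend(packs.commit_write_group())
    before = packs.names()
    packs.pack(hint=hint)
    return to_keep, hint, before, packs.names()
```

In this toy, the pack left out of the hint survives the pack() call while the two hinted packs are replaced by a single new one, which is the property test_pack_with_hint asserts against a real 2a repository.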



