Rev 2713: (Andrew Bennetts) Add Repository.item_keys_introduced_by, and some associated refactoring of bzrlib/fetch.py. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Thu Aug 16 06:50:17 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2713
revision-id: pqm at pqm.ubuntu.com-20070816055011-tnjdmdi948uyvz6a
parent: pqm at pqm.ubuntu.com-20070816044231-k9pvlics7hlhxuw5
parent: andrew.bennetts at canonical.com-20070814034057-8ri53nse7y9h9mjy
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Thu 2007-08-16 06:50:11 +0100
message:
  (Andrew Bennetts) Add Repository.item_keys_introduced_by, and some associated refactoring of bzrlib/fetch.py.
modified:
  bzrlib/fetch.py                fetch.py-20050818234941-26fea6105696365d
  bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
  bzrlib/tests/repository_implementations/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
    ------------------------------------------------------------
    revno: 2668.2.9
    merged: andrew.bennetts at canonical.com-20070814034057-8ri53nse7y9h9mjy
    parent: andrew.bennetts at canonical.com-20070808062316-1baiqpdwfzvznju1
    parent: pqm at pqm.ubuntu.com-20070813221757-bianevqddds8ift5
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: fetch-refactor
    timestamp: Tue 2007-08-14 13:40:57 +1000
    message:
      Merge from bzr.dev
    ------------------------------------------------------------
    revno: 2668.2.8
    merged: andrew.bennetts at canonical.com-20070808062316-1baiqpdwfzvznju1
    parent: andrew.bennetts at canonical.com-20070808012954-xcsz5ucoomy3f5c4
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: fetch-refactor
    timestamp: Wed 2007-08-08 16:23:16 +1000
    message:
      Rename get_data_to_fetch_for_revision_ids as item_keys_introduced_by.
    ------------------------------------------------------------
    revno: 2668.2.7
    merged: andrew.bennetts at canonical.com-20070808012954-xcsz5ucoomy3f5c4
    parent: andrew.bennetts at canonical.com-20070808004014-ty754dinakzia4rf
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: fetch-refactor
    timestamp: Wed 2007-08-08 11:29:54 +1000
    message:
      Use bzrlib.revision.is_null rather than comparing against NULL_REVISION.
    ------------------------------------------------------------
    revno: 2668.2.6
    merged: andrew.bennetts at canonical.com-20070808004014-ty754dinakzia4rf
    parent: andrew.bennetts at canonical.com-20070808003903-mizhhxhm5q0ofvt6
    parent: andrew.bennetts at canonical.com-20070807074727-5vmpk09r98lyef00
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: fetch-refactor
    timestamp: Wed 2007-08-08 10:40:14 +1000
    message:
      Merge repository-equality.
    ------------------------------------------------------------
    revno: 2668.2.5
    merged: andrew.bennetts at canonical.com-20070808003903-mizhhxhm5q0ofvt6
    parent: andrew.bennetts at canonical.com-20070806053243-myehcu0fvyrxpkyb
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: fetch-refactor
    timestamp: Wed 2007-08-08 10:39:03 +1000
    message:
      Rename get_data_about_revision_ids to get_data_to_fetch_for_revision_ids.
    ------------------------------------------------------------
    revno: 2668.2.4
    merged: andrew.bennetts at canonical.com-20070806053243-myehcu0fvyrxpkyb
    parent: andrew.bennetts at canonical.com-20070806051607-5d1g6hxd69dyd20u
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: fetch-refactor
    timestamp: Mon 2007-08-06 15:32:43 +1000
    message:
      Get rid of RepoFetcher._same_repo by using Repository.__eq__.
    ------------------------------------------------------------
    revno: 2668.2.3
    merged: andrew.bennetts at canonical.com-20070806051607-5d1g6hxd69dyd20u
    parent: andrew.bennetts at canonical.com-20070806051558-r2hfumxcwgotrp6a
    parent: andrew.bennetts at canonical.com-20070806051155-t570bk2i3gcnebwr
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: fetch-refactor
    timestamp: Mon 2007-08-06 15:16:07 +1000
    message:
      Merge repository-equality.
    ------------------------------------------------------------
    revno: 2668.2.2
    merged: andrew.bennetts at canonical.com-20070806051558-r2hfumxcwgotrp6a
    parent: andrew.bennetts at canonical.com-20070802080306-s81tmi8j90hie7qe
    parent: pqm at pqm.ubuntu.com-20070803043116-l7u1uypblmx1uxnr
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: fetch-refactor
    timestamp: Mon 2007-08-06 15:15:58 +1000
    message:
      Merge bzr.dev.
    ------------------------------------------------------------
    revno: 2668.2.1
    merged: andrew.bennetts at canonical.com-20070802080306-s81tmi8j90hie7qe
    parent: pqm at pqm.ubuntu.com-20070802072205-gjk1eev6rlw7ght8
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: fetch-refactor
    timestamp: Thu 2007-08-02 18:03:06 +1000
    message:
      Split out fetch refactoring from repo-refactor, adding Repository.get_data_about_revision_ids.
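
Log entries 2668.2.4 and 2668.2.7 replace the transport-base comparison with a repository-location check and the NULL_REVISION comparison with is_null. The sketch below models the resulting short-circuit in RepoFetcher.__init__ using hypothetical toy classes. Three details are assumptions made for illustration, not bzrlib's code: the value 'null:' for NULL_REVISION, is_null also treating None as null, and the "same class and same base URL" rule for has_same_location.

```python
NULL_REVISION = 'null:'  # assumed value, for illustration only


def is_null(revision_id):
    """Return True if revision_id names the null revision (assumed: None too)."""
    return revision_id in (None, NULL_REVISION)


class ToyRepository(object):
    """Hypothetical stand-in: a repository identified only by a base URL."""

    def __init__(self, base, revision_ids=()):
        self.base = base
        self.revision_ids = set(revision_ids)

    def has_same_location(self, other):
        # Assumed rule: same concrete class and same base URL.
        return self.__class__ is other.__class__ and self.base == other.base

    def get_revision(self, revision_id):
        if revision_id not in self.revision_ids:
            raise KeyError(revision_id)
        return revision_id


def fetch_is_noop(to_repo, from_repo, last_revision):
    """Return True if the fetch short-circuits, as in RepoFetcher.__init__."""
    if to_repo.has_same_location(from_repo):
        # Same repository: only check that last_revision exists.
        if last_revision is not None and not is_null(last_revision):
            to_repo.get_revision(last_revision)  # raises KeyError if absent
        return True
    return False
```

Two distinct ToyRepository objects with the same base URL count as the same repository, which is the point of comparing locations rather than object identity.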
=== modified file 'bzrlib/fetch.py'
--- a/bzrlib/fetch.py	2007-08-15 04:33:34 +0000
+++ b/bzrlib/fetch.py	2007-08-16 05:50:11 +0000
@@ -36,7 +36,7 @@
 from bzrlib.errors import (InstallFailed,
                            )
 from bzrlib.progress import ProgressPhase
-from bzrlib.revision import NULL_REVISION
+from bzrlib.revision import is_null, NULL_REVISION
 from bzrlib.symbol_versioning import (deprecated_function,
         deprecated_method,
         )
@@ -79,9 +79,10 @@
         # result variables.
         self.failed_revisions = []
         self.count_copied = 0
-        if to_repository.control_files._transport.base == from_repository.control_files._transport.base:
-            # check that last_revision is in 'from' and then return a no-operation.
-            if last_revision not in (None, NULL_REVISION):
+        if to_repository.has_same_location(from_repository):
+            # check that last_revision is in 'from' and then return a
+            # no-operation.
+            if last_revision is not None and not is_null(last_revision):
                 to_repository.get_revision(last_revision)
             return
         self.to_repository = to_repository
@@ -129,19 +130,63 @@
         try:
             pp.next_phase()
             revs = self._revids_to_fetch()
-            # something to do ?
-            if revs:
-                pp.next_phase()
-                self._fetch_weave_texts(revs)
-                pp.next_phase()
-                self._fetch_inventory_weave(revs)
-                pp.next_phase()
-                self._fetch_revision_texts(revs)
-                self.count_copied += len(revs)
+            self._fetch_everything_for_revisions(revs, pp)
         finally:
             self.pb.clear()
 
+    def _fetch_everything_for_revisions(self, revs, pp):
+        """Fetch all data for the given set of revisions."""
+        if revs is None:
+            return
+        # The first phase is "file".  We pass the progress bar for it directly
+        # into item_keys_introduced_by, which has more information about how
+        # that phase is progressing than we do.  Progress updates for the other
+        # phases are taken care of in this function.
+        # XXX: there should be a clear owner of the progress reporting.  Perhaps
+        # item_keys_introduced_by should have a richer API than it does at the
+        # moment, so that it can feed the progress information back to this
+        # function?
+        phase = 'file'
+        pb = bzrlib.ui.ui_factory.nested_progress_bar()
+        try:
+            data_to_fetch = self.from_repository.item_keys_introduced_by(revs, pb)
+            for knit_kind, file_id, revisions in data_to_fetch:
+                if knit_kind != phase:
+                    phase = knit_kind
+                    # Make a new progress bar for this phase
+                    pb.finished()
+                    pp.next_phase()
+                    pb = bzrlib.ui.ui_factory.nested_progress_bar()
+                if knit_kind == "file":
+                    self._fetch_weave_text(file_id, revisions)
+                elif knit_kind == "inventory":
+                    # XXX:
+                    # Once we've processed all the files, then we generate the root
+                    # texts (if necessary), then we process the inventory.  It's a
+                    # bit distasteful to have knit_kind == "inventory" mean this,
+                    # perhaps it should happen on the first non-"file" knit, in case
+                    # it's not always inventory?
+                    self._generate_root_texts(revs)
+                    self._fetch_inventory_weave(revs, pb)
+                elif knit_kind == "signatures":
+                    # Nothing to do here; this will be taken care of when
+                    # _fetch_revision_texts happens.
+                    pass
+                elif knit_kind == "revisions":
+                    self._fetch_revision_texts(revs, pb)
+                else:
+                    raise AssertionError("Unknown knit kind %r" % knit_kind)
+        finally:
+            if pb is not None:
+                pb.finished()
+        self.count_copied += len(revs)
+        
     def _revids_to_fetch(self):
+        """Determines the exact revisions needed from self.from_repository to
+        install self._last_revision in self.to_repository.
+
+        If no revisions need to be fetched, then this just returns None.
+        """
         mutter('fetch up to rev {%s}', self._last_revision)
         if self._last_revision is NULL_REVISION:
             # explicit limit of no revisions needed
@@ -156,65 +201,55 @@
         except errors.NoSuchRevision:
             raise InstallFailed([self._last_revision])
 
-    def _fetch_weave_texts(self, revs):
-        texts_pb = bzrlib.ui.ui_factory.nested_progress_bar()
-        try:
-            # fileids_altered_by_revision_ids requires reading the inventory
-            # weave, we will need to read the inventory weave again when
-            # all this is done, so enable caching for that specific weave
-            inv_w = self.from_repository.get_inventory_weave()
-            inv_w.enable_cache()
-            file_ids = self.from_repository.fileids_altered_by_revision_ids(revs)
-            count = 0
-            num_file_ids = len(file_ids)
-            for file_id, required_versions in file_ids.items():
-                texts_pb.update("fetch texts", count, num_file_ids)
-                count +=1
-                to_weave = self.to_weaves.get_weave_or_empty(file_id,
-                    self.to_repository.get_transaction())
-                from_weave = self.from_weaves.get_weave(file_id,
-                    self.from_repository.get_transaction())
-                # we fetch all the texts, because texts do
-                # not reference anything, and its cheap enough
-                to_weave.join(from_weave, version_ids=required_versions)
-                # we don't need *all* of this data anymore, but we dont know
-                # what we do. This cache clearing will result in a new read 
-                # of the knit data when we do the checkout, but probably we
-                # want to emit the needed data on the fly rather than at the
-                # end anyhow.
-                # the from weave should know not to cache data being joined,
-                # but its ok to ask it to clear.
-                from_weave.clear_cache()
-                to_weave.clear_cache()
-        finally:
-            texts_pb.finished()
-
-    def _fetch_inventory_weave(self, revs):
-        pb = bzrlib.ui.ui_factory.nested_progress_bar()
-        try:
-            pb.update("fetch inventory", 0, 2)
-            to_weave = self.to_control.get_weave('inventory',
-                    self.to_repository.get_transaction())
-    
-            child_pb = bzrlib.ui.ui_factory.nested_progress_bar()
-            try:
-                # just merge, this is optimisable and its means we don't
-                # copy unreferenced data such as not-needed inventories.
-                pb.update("fetch inventory", 1, 3)
-                from_weave = self.from_repository.get_inventory_weave()
-                pb.update("fetch inventory", 2, 3)
-                # we fetch only the referenced inventories because we do not
-                # know for unselected inventories whether all their required
-                # texts are present in the other repository - it could be
-                # corrupt.
-                to_weave.join(from_weave, pb=child_pb, msg='merge inventory',
-                              version_ids=revs)
-                from_weave.clear_cache()
-            finally:
-                child_pb.finished()
-        finally:
-            pb.finished()
-
+    def _fetch_weave_text(self, file_id, required_versions):
+        to_weave = self.to_weaves.get_weave_or_empty(file_id,
+            self.to_repository.get_transaction())
+        from_weave = self.from_weaves.get_weave(file_id,
+            self.from_repository.get_transaction())
+        # we fetch all the texts, because texts do
+        # not reference anything, and it's cheap enough
+        to_weave.join(from_weave, version_ids=required_versions)
+        # we don't need *all* of this data anymore, but we don't know
+        # what we do. This cache clearing will result in a new read
+        # of the knit data when we do the checkout, but probably we
+        # want to emit the needed data on the fly rather than at the
+        # end anyhow.
+        # the from weave should know not to cache data being joined,
+        # but it's ok to ask it to clear.
+        from_weave.clear_cache()
+        to_weave.clear_cache()
+
+    def _fetch_inventory_weave(self, revs, pb):
+        pb.update("fetch inventory", 0, 2)
+        to_weave = self.to_control.get_weave('inventory',
+                self.to_repository.get_transaction())
+
+        child_pb = bzrlib.ui.ui_factory.nested_progress_bar()
+        try:
+            # just merge, this is optimisable and it means we don't
+            # copy unreferenced data such as not-needed inventories.
+            pb.update("fetch inventory", 1, 3)
+            from_weave = self.from_repository.get_inventory_weave()
+            pb.update("fetch inventory", 2, 3)
+            # we fetch only the referenced inventories because we do not
+            # know for unselected inventories whether all their required
+            # texts are present in the other repository - it could be
+            # corrupt.
+            to_weave.join(from_weave, pb=child_pb, msg='merge inventory',
+                          version_ids=revs)
+            from_weave.clear_cache()
+        finally:
+            child_pb.finished()
+
+    def _generate_root_texts(self, revs):
+        """This is called by _fetch_everything_for_revisions between
+        fetching weave texts and fetching the inventory weave.
+
+        Subclasses should override this if they need to generate root texts
+        after fetching weave texts.
+        """
+        pass
+        
 
 class GenericRepoFetcher(RepoFetcher):
     """This is a generic repo to repo fetcher.
@@ -223,37 +258,29 @@
     It triggers a reconciliation after fetching to ensure integrity.
     """
 
-    def _fetch_revision_texts(self, revs):
+    def _fetch_revision_texts(self, revs, pb):
         """Fetch revision object texts"""
-        rev_pb = bzrlib.ui.ui_factory.nested_progress_bar()
-        try:
-            to_txn = self.to_transaction = self.to_repository.get_transaction()
-            count = 0
-            total = len(revs)
-            to_store = self.to_repository._revision_store
-            for rev in revs:
-                pb = bzrlib.ui.ui_factory.nested_progress_bar()
-                try:
-                    pb.update('copying revisions', count, total)
-                    try:
-                        sig_text = self.from_repository.get_signature_text(rev)
-                        to_store.add_revision_signature_text(rev, sig_text, to_txn)
-                    except errors.NoSuchRevision:
-                        # not signed.
-                        pass
-                    to_store.add_revision(self.from_repository.get_revision(rev),
-                                          to_txn)
-                    count += 1
-                finally:
-                    pb.finished()
-            # fixup inventory if needed: 
-            # this is expensive because we have no inverse index to current ghosts.
-            # but on local disk its a few seconds and sftp push is already insane.
-            # so we just-do-it.
-            # FIXME: repository should inform if this is needed.
-            self.to_repository.reconcile()
-        finally:
-            rev_pb.finished()
+        to_txn = self.to_transaction = self.to_repository.get_transaction()
+        count = 0
+        total = len(revs)
+        to_store = self.to_repository._revision_store
+        for rev in revs:
+            pb.update('copying revisions', count, total)
+            try:
+                sig_text = self.from_repository.get_signature_text(rev)
+                to_store.add_revision_signature_text(rev, sig_text, to_txn)
+            except errors.NoSuchRevision:
+                # not signed.
+                pass
+            to_store.add_revision(self.from_repository.get_revision(rev),
+                                  to_txn)
+            count += 1
+        # fixup inventory if needed: 
+        # this is expensive because we have no inverse index to current ghosts.
+        # but on local disk it's a few seconds and sftp push is already insane.
+        # so we just-do-it.
+        # FIXME: repository should inform if this is needed.
+        self.to_repository.reconcile()
     
 
 class KnitRepoFetcher(RepoFetcher):
@@ -264,7 +291,7 @@
     copy revision texts.
     """
 
-    def _fetch_revision_texts(self, revs):
+    def _fetch_revision_texts(self, revs, pb):
         # may need to be a InterRevisionStore call here.
         from_transaction = self.from_repository.get_transaction()
         to_transaction = self.to_repository.get_transaction()
@@ -354,12 +381,10 @@
         GenericRepoFetcher.__init__(self, to_repository, from_repository,
                                     last_revision, pb)
 
-    def _fetch_weave_texts(self, revs):
-        GenericRepoFetcher._fetch_weave_texts(self, revs)
-        # Now generate a weave for the tree root
+    def _generate_root_texts(self, revs):
         self.helper.generate_root_texts(revs)
 
-    def _fetch_inventory_weave(self, revs):
+    def _fetch_inventory_weave(self, revs, pb):
         self.helper.regenerate_inventory(revs)
  
 
@@ -372,10 +397,8 @@
         KnitRepoFetcher.__init__(self, to_repository, from_repository,
                                  last_revision, pb)
 
-    def _fetch_weave_texts(self, revs):
-        KnitRepoFetcher._fetch_weave_texts(self, revs)
-        # Now generate a weave for the tree root
+    def _generate_root_texts(self, revs):
         self.helper.generate_root_texts(revs)
 
-    def _fetch_inventory_weave(self, revs):
+    def _fetch_inventory_weave(self, revs, pb):
         self.helper.regenerate_inventory(revs)
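
The new _fetch_everything_for_revisions drives the whole fetch from the (knit_kind, file_id, versions) stream, opening a fresh progress bar whenever the kind changes. A minimal sketch of that dispatch loop follows. Progress, fetch_everything and the handler mapping are hypothetical stand-ins, and the ProgressPhase call (pp.next_phase()) is modelled only as a log entry.

```python
class Progress(object):
    """Hypothetical progress bar that records its lifecycle in a log."""

    def __init__(self, log, name):
        self.log = log
        self.name = name
        log.append(('open', name))

    def finished(self):
        self.log.append(('close', self.name))


def fetch_everything(item_keys, handlers, log):
    """Dispatch (knit_kind, file_id, versions) tuples to per-kind handlers.

    Mirrors the loop in _fetch_everything_for_revisions: the first progress
    bar belongs to the 'file' phase, and each change of knit_kind closes the
    current bar, advances the phase and opens a new one.
    """
    phase = 'file'
    pb = Progress(log, phase)
    try:
        for knit_kind, file_id, versions in item_keys:
            if knit_kind != phase:
                phase = knit_kind
                pb.finished()
                log.append(('phase', knit_kind))  # stands in for pp.next_phase()
                pb = Progress(log, phase)
            handler = handlers.get(knit_kind)
            if handler is None:
                raise AssertionError('Unknown knit kind %r' % (knit_kind,))
            handler(file_id, versions, pb)
    finally:
        # Whatever bar is current when the loop ends (or fails) is closed.
        pb.finished()
```

Using a dict of handlers instead of an if/elif chain is a choice of the sketch only; the diff keeps the explicit chain, which also lets the "inventory" branch call _generate_root_texts before fetching the inventory.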

=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py	2007-08-16 04:42:31 +0000
+++ b/bzrlib/repository.py	2007-08-16 05:50:11 +0000
@@ -731,6 +731,57 @@
             pb.finished()
         return result
 
+    def item_keys_introduced_by(self, revision_ids, _files_pb=None):
+        """Get an iterable listing the keys of all the data introduced by a set
+        of revision IDs.
+
+        The keys will be ordered so that the corresponding items can be safely
+        fetched and inserted in that order.
+
+        :returns: An iterable producing tuples of (knit-kind, file-id,
+            versions).  knit-kind is one of 'file', 'inventory', 'signatures',
+            'revisions'.  file-id is None unless knit-kind is 'file'.
+        """
+        # XXX: it's a bit weird to control the inventory weave caching in this
+        # generator.  Ideally the caching would be done in fetch.py I think.  Or
+        # maybe this generator should explicitly have the contract that it
+        # should not be iterated until the previously yielded item has been
+        # processed?
+        inv_w = self.get_inventory_weave()
+        inv_w.enable_cache()
+
+        # file ids that changed
+        file_ids = self.fileids_altered_by_revision_ids(revision_ids)
+        count = 0
+        num_file_ids = len(file_ids)
+        for file_id, altered_versions in file_ids.iteritems():
+            if _files_pb is not None:
+                _files_pb.update("fetch texts", count, num_file_ids)
+            count += 1
+            yield ("file", file_id, altered_versions)
+        # We're done with the files_pb.  Note that it is finished by the
+        # caller, just as it was created by the caller.
+        del _files_pb
+
+        # inventory
+        yield ("inventory", None, revision_ids)
+        inv_w.clear_cache()
+
+        # signatures
+        revisions_with_signatures = set()
+        for rev_id in revision_ids:
+            try:
+                self.get_signature_text(rev_id)
+            except errors.NoSuchRevision:
+                # not signed.
+                pass
+            else:
+                revisions_with_signatures.add(rev_id)
+        yield ("signatures", None, revisions_with_signatures)
+
+        # revisions
+        yield ("revisions", None, revision_ids)
+
     @needs_read_lock
     def get_inventory_weave(self):
         return self.control_weaves.get_weave('inventory',
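
Repository.item_keys_introduced_by is a generator, so each statement between its yields runs only when the caller asks for the next item. That is why inv_w.clear_cache() after the "inventory" yield effectively runs after the caller has processed the inventory, the ordering contract the XXX comment asks about. The toy model below (ToyRepo, ToyInventoryWeave and a local NoSuchRevision are all hypothetical) follows the same yield order as the diff and records when the cache is enabled and cleared relative to the consumer. It sorts file ids only to make the output deterministic.

```python
class NoSuchRevision(Exception):
    """Hypothetical stand-in for bzrlib.errors.NoSuchRevision."""


class ToyInventoryWeave(object):
    """Records cache calls so their timing can be inspected."""

    def __init__(self, log):
        self.log = log

    def enable_cache(self):
        self.log.append('enable_cache')

    def clear_cache(self):
        self.log.append('clear_cache')


class ToyRepo(object):
    """Hypothetical in-memory repository with the same yield order as the diff."""

    def __init__(self, altered, signatures, log):
        self.altered = altered        # file_id -> set of versions
        self.signatures = signatures  # revision_id -> signature text
        self.inv_w = ToyInventoryWeave(log)

    def get_inventory_weave(self):
        return self.inv_w

    def fileids_altered_by_revision_ids(self, revision_ids):
        # Toy simplification: ignores revision_ids and returns a copy.
        return dict(self.altered)

    def get_signature_text(self, revision_id):
        try:
            return self.signatures[revision_id]
        except KeyError:
            raise NoSuchRevision(revision_id)

    def item_keys_introduced_by(self, revision_ids, _files_pb=None):
        inv_w = self.get_inventory_weave()
        inv_w.enable_cache()  # runs on the first next(), not at call time
        file_ids = self.fileids_altered_by_revision_ids(revision_ids)
        num_file_ids = len(file_ids)
        for count, (file_id, versions) in enumerate(sorted(file_ids.items())):
            if _files_pb is not None:
                _files_pb.update('fetch texts', count, num_file_ids)
            yield ('file', file_id, versions)
        yield ('inventory', None, revision_ids)
        # Resumes only when the caller asks for the next item, i.e. after
        # it has handled the 'inventory' item.
        inv_w.clear_cache()
        signed = set()
        for revision_id in revision_ids:
            try:
                self.get_signature_text(revision_id)
            except NoSuchRevision:
                pass  # not signed
            else:
                signed.add(revision_id)
        yield ('signatures', None, signed)
        yield ('revisions', None, revision_ids)
```

Note that merely calling item_keys_introduced_by does nothing: the cache is only enabled once iteration starts.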

=== modified file 'bzrlib/tests/repository_implementations/test_repository.py'
--- a/bzrlib/tests/repository_implementations/test_repository.py	2007-08-07 22:59:45 +0000
+++ b/bzrlib/tests/repository_implementations/test_repository.py	2007-08-14 03:40:57 +0000
@@ -210,6 +210,21 @@
         rev2_tree = knit3_repo.revision_tree('rev2')
         self.assertEqual('rev1', rev2_tree.inventory.root.revision)
 
+    def makeARepoWithSignatures(self):
+        wt = self.make_branch_and_tree('a-repo-with-sigs')
+        wt.commit('rev1', allow_pointless=True, rev_id='rev1')
+        repo = wt.branch.repository
+        repo.sign_revision('rev1', bzrlib.gpg.LoopbackGPGStrategy(None))
+        return repo
+
+    def test_fetch_copies_signatures(self):
+        source_repo = self.makeARepoWithSignatures()
+        target_repo = self.make_repository('target')
+        target_repo.fetch(source_repo, revision_id=None)
+        self.assertEqual(
+            source_repo.get_signature_text('rev1'),
+            target_repo.get_signature_text('rev1'))
+
     def test_get_revision_delta(self):
         tree_a = self.make_branch_and_tree('a')
         self.build_tree(['a/foo'])
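
The new test_fetch_copies_signatures checks that a fetch carries signatures across. In GenericRepoFetcher that copying happens in _fetch_revision_texts, which tries get_signature_text for each revision and treats NoSuchRevision as "not signed". A minimal sketch with hypothetical in-memory stand-ins follows (ToySource, ToyStore and copy_revision_texts are illustrative names, and the transaction argument of the real store methods is omitted):

```python
class NoSuchRevision(Exception):
    """Hypothetical stand-in for bzrlib.errors.NoSuchRevision."""


class ToySource(object):
    def __init__(self, revisions, signatures):
        self.revisions = revisions    # revision_id -> revision object
        self.signatures = signatures  # revision_id -> signature text

    def get_revision(self, revision_id):
        return self.revisions[revision_id]

    def get_signature_text(self, revision_id):
        if revision_id not in self.signatures:
            raise NoSuchRevision(revision_id)
        return self.signatures[revision_id]


class ToyStore(object):
    def __init__(self):
        self.revisions = {}
        self.signatures = {}

    def add_revision_signature_text(self, revision_id, text):
        self.signatures[revision_id] = text

    def add_revision(self, revision_id, revision):
        self.revisions[revision_id] = revision


def copy_revision_texts(source, store, revs, pb=None):
    """Copy each revision, plus its signature when one exists."""
    total = len(revs)
    for count, rev in enumerate(revs):
        if pb is not None:
            pb.update('copying revisions', count, total)
        try:
            sig_text = source.get_signature_text(rev)
        except NoSuchRevision:
            pass  # not signed
        else:
            store.add_revision_signature_text(rev, sig_text)
        store.add_revision(rev, source.get_revision(rev))
```

Unlike the diff, the sketch uses try/except/else so that only the signature lookup is guarded; a NoSuchRevision raised while storing the signature would not be mistaken for an unsigned revision.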




More information about the bazaar-commits mailing list