Rev 2713: (Andrew Bennetts) Add Repository.item_keys_introduced_by, and some associated refactoring of bzrlib/fetch.py. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Thu Aug 16 06:50:17 BST 2007
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 2713
revision-id: pqm at pqm.ubuntu.com-20070816055011-tnjdmdi948uyvz6a
parent: pqm at pqm.ubuntu.com-20070816044231-k9pvlics7hlhxuw5
parent: andrew.bennetts at canonical.com-20070814034057-8ri53nse7y9h9mjy
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Thu 2007-08-16 06:50:11 +0100
message:
(Andrew Bennetts) Add Repository.item_keys_introduced_by, and some associated refactoring of bzrlib/fetch.py.
modified:
bzrlib/fetch.py fetch.py-20050818234941-26fea6105696365d
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
bzrlib/tests/repository_implementations/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
------------------------------------------------------------
revno: 2668.2.9
merged: andrew.bennetts at canonical.com-20070814034057-8ri53nse7y9h9mjy
parent: andrew.bennetts at canonical.com-20070808062316-1baiqpdwfzvznju1
parent: pqm at pqm.ubuntu.com-20070813221757-bianevqddds8ift5
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: fetch-refactor
timestamp: Tue 2007-08-14 13:40:57 +1000
message:
Merge from bzr.dev
------------------------------------------------------------
revno: 2668.2.8
merged: andrew.bennetts at canonical.com-20070808062316-1baiqpdwfzvznju1
parent: andrew.bennetts at canonical.com-20070808012954-xcsz5ucoomy3f5c4
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: fetch-refactor
timestamp: Wed 2007-08-08 16:23:16 +1000
message:
Rename get_data_to_fetch_for_revision_ids as item_keys_introduced_by.
------------------------------------------------------------
revno: 2668.2.7
merged: andrew.bennetts at canonical.com-20070808012954-xcsz5ucoomy3f5c4
parent: andrew.bennetts at canonical.com-20070808004014-ty754dinakzia4rf
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: fetch-refactor
timestamp: Wed 2007-08-08 11:29:54 +1000
message:
Use bzrlib.revision.is_null rather than comparing against NULL_REVISION.
------------------------------------------------------------
revno: 2668.2.6
merged: andrew.bennetts at canonical.com-20070808004014-ty754dinakzia4rf
parent: andrew.bennetts at canonical.com-20070808003903-mizhhxhm5q0ofvt6
parent: andrew.bennetts at canonical.com-20070807074727-5vmpk09r98lyef00
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: fetch-refactor
timestamp: Wed 2007-08-08 10:40:14 +1000
message:
Merge repository-equality.
------------------------------------------------------------
revno: 2668.2.5
merged: andrew.bennetts at canonical.com-20070808003903-mizhhxhm5q0ofvt6
parent: andrew.bennetts at canonical.com-20070806053243-myehcu0fvyrxpkyb
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: fetch-refactor
timestamp: Wed 2007-08-08 10:39:03 +1000
message:
Rename get_data_about_revision_ids to get_data_to_fetch_for_revision_ids.
------------------------------------------------------------
revno: 2668.2.4
merged: andrew.bennetts at canonical.com-20070806053243-myehcu0fvyrxpkyb
parent: andrew.bennetts at canonical.com-20070806051607-5d1g6hxd69dyd20u
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: fetch-refactor
timestamp: Mon 2007-08-06 15:32:43 +1000
message:
Get rid of RepoFetcher._same_repo by using Repository.__eq__.
------------------------------------------------------------
revno: 2668.2.3
merged: andrew.bennetts at canonical.com-20070806051607-5d1g6hxd69dyd20u
parent: andrew.bennetts at canonical.com-20070806051558-r2hfumxcwgotrp6a
parent: andrew.bennetts at canonical.com-20070806051155-t570bk2i3gcnebwr
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: fetch-refactor
timestamp: Mon 2007-08-06 15:16:07 +1000
message:
Merge repository-equality.
------------------------------------------------------------
revno: 2668.2.2
merged: andrew.bennetts at canonical.com-20070806051558-r2hfumxcwgotrp6a
parent: andrew.bennetts at canonical.com-20070802080306-s81tmi8j90hie7qe
parent: pqm at pqm.ubuntu.com-20070803043116-l7u1uypblmx1uxnr
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: fetch-refactor
timestamp: Mon 2007-08-06 15:15:58 +1000
message:
Merge bzr.dev.
------------------------------------------------------------
revno: 2668.2.1
merged: andrew.bennetts at canonical.com-20070802080306-s81tmi8j90hie7qe
parent: pqm at pqm.ubuntu.com-20070802072205-gjk1eev6rlw7ght8
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: fetch-refactor
timestamp: Thu 2007-08-02 18:03:06 +1000
message:
Split out fetch refactoring from repo-refactor, adding Repository.get_data_about_revision_ids.
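
For readers skimming the diff below: the net effect of this series is that Repository grows an item_keys_introduced_by generator yielding (knit-kind, file-id, versions) tuples in an order that is safe to fetch and insert, and RepoFetcher drives the whole fetch from that stream. A minimal sketch of a consumer follows; only item_keys_introduced_by itself comes from this patch, while the four copy_* callables are hypothetical placeholders supplied by the caller:

    def copy_everything(source_repo, revision_ids,
                        copy_file_texts, copy_inventories,
                        copy_signatures, copy_revisions):
        # Illustrative sketch only, not part of this patch.  The four copy_*
        # callables are hypothetical; the loop mirrors the dispatch done by
        # RepoFetcher._fetch_everything_for_revisions in the diff below.
        for knit_kind, file_id, versions in source_repo.item_keys_introduced_by(revision_ids):
            if knit_kind == 'file':
                copy_file_texts(file_id, versions)
            elif knit_kind == 'inventory':
                copy_inventories(versions)
            elif knit_kind == 'signatures':
                copy_signatures(versions)
            elif knit_kind == 'revisions':
                copy_revisions(versions)
            else:
                raise AssertionError('Unknown knit kind %r' % (knit_kind,))
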
=== modified file 'bzrlib/fetch.py'
--- a/bzrlib/fetch.py 2007-08-15 04:33:34 +0000
+++ b/bzrlib/fetch.py 2007-08-16 05:50:11 +0000
@@ -36,7 +36,7 @@
from bzrlib.errors import (InstallFailed,
)
from bzrlib.progress import ProgressPhase
-from bzrlib.revision import NULL_REVISION
+from bzrlib.revision import is_null, NULL_REVISION
from bzrlib.symbol_versioning import (deprecated_function,
deprecated_method,
)
@@ -79,9 +79,10 @@
# result variables.
self.failed_revisions = []
self.count_copied = 0
- if to_repository.control_files._transport.base == from_repository.control_files._transport.base:
- # check that last_revision is in 'from' and then return a no-operation.
- if last_revision not in (None, NULL_REVISION):
+ if to_repository.has_same_location(from_repository):
+ # check that last_revision is in 'from' and then return a
+ # no-operation.
+ if last_revision is not None and not is_null(last_revision):
to_repository.get_revision(last_revision)
return
self.to_repository = to_repository
@@ -129,19 +130,63 @@
try:
pp.next_phase()
revs = self._revids_to_fetch()
- # something to do ?
- if revs:
- pp.next_phase()
- self._fetch_weave_texts(revs)
- pp.next_phase()
- self._fetch_inventory_weave(revs)
- pp.next_phase()
- self._fetch_revision_texts(revs)
- self.count_copied += len(revs)
+ self._fetch_everything_for_revisions(revs, pp)
finally:
self.pb.clear()
+ def _fetch_everything_for_revisions(self, revs, pp):
+ """Fetch all data for the given set of revisions."""
+ if revs is None:
+ return
+ # The first phase is "file". We pass the progress bar for it directly
+ # into item_keys_introduced_by, which has more information about how
+ # that phase is progressing than we do. Progress updates for the other
+ # phases are taken care of in this function.
+ # XXX: there should be a clear owner of the progress reporting. Perhaps
+ # item_keys_introduced_by should have a richer API than it does at the
+ # moment, so that it can feed the progress information back to this
+ # function?
+ phase = 'file'
+ pb = bzrlib.ui.ui_factory.nested_progress_bar()
+ try:
+ data_to_fetch = self.from_repository.item_keys_introduced_by(revs, pb)
+ for knit_kind, file_id, revisions in data_to_fetch:
+ if knit_kind != phase:
+ phase = knit_kind
+ # Make a new progress bar for this phase
+ pb.finished()
+ pp.next_phase()
+ pb = bzrlib.ui.ui_factory.nested_progress_bar()
+ if knit_kind == "file":
+ self._fetch_weave_text(file_id, revisions)
+ elif knit_kind == "inventory":
+ # XXX:
+ # Once we've processed all the files, then we generate the root
+ # texts (if necessary), then we process the inventory. It's a
+ # bit distasteful to have knit_kind == "inventory" mean this,
+ # perhaps it should happen on the first non-"file" knit, in case
+ # it's not always inventory?
+ self._generate_root_texts(revs)
+ self._fetch_inventory_weave(revs, pb)
+ elif knit_kind == "signatures":
+ # Nothing to do here; this will be taken care of when
+ # _fetch_revision_texts happens.
+ pass
+ elif knit_kind == "revisions":
+ self._fetch_revision_texts(revs, pb)
+ else:
+ raise AssertionError("Unknown knit kind %r" % knit_kind)
+ finally:
+ if pb is not None:
+ pb.finished()
+ self.count_copied += len(revs)
+
def _revids_to_fetch(self):
+ """Determines the exact revisions needed from self.from_repository to
+ install self._last_revision in self.to_repository.
+
+ If no revisions need to be fetched, then this just returns None.
+ """
mutter('fetch up to rev {%s}', self._last_revision)
if self._last_revision is NULL_REVISION:
# explicit limit of no revisions needed
@@ -156,65 +201,55 @@
except errors.NoSuchRevision:
raise InstallFailed([self._last_revision])
- def _fetch_weave_texts(self, revs):
- texts_pb = bzrlib.ui.ui_factory.nested_progress_bar()
- try:
- # fileids_altered_by_revision_ids requires reading the inventory
- # weave, we will need to read the inventory weave again when
- # all this is done, so enable caching for that specific weave
- inv_w = self.from_repository.get_inventory_weave()
- inv_w.enable_cache()
- file_ids = self.from_repository.fileids_altered_by_revision_ids(revs)
- count = 0
- num_file_ids = len(file_ids)
- for file_id, required_versions in file_ids.items():
- texts_pb.update("fetch texts", count, num_file_ids)
- count +=1
- to_weave = self.to_weaves.get_weave_or_empty(file_id,
- self.to_repository.get_transaction())
- from_weave = self.from_weaves.get_weave(file_id,
- self.from_repository.get_transaction())
- # we fetch all the texts, because texts do
- # not reference anything, and its cheap enough
- to_weave.join(from_weave, version_ids=required_versions)
- # we don't need *all* of this data anymore, but we dont know
- # what we do. This cache clearing will result in a new read
- # of the knit data when we do the checkout, but probably we
- # want to emit the needed data on the fly rather than at the
- # end anyhow.
- # the from weave should know not to cache data being joined,
- # but its ok to ask it to clear.
- from_weave.clear_cache()
- to_weave.clear_cache()
- finally:
- texts_pb.finished()
-
- def _fetch_inventory_weave(self, revs):
- pb = bzrlib.ui.ui_factory.nested_progress_bar()
- try:
- pb.update("fetch inventory", 0, 2)
- to_weave = self.to_control.get_weave('inventory',
- self.to_repository.get_transaction())
-
- child_pb = bzrlib.ui.ui_factory.nested_progress_bar()
- try:
- # just merge, this is optimisable and its means we don't
- # copy unreferenced data such as not-needed inventories.
- pb.update("fetch inventory", 1, 3)
- from_weave = self.from_repository.get_inventory_weave()
- pb.update("fetch inventory", 2, 3)
- # we fetch only the referenced inventories because we do not
- # know for unselected inventories whether all their required
- # texts are present in the other repository - it could be
- # corrupt.
- to_weave.join(from_weave, pb=child_pb, msg='merge inventory',
- version_ids=revs)
- from_weave.clear_cache()
- finally:
- child_pb.finished()
- finally:
- pb.finished()
-
+ def _fetch_weave_text(self, file_id, required_versions):
+ to_weave = self.to_weaves.get_weave_or_empty(file_id,
+ self.to_repository.get_transaction())
+ from_weave = self.from_weaves.get_weave(file_id,
+ self.from_repository.get_transaction())
+ # we fetch all the texts, because texts do
+ # not reference anything, and its cheap enough
+ to_weave.join(from_weave, version_ids=required_versions)
+ # we don't need *all* of this data anymore, but we dont know
+ # what we do. This cache clearing will result in a new read
+ # of the knit data when we do the checkout, but probably we
+ # want to emit the needed data on the fly rather than at the
+ # end anyhow.
+ # the from weave should know not to cache data being joined,
+ # but its ok to ask it to clear.
+ from_weave.clear_cache()
+ to_weave.clear_cache()
+
+ def _fetch_inventory_weave(self, revs, pb):
+ pb.update("fetch inventory", 0, 2)
+ to_weave = self.to_control.get_weave('inventory',
+ self.to_repository.get_transaction())
+
+ child_pb = bzrlib.ui.ui_factory.nested_progress_bar()
+ try:
+ # just merge, this is optimisable and its means we don't
+ # copy unreferenced data such as not-needed inventories.
+ pb.update("fetch inventory", 1, 3)
+ from_weave = self.from_repository.get_inventory_weave()
+ pb.update("fetch inventory", 2, 3)
+ # we fetch only the referenced inventories because we do not
+ # know for unselected inventories whether all their required
+ # texts are present in the other repository - it could be
+ # corrupt.
+ to_weave.join(from_weave, pb=child_pb, msg='merge inventory',
+ version_ids=revs)
+ from_weave.clear_cache()
+ finally:
+ child_pb.finished()
+
+ def _generate_root_texts(self, revs):
+ """This will be called by __fetch between fetching weave texts and
+ fetching the inventory weave.
+
+ Subclasses should override this if they need to generate root texts
+ after fetching weave texts.
+ """
+ pass
+
class GenericRepoFetcher(RepoFetcher):
"""This is a generic repo to repo fetcher.
@@ -223,37 +258,29 @@
It triggers a reconciliation after fetching to ensure integrity.
"""
- def _fetch_revision_texts(self, revs):
+ def _fetch_revision_texts(self, revs, pb):
"""Fetch revision object texts"""
- rev_pb = bzrlib.ui.ui_factory.nested_progress_bar()
- try:
- to_txn = self.to_transaction = self.to_repository.get_transaction()
- count = 0
- total = len(revs)
- to_store = self.to_repository._revision_store
- for rev in revs:
- pb = bzrlib.ui.ui_factory.nested_progress_bar()
- try:
- pb.update('copying revisions', count, total)
- try:
- sig_text = self.from_repository.get_signature_text(rev)
- to_store.add_revision_signature_text(rev, sig_text, to_txn)
- except errors.NoSuchRevision:
- # not signed.
- pass
- to_store.add_revision(self.from_repository.get_revision(rev),
- to_txn)
- count += 1
- finally:
- pb.finished()
- # fixup inventory if needed:
- # this is expensive because we have no inverse index to current ghosts.
- # but on local disk its a few seconds and sftp push is already insane.
- # so we just-do-it.
- # FIXME: repository should inform if this is needed.
- self.to_repository.reconcile()
- finally:
- rev_pb.finished()
+ to_txn = self.to_transaction = self.to_repository.get_transaction()
+ count = 0
+ total = len(revs)
+ to_store = self.to_repository._revision_store
+ for rev in revs:
+ pb.update('copying revisions', count, total)
+ try:
+ sig_text = self.from_repository.get_signature_text(rev)
+ to_store.add_revision_signature_text(rev, sig_text, to_txn)
+ except errors.NoSuchRevision:
+ # not signed.
+ pass
+ to_store.add_revision(self.from_repository.get_revision(rev),
+ to_txn)
+ count += 1
+ # fixup inventory if needed:
+ # this is expensive because we have no inverse index to current ghosts.
+ # but on local disk its a few seconds and sftp push is already insane.
+ # so we just-do-it.
+ # FIXME: repository should inform if this is needed.
+ self.to_repository.reconcile()
class KnitRepoFetcher(RepoFetcher):
@@ -264,7 +291,7 @@
copy revision texts.
"""
- def _fetch_revision_texts(self, revs):
+ def _fetch_revision_texts(self, revs, pb):
# may need to be a InterRevisionStore call here.
from_transaction = self.from_repository.get_transaction()
to_transaction = self.to_repository.get_transaction()
@@ -354,12 +381,10 @@
GenericRepoFetcher.__init__(self, to_repository, from_repository,
last_revision, pb)
- def _fetch_weave_texts(self, revs):
- GenericRepoFetcher._fetch_weave_texts(self, revs)
- # Now generate a weave for the tree root
+ def _generate_root_texts(self, revs):
self.helper.generate_root_texts(revs)
- def _fetch_inventory_weave(self, revs):
+ def _fetch_inventory_weave(self, revs, pb):
self.helper.regenerate_inventory(revs)
@@ -372,10 +397,8 @@
KnitRepoFetcher.__init__(self, to_repository, from_repository,
last_revision, pb)
- def _fetch_weave_texts(self, revs):
- KnitRepoFetcher._fetch_weave_texts(self, revs)
- # Now generate a weave for the tree root
+ def _generate_root_texts(self, revs):
self.helper.generate_root_texts(revs)
- def _fetch_inventory_weave(self, revs):
+ def _fetch_inventory_weave(self, revs, pb):
self.helper.regenerate_inventory(revs)
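
As the fetch.py hunks above show, format-specific fetchers no longer override _fetch_weave_texts wholesale; they plug into the narrower _generate_root_texts hook, which the new dispatch loop calls just before the inventory phase. A minimal sketch of such a subclass, with a hypothetical class name and self.helper assumed to be set up in __init__ as the real fetchers in this patch already do:

    from bzrlib.fetch import RepoFetcher

    class RootGeneratingFetcher(RepoFetcher):
        # Illustrative sketch only; the real subclasses touched by this patch
        # are Model1toKnit2Fetcher and Knit1to2Fetcher.

        def _generate_root_texts(self, revs):
            # Called by the dispatch loop between the 'file' and 'inventory'
            # phases, instead of overriding _fetch_weave_texts wholesale.
            self.helper.generate_root_texts(revs)

        def _fetch_inventory_weave(self, revs, pb):
            # The helper regenerates inventories rather than joining the
            # source inventory weave directly.
            self.helper.regenerate_inventory(revs)
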
=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py 2007-08-16 04:42:31 +0000
+++ b/bzrlib/repository.py 2007-08-16 05:50:11 +0000
@@ -731,6 +731,57 @@
pb.finished()
return result
+ def item_keys_introduced_by(self, revision_ids, _files_pb=None):
+ """Get an iterable listing the keys of all the data introduced by a set
+ of revision IDs.
+
+ The keys will be ordered so that the corresponding items can be safely
+ fetched and inserted in that order.
+
+ :returns: An iterable producing tuples of (knit-kind, file-id,
+ versions). knit-kind is one of 'file', 'inventory', 'signatures',
+ 'revisions'. file-id is None unless knit-kind is 'file'.
+ """
+ # XXX: it's a bit weird to control the inventory weave caching in this
+ # generator. Ideally the caching would be done in fetch.py I think. Or
+ # maybe this generator should explicitly have the contract that it
+ # should not be iterated until the previously yielded item has been
+ # processed?
+ inv_w = self.get_inventory_weave()
+ inv_w.enable_cache()
+
+ # file ids that changed
+ file_ids = self.fileids_altered_by_revision_ids(revision_ids)
+ count = 0
+ num_file_ids = len(file_ids)
+ for file_id, altered_versions in file_ids.iteritems():
+ if _files_pb is not None:
+ _files_pb.update("fetch texts", count, num_file_ids)
+ count += 1
+ yield ("file", file_id, altered_versions)
+ # We're done with the files_pb. Note that it finished by the caller,
+ # just as it was created by the caller.
+ del _files_pb
+
+ # inventory
+ yield ("inventory", None, revision_ids)
+ inv_w.clear_cache()
+
+ # signatures
+ revisions_with_signatures = set()
+ for rev_id in revision_ids:
+ try:
+ self.get_signature_text(rev_id)
+ except errors.NoSuchRevision:
+ # not signed.
+ pass
+ else:
+ revisions_with_signatures.add(rev_id)
+ yield ("signatures", None, revisions_with_signatures)
+
+ # revisions
+ yield ("revisions", None, revision_ids)
+
@needs_read_lock
def get_inventory_weave(self):
return self.control_weaves.get_weave('inventory',
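
To make the docstring's contract above concrete, here is roughly the key sequence item_keys_introduced_by produces for a small repository; the values are illustrative only and depend entirely on the repository contents:

    # Illustrative only: roughly what the generator yields for two revisions
    # that touch two files.  'repo' is assumed to be a read-locked Repository.
    for key in repo.item_keys_introduced_by(['rev1', 'rev2']):
        print key
    # ('file', 'foo-file-id', set(['rev1', 'rev2']))
    # ('file', 'bar-file-id', set(['rev2']))
    # ('inventory', None, ['rev1', 'rev2'])
    # ('signatures', None, set(['rev1']))      # only rev1 is signed
    # ('revisions', None, ['rev1', 'rev2'])
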
=== modified file 'bzrlib/tests/repository_implementations/test_repository.py'
--- a/bzrlib/tests/repository_implementations/test_repository.py 2007-08-07 22:59:45 +0000
+++ b/bzrlib/tests/repository_implementations/test_repository.py 2007-08-14 03:40:57 +0000
@@ -210,6 +210,21 @@
rev2_tree = knit3_repo.revision_tree('rev2')
self.assertEqual('rev1', rev2_tree.inventory.root.revision)
+ def makeARepoWithSignatures(self):
+ wt = self.make_branch_and_tree('a-repo-with-sigs')
+ wt.commit('rev1', allow_pointless=True, rev_id='rev1')
+ repo = wt.branch.repository
+ repo.sign_revision('rev1', bzrlib.gpg.LoopbackGPGStrategy(None))
+ return repo
+
+ def test_fetch_copies_signatures(self):
+ source_repo = self.makeARepoWithSignatures()
+ target_repo = self.make_repository('target')
+ target_repo.fetch(source_repo, revision_id=None)
+ self.assertEqual(
+ source_repo.get_signature_text('rev1'),
+ target_repo.get_signature_text('rev1'))
+
def test_get_revision_delta(self):
tree_a = self.make_branch_and_tree('a')
self.build_tree(['a/foo'])