[MERGE] Support fetching a restricted set of revisions

Tue Jul 17 15:17:58 BST 2007

Aaron Bentley has voted +0.
Status is now: Waiting
Comment:
Jelmer Vernooij wrote:
> What the -rX..Y code (which I haven't submitted yet) does is create a
> bread-first search over all the ancestors of Y until it hits X
> (bzrlib.graph was very useful in that regard). In other words, it will
 > fetch all ancestors of Y that have a distance equal to or less than 
that
> of X to Y.

Why not just select the ancestors of Y that are not ancestors of X? 
That seems like a simpler definition, and should behave similarly most 
of the time.  Also, much more likely to be reused.

 > === modified file bzrlib/fetch.py // 
last-changed:jelmer at samba.org-200707131714
> ... 44-00gl8k8t8j43rl3m
> --- bzrlib/fetch.py
> +++ bzrlib/fetch.py
> @@ -267,6 +267,12 @@
>      copy revision texts.
>      """
>
 > +    def __init__(self, to_repository, from_repository, 
last_revision=None, pb=None,
> +                 limit_to_revisions=None):
> +        self._limit_to_revisions = limit_to_revisions
 > +        super(KnitRepoFetcher, self).__init__(to_repository, 
from_repository,
> +            last_revision, pb)
> +
>      def _fetch_revision_texts(self, revs):
>          # may need to be a InterRevisionStore call here.
>          from_transaction = self.from_repository.get_transaction()
> @@ -280,7 +286,11 @@
>              to_transaction)
 >          from_rf = 
self.from_repository._revision_store.get_revision_file(
>              from_transaction)
> -        to_rf.join(from_rf, version_ids=revs)
> +        if self._limit_to_revisions is not None:
> +            to_rf.join(from_rf, version_ids=self._limit_to_revisions,
> +                       restrict_revisions=True)

I don't like this very much, style-wise.  Changing the meaning if a 
parameter based on another parameter doesn't seem like good form.

What would be wrong with adding a limit_to_revisions parameter to 
versionedfile.join?

>
 > -            self.source_ancestry = 
set(self.source.get_ancestry(version_ids))
> +            if restrict_revisions:
> +                self.source_ancestry = set(
 > + 
self.source._get_components_positions(version_ids).keys())

^^^ Are you ensuring that every revision that is present has all its 
inventories and file texts present also?  It doesn't look like that to 
me.

By including the build-dependencies of the desired revisions, you're 
including the revisions themselves, and that means their inventories and 
file texts must also be included.

I think it would be simpler to create a snapshots of the oldest 
revisions.

The only reason your later test cases are passing is because revisions 
*happen to be* all snapshots, and therefore have no build dependencies. 
This is not true for inventories, however.

> +            else:
 > +                self.source_ancestry = 
set(self.source.get_ancestry(version_ids))

^^^ I think that should be get_ancestry(topo_sorted=False).  I don't 
know how I missed it originally.

> @@ -1724,12 +1735,14 @@
>              return False
>
>      @needs_write_lock
> -    def fetch(self, revision_id=None, pb=None):
 > +    def fetch(self, revision_id=None, pb=None, 
limit_to_revisions=None):
>          """See InterRepository.fetch()."""
>          from bzrlib.fetch import Knit1to2Fetcher
>          mutter("Using fetch logic to copy between %s(%s) and %s(%s)",
>                 self.source, self.source._format, self.target,
>                 self.target._format)
> +        if limit_to_revisions is not None:
> +            raise errors.PartialCloningNotSupported(self)

^^^ You have a lot of repetitions of this.  Have you looked at factoring 
it out?

 > === modified file bzrlib/tests/bzrdir_implementations/test_bzrdir.py 
// last-ch
> ... anged:jelmer at samba.org-20070714083247-wael1kydpl431tyc
> --- bzrlib/tests/bzrdir_implementations/test_bzrdir.py
> +++ bzrlib/tests/bzrdir_implementations/test_bzrdir.py
> @@ -123,7 +123,7 @@
>                                % a_bzrdir.transport)
>
>      def sproutOrSkip(self, from_bzrdir, to_url, revision_id=None,
> -                     force_new_repo=False):
> +                     force_new_repo=False, limit_to_revisions=None):
>          """Sprout from_bzrdir into to_url, or raise TestSkipped.
>
 >          A simple wrapper for from_bzrdir.sprout that translates 
NotLocalUrl into

^^^ It seems like it would make sense to extend sproutOrSkip to handle
PartialCloningNotSupported

> @@ -131,7 +131,8 @@
> +    def test_sprout_bzrdir_partial(self):
> +        tree = self.make_branch_and_tree('source')
 > +        self.build_tree(['foo'], 
transport=tree.bzrdir.transport.clone('..'))
> +        tree.add('foo')
> +        rev1 = tree.commit('revision 1')
> +        tree.bzrdir.transport.clone('..').put_bytes('foo', 'bar')
> +        rev2 = tree.commit('revision 2')
> +        dir = tree.bzrdir
> +        try:
> +            target = self.sproutOrSkip(dir, self.get_url('target'),
> +                    limit_to_revisions=set([rev2]))
> +        except errors.PartialCloningNotSupported:
 > +            return # Not all repository formats can support partial 
cloning

It's poor form to return from a test case if the test has not succeeded. 
Raising TestSkipped (i.e. from sproutOrSkip) would be much better

> +        repository = target.open_repository()
> +        self.assertTrue(repository.has_revision(rev2))
> +        self.assertFalse(repository.has_revision(rev1))

I would like to see proof that
1. repository.revision_tree(rev1) is not present (the inventory may only 
be present if all the texts it references are present, and I doubt we 
want that.)
2. all of the file texts for revision_tree(rev2) are present.

> +    def test_sprout_bzrdir_partial_existing(self):
> +        """Test partial sprouting into an existing repository."""
> +        tree = self.make_branch_and_tree('source')
 > +        self.build_tree(['foo'], 
transport=tree.bzrdir.transport.clone('..'))
> +        tree.add('foo')
> +        rev1 = tree.commit('revision 1')
> +        tree.bzrdir.transport.clone('..').put_bytes('foo', 'bar')
> +        rev2 = tree.commit('revision 2')
> +        get_transport(self.get_url()).mkdir('target')
> +        t = get_transport(self.get_url('target'))
> +        made_control = self.bzrdir_format.initialize(t.base)
> +        try:
> +            repository = made_control.create_repository(shared=True)
> +        except errors.IncompatibleFormat:
> +            return # Test only applies to shared repositories

Again, please don't return.

> +    def test_sprout_bzrdir_partial_subtrees(self):
> +        tree = self.make_branch_and_tree('source')
> +        subtree = self.make_branch_and_tree('source/subtree')
> +        self.build_tree(['source/subtree/file'])
> +        subtree.add('file')
> +        subrev1 = subtree.commit('subrevision 1')
> +        try:
> +            tree.add_reference(subtree)
 > +        except (errors.BadReferenceTarget, 
errors.UnsupportedOperation):
> +            # this test is for formats that support subtrees
> +            return

You can just check format.repository_format._supports_tree_references.

> +        rev1 = tree.commit('revision 1')
> +        subtree.bzrdir.transport.clone('..').put_bytes('file', 'bar')
> +        subrev2 = subtree.commit('subrevision 2')
> +        rev2 = tree.commit('revision 2')
> +        dir = tree.bzrdir
> +        try:
> +            target = self.sproutOrSkip(dir, self.get_url('target'),
> +                    limit_to_revisions=set([rev2]))
> +        except errors.PartialCloningNotSupported:
 > +            return # Not all repository formats can support partial 
cloning

^^^ I'm not aware of any repository formats that support subtrees but 
not partial cloning.  Are you?

Aaron

For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3C1184407610.4759.21.camel%40ganieda.vernstok.nl%3E