[MERGE] Support fetching a restricted set of revisions

Jelmer Vernooij jelmer at samba.org
Thu Oct 25 13:49:05 BST 2007


Hi Aaron,

Thanks for the feedback.

On Tue, 2007-07-17 at 10:17 -0400, Aaron Bentley wrote:
> Jelmer Vernooij wrote:
> > What the -rX..Y code (which I haven't submitted yet) does is create a
> > breadth-first search over all the ancestors of Y until it hits X
> > (bzrlib.graph was very useful in that regard). In other words, it will
> > fetch all ancestors of Y that have a distance equal to or less than
> > that of X to Y.
> Why not just select the ancestors of Y that are not ancestors of X?
> That seems like a simpler definition, and should behave similarly most
> of the time.  Also, much more likely to be reused.
That is potentially a large set of data: a revision Z between X and Y
can itself have a very large ancestry.

Also, determining the set difference would require knowledge of the full
graph - the current behaviour requires just a local search. 

But let's leave that discussion to the next merge request - this one
doesn't actually contain that code.
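
For the archives, here is a rough standalone sketch of that
breadth-first selection. The toy history is a plain dict mapping
revision id to parent ids, and revisions_between is an illustrative
name, not part of bzrlib.graph:

```python
def revisions_between(graph, x, y):
    """Collect all ancestors of y whose distance from y is at most
    the distance from y to x, walking the graph breadth-first.

    graph maps revision id -> tuple of parent revision ids.
    """
    result = set()
    frontier = {y}
    while frontier:
        result.update(frontier)
        if x in frontier:
            break  # reached X; keep this level, stop walking further
        next_frontier = set()
        for rev in frontier:
            for parent in graph.get(rev, ()):
                if parent not in result:
                    next_frontier.add(parent)
        frontier = next_frontier
    return result

# Toy history: y merges z and w; z's parent is x, w's parent is v, etc.
history = {'y': ('z', 'w'), 'z': ('x',), 'w': ('v',),
           'x': ('a',), 'v': ('b',), 'a': (), 'b': ()}
```

Note that v is fetched even though it is not between X and Y, because
it sits at the same distance from Y as X does - that is the behaviour
described above.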

> > @@ -280,7 +286,11 @@
> >              to_transaction)
> >          from_rf = self.from_repository._revision_store.get_revision_file(
> >              from_transaction)
> > -        to_rf.join(from_rf, version_ids=revs)
> > +        if self._limit_to_revisions is not None:
> > +            to_rf.join(from_rf, version_ids=self._limit_to_revisions,
> > +                       restrict_revisions=True)
> I don't like this very much, style-wise.  Changing the meaning of a
> parameter based on another parameter doesn't seem like good form.
> 
> What would be wrong with adding a limit_to_revisions parameter to 
> versionedfile.join?
We're already specifying a version_ids parameter with the versions that
have to be retrieved. But join() will also fetch more than just those
revisions - it fetches all the revisions required to reconstruct them,
so it does not necessarily limit the result. restrict_revisions
basically just says: fetch the specified revisions and nothing more
(usually: no ancestry).
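
To make the distinction concrete, a toy sketch (required_versions and
its dict-based parents store are hypothetical, not the real
versionedfile API):

```python
def required_versions(parents, version_ids, restrict_revisions=False):
    """Return the set of versions a join would copy for version_ids.

    parents maps version id -> list of parent version ids.
    """
    if restrict_revisions:
        # Copy exactly the named versions, nothing more.
        return set(version_ids)
    # Default behaviour: also pull in everything needed to reconstruct
    # the requested versions, i.e. their full ancestry.
    todo = list(version_ids)
    needed = set()
    while todo:
        v = todo.pop()
        if v in needed:
            continue
        needed.add(v)
        todo.extend(parents.get(v, ()))
    return needed

# a <- b <- c, a simple linear history for illustration.
parents_sample = {'c': ['b'], 'b': ['a'], 'a': []}
```

Asking for 'c' without restrict_revisions drags in 'a' and 'b' as well;
with restrict_revisions only 'c' is copied.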

> > -            self.source_ancestry = set(self.source.get_ancestry(version_ids))
> > +            if restrict_revisions:
> > +                self.source_ancestry = set(
> > +                    self.source._get_components_positions(version_ids).keys())
> 
> ^^^ Are you ensuring that every revision that is present has all its 
> inventories and file texts present also?  It doesn't look like that to 
> me.
The same mechanism as before is still used - inventories and texts are
still fetched greedily. My next merge request limits the amount of data
fetched for them as well.

> By including the build-dependencies of the desired revisions, you're 
> including the revisions themselves, and that means their inventories and 
> file texts must also be included.
The revisions knit doesn't contain deltas - only fulltexts - so there
are no build-dependencies that can ever be pulled.
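
A toy illustration of the difference: reconstructing a delta-stored
version drags in its whole delta chain, while a fulltext needs only
itself. The storage layout below is made up for the example, not the
real knit index format:

```python
def build_components(storage, version_id):
    """Return every version needed to reconstruct version_id.

    storage maps version id -> ('fulltext', None) or ('delta', basis).
    """
    needed = []
    current = version_id
    while True:
        needed.append(current)
        kind, basis = storage[current]
        if kind == 'fulltext':
            return needed  # snapshot reached: nothing else required
        current = basis  # follow the delta chain towards its snapshot

# r1 is a snapshot; r2 and r3 are deltas against their predecessor.
storage_sample = {'r1': ('fulltext', None),
                  'r2': ('delta', 'r1'),
                  'r3': ('delta', 'r2')}
```

For a knit that only ever stores fulltexts (like the revisions knit),
every version's component list is just the version itself.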

> I think it would be simpler to create snapshots of the oldest
> revisions.
That means making assumptions about the set of revisions you're
fetching. In theory limit_to_revisions could be very sparse. Also,
you'd have to do some sort of ordering to find the oldest revisions,
which is not necessary in the current code.

> The only reason your later test cases are passing is because revisions 
> *happen to be* all snapshots, and therefore have no build dependencies. 
> This is not true for inventories, however.
They're snapshots on purpose - and partial cloning will correctly fail
if the revisions happen to be stored in a delta knit.

> > +            else:
> > +                self.source_ancestry = set(self.source.get_ancestry(version_ids))
> 
> ^^^ I think that should be get_ancestry(topo_sorted=False).  I don't 
> know how I missed it originally.
Not quite sure I follow. Do you mean sorting isn't necessary or that
self.get_ancestry() should be used?

> > @@ -1724,12 +1735,14 @@
> >              return False
> >
> >      @needs_write_lock
> > -    def fetch(self, revision_id=None, pb=None):
> > +    def fetch(self, revision_id=None, pb=None, limit_to_revisions=None):
> >          """See InterRepository.fetch()."""
> >          from bzrlib.fetch import Knit1to2Fetcher
> >          mutter("Using fetch logic to copy between %s(%s) and %s(%s)",
> >                 self.source, self.source._format, self.target,
> >                 self.target._format)
> > +        if limit_to_revisions is not None:
> > +            raise errors.PartialCloningNotSupported(self)
> ^^^ You have a lot of repetitions of this.  Have you looked at factoring 
> it out?
Thanks, done that.
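
Roughly along these lines - a minimal sketch with the class stripped
down, and _check_limit_to_revisions as a hypothetical helper name:

```python
class PartialCloningNotSupported(Exception):
    """Raised when a fetcher cannot restrict the fetched revision set."""


class InterRepository(object):
    # Formats that implement partial cloning flip this to True.
    supports_limited_fetch = False

    def _check_limit_to_revisions(self, limit_to_revisions):
        """Raise unless a requested revision limit can be honoured."""
        if (limit_to_revisions is not None and
                not self.supports_limited_fetch):
            raise PartialCloningNotSupported(self)

    def fetch(self, revision_id=None, pb=None, limit_to_revisions=None):
        self._check_limit_to_revisions(limit_to_revisions)
        pass  # format-specific fetch logic would go here
```

Each fetch implementation then starts with a single call to the helper
instead of repeating the if/raise pair.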

> > @@ -131,7 +131,8 @@
> > +    def test_sprout_bzrdir_partial(self):
> > +        tree = self.make_branch_and_tree('source')
> > +        self.build_tree(['foo'], transport=tree.bzrdir.transport.clone('..'))
> > +        tree.add('foo')
> > +        rev1 = tree.commit('revision 1')
> > +        tree.bzrdir.transport.clone('..').put_bytes('foo', 'bar')
> > +        rev2 = tree.commit('revision 2')
> > +        dir = tree.bzrdir
> > +        try:
> > +            target = self.sproutOrSkip(dir, self.get_url('target'),
> > +                    limit_to_revisions=set([rev2]))
> > +        except errors.PartialCloningNotSupported:
> > +        except errors.PartialCloningNotSupported:
> > +            return # Not all repository formats can support partial cloning
> It's poor form to return from a test case if the test has not succeeded. 
> Raising TestSkipped (i.e. from sproutOrSkip) would be much better
There doesn't seem to be consensus about this. IIRC Robert mentioned
preferring a plain return here because TestSkipped suggested the test
would be fixable later on.

> > +        repository = target.open_repository()
> > +        self.assertTrue(repository.has_revision(rev2))
> > +        self.assertFalse(repository.has_revision(rev1))
> 
> I would like to see proof that
> 1. repository.revision_tree(rev1) is not present (the inventory may only 
> be present if all the texts it references are present, and I doubt we 
> want that.)
> 2. all of the file texts for revision_tree(rev2) are present.
Thanks, will add those.
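
For reference, a framework-free sketch of those two checks against a
toy repository (ToyRepository and its dict-based storage are
illustrative stand-ins for the real bzrlib objects):

```python
class ToyRepository(object):
    def __init__(self):
        self.revisions = {}  # rev id -> set of (file_id, rev_id) keys
        self.texts = {}      # (file_id, rev_id) -> text bytes

    def has_revision(self, rev_id):
        return rev_id in self.revisions

    def revision_tree(self, rev_id):
        """Return the file-text keys for rev_id, or fail if absent."""
        if rev_id not in self.revisions:
            raise KeyError(rev_id)
        return self.revisions[rev_id]

# Only rev2 was cloned; rev1 and its data were left behind.
repo = ToyRepository()
repo.revisions['rev2'] = {('foo', 'rev2')}
repo.texts[('foo', 'rev2')] = 'bar'

# 1. revision_tree(rev1) must not be reconstructable.
try:
    repo.revision_tree('rev1')
    raise AssertionError('rev1 should be absent')
except KeyError:
    pass

# 2. every file text referenced by rev2's tree must be present.
for key in repo.revision_tree('rev2'):
    assert key in repo.texts
```

The real test would use assertRaises and the repository API, but the
two properties being checked are the ones above.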

> > +    def test_sprout_bzrdir_partial_subtrees(self):
> > +        tree = self.make_branch_and_tree('source')
> > +        subtree = self.make_branch_and_tree('source/subtree')
> > +        self.build_tree(['source/subtree/file'])
> > +        subtree.add('file')
> > +        subrev1 = subtree.commit('subrevision 1')
> > +        try:
> > +            tree.add_reference(subtree)
> > +        except (errors.BadReferenceTarget, errors.UnsupportedOperation):
> > +            # this test is for formats that support subtrees
> > +            return
> You can just check format.repository_format._supports_tree_references.
> 
> > +        rev1 = tree.commit('revision 1')
> > +        subtree.bzrdir.transport.clone('..').put_bytes('file', 'bar')
> > +        subrev2 = subtree.commit('subrevision 2')
> > +        rev2 = tree.commit('revision 2')
> > +        dir = tree.bzrdir
> > +        try:
> > +            target = self.sproutOrSkip(dir, self.get_url('target'),
> > +                    limit_to_revisions=set([rev2]))
> > +        except errors.PartialCloningNotSupported:
> > +            return # Not all repository formats can support partial cloning
> 
> ^^^ I'm not aware of any repository formats that support subtrees but 
> not partial cloning.  Are you?
Subversion supports subtrees but not (yet) partial cloning. Also,
KnitRepositoryFormat3 doesn't support partial cloning at this point, or
at least there aren't tests that make sure it does.

Cheers,

Jelmer
-- 
Jelmer Vernooij <jelmer at samba.org> - http://samba.org/~jelmer/
Jabber: jelmer at jabber.fsfe.org


