nested trees design-approach : composite trees vs iter_changes

Robert Collins robert.collins at canonical.com
Wed May 6 05:29:45 BST 2009


On Tue, 2009-05-05 at 10:19 -0400, Aaron Bentley wrote:

> I've heard your objections.  They don't resonate with me.  You haven't
> vetoed, so I believed that you would let me get on with it, even though
> it's not your preferred approach.
> 
> Instead, you have been undermining me.  I don't like being vetoed, but
> it is better than being undermined.  It is at least direct.  It
> demonstrates that your objections are strong enough that you're willing
> to derail the current development over them, and that's important data.

Concerns that aren't addressed rarely just go away; trying to get it
discussed wasn't intended to be undermining, but an attempt to get it
discussed!

> .. so I've rejected in BB, and this mail is
> > hopefully sufficient to get discussion going.
> > 
> > CompositeTree worries me for a number of reasons. 
> > 
> >  * Our behaviour will change in unexpected ways depending on whether
> >    a CompositeTree is being used.
> 
> This seems entirely spurious to me.  When you get exactly what you ask
> for, it cannot be unexpected behaviour.  CompositeTree is a shim for
> supporting certain commands which have a shallow view of Trees, and
> therefore don't need to be aware of tree references.  Commands like
> diff, status, ls, annotate, export.
> 
> It is an attempt to get coverage of a bunch of commands with minimal
> cost.  It is a stepping stone, not a long-term solution.

What is the long term solution? http://bazaar-vcs.org/NestedTreesDesign
doesn't talk about other options.


> >  * Because of that we're going to end up using CT everywhere, always.
> 
> No, that would be a disaster.  Many commands need to know where the
> subtrees are, and CT masks this.

What commands don't?

> >  * Because CT is separate to the tree objects, we appear to need a new
> >    index of nested trees for performance, which is redundant data. Its
> >    not necessarily wrong to have it; but having to have it to have the
> >    feature work at all is rather concerning.
> 
> As you yourself note, it's for performance.  The code works perfectly
> without such an index.  For small trees, it probably doesn't need an index.
> 
> I believe that this index is fundamentally necessary.  When performing
> an operation that recurses into subtrees, we should lock those subtrees
> before beginning the operation.  Operations should not fail due to lock
> errors when partially complete.

Operations can run into locking problems at any point - see for instance
files open in a text editor on win32. I think its nice to avoid that
where we can, but as its something that can't be avoided completely, we
should consider carefully the cost of avoiding it. And note that
repositories do their physical locking very late in tasks these days -
and that has worked well, with less resource contention on shared
repositories.

> The need for an index is not exclusive to CompositeTrees.
> Branch.create_checkout, BzrDir.sprout, Merger._do_merge_to, and
> WorkingTree.revert all use it.  So this is not an objection to
> CompositeTree-- it's an objection to my entire implementation strategy.

We have a lot of code which was designed or written with O(things) years
ago - and we've been updating our designs and concepts to avoid this
where possible, because its costly. nested trees has been lying fallow,
and hasn't been updated in this regard. I think its its entirely healthy
to look at it and ask the question.

> >  * Because trees won't be responsible for their own nested items, it 
> >    may/will be very tricky for bzr-svn to do the right thing with
> >    externals [it may need to represent them as something different
> >    again, though I'm not 100% sure about that].
> 
> I don't understand this.  I've read the Subversion externals
> documentation and consulted Karl Fogel.  There are model compatibility
> issues, but I know what they are.  Jelmer also hasn't raised anything.
> 
> Can you explain the kind of problem you forsee?

Jelmer has already posted saying I was jumping at shadows, strike that
concern vis-a-vis svn. I'm still worried about inversion of
responsibilities but its theoretical rather than practical.

> > There is another concern, which is more weakly coupled to CT itself. I
> > think that CT makes this harder to change in the future
> 
> I don't think so.  CT is meant to ease the transition into a future
> where nesting is a core feature.  When we have achieved that, we won't
> need CT anymore.

Ok, thats great to hear.

..
> > this perhaps not meant to be a long term constraint
> 
> It is meant to last until path tokens are implemented.  I have always
> described nested trees as functioning like a single big tree.  When path
> tokens permit multiple copies of a given tree to coexist in a by-value
> nested tree, they will also permit it in a by-reference tree.  We don't
> need two ways of achieving the same thing.

Ok, thats a good point.


> Your approach is not the only alternative to CompositeTrees.  Another
> approach is explicit high-level recursion, as I've done with revert,
> merge and commit.

I was under the impression that merge used CT, I'll look at it. commit
I'm well familiar with.

> >  *  we want all commands and features to be nested tree capable, so
> > everyone should be paying the cost - and the cost should be zero when a
> > tree has no nested children.
> 
> That would be nice, but how do we get there in small steps?  You're
> already objecting to the indexes that make it cheap to determine whether
> a tree has nested children.

I'm objecting to the algorithms that require knowing that up-front.

> >  * Its not hard to add 'recursive=True' default parameters to things
> >    like changes_from and iter_changes
> 
> No, what's hard is implementing the recursive behavior.
> 
> > , which would make recursion
> >    optional. Having to use a different class to enable/disable recursion
> >    is a pretty big hammer.
> 
> If you change iter_changes so that it automatically recurses, you're
> making a deep change to its behaviour.  You need to expose tree
> references, so that operations which care about tree references will be
> able to show them.  It might as well be a new API.
>
> Having another class that masks tree references means you don't have to
> rewrite 'diff' and 'status' on top of everything else.

As a transition strategy, sure. It was not at *all* clear that CT was a
transition strategy.

> >  * I don't think it hides whats going on any more or less than having a 
> >    separate class which pulls data out of all the children.
> 
> I usually find your code to be very, very unclear, so I do not place
> much stock in your opinion here.

Bringing a technical discussion down to ad hominems is rarely helpful.
At best it leaves people feeling defensive or offended.

I've seen you write beautiful code and awfully confusing code too.

> >  * We can address the all-or-nothing issue by making recursion
> >    controlling parameters default to False initially
> 
> Perhaps I expressed myself poorly.  I believe that it makes sense to
> start with relatively uninvasive changes.  They are faster to make and
> bear less risk of causing problems for others.  They allow us to
> experiment with approaches to nesting without committing fully to them
> by enshrining them in our public API.

Regardless of whether we put thing in our public API, people will start
using and depending on them as soon as the feature is vaguely usable in
bzr.dev. I agree that uninvasive changes are easier to mutate. The
problem that that has is that if an invasive change is whats actually
needed to get a good outcome, the absence of that change can and does
skew the code written until its made.

> I don't think that shoving recursion into Tree is a desirable outcome.
> Many operations will need to know that they are dealing with multiple
> trees.  I think we will wind up using a new API that works naturally in
> situations where there are nested subtrees.  I think NestedTrees is the
> start of such a thing.  Once we've got more recursion-aware code, we'll
> be able to refactor it.
> 
> We will also need a TreeTransform that can apply to all subtrees, so
> that merge and revert function perfectly.  It will need to handle cases
> where subtrees are created and destroyed, and where files are moved
> among subtrees.

I think most operations will want to recurse one way or another; you say
that Tree operations shouldn't know how to recurse [yet some already
do], and that CT is a transitional too. The only reference I see in
bzr.dev to NestedTrees is in
doc/en/user-guide/shared_repository_layouts.txt, which only links to the
wiki.

So if Tree shouldn't know how to recurse, yet MutableTree.commit does,
and CT is transitional - I'm well confused.

-Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090506/0682b9e4/attachment-0001.pgp 


More information about the bazaar mailing list