nested trees design-approach : composite trees vs iter_changes

Wed May 6 18:28:49 BST 2009

On Wed, 06 May 2009 14:29:45 +1000
Robert Collins <robert.collins at canonical.com> wrote:

> On Tue, 2009-05-05 at 10:19 -0400, Aaron Bentley wrote:
> 
> > I've heard your objections.  They don't resonate with me.  You haven't
> > vetoed, so I believed that you would let me get on with it, even though
> > it's not your preferred approach.
> > 
> > Instead, you have been undermining me.  I don't like being vetoed, but
> > it is better than being undermined.  It is at least direct.  It
> > demonstrates that your objections are strong enough that you're willing
> > to derail the current development over them, and that's important data.
> 
> Concerns that aren't addressed rarely just go away; trying to get it
> discussed wasn't intended to be undermining, but an attempt to get it
> discussed!
> 
> > .. so I've rejected in BB, and this mail is
> > > hopefully sufficient to get discussion going.
> > > 
> > > CompositeTree worries me for a number of reasons. 
> > > 
> > >  * Our behaviour will change in unexpected ways depending on whether
> > >    a CompositeTree is being used.
> > 
> > This seems entirely spurious to me.  When you get exactly what you ask
> > for, it cannot be unexpected behaviour.  CompositeTree is a shim for
> > supporting certain commands which have a shallow view of Trees, and
> > therefore don't need to be aware of tree references.  Commands like
> > diff, status, ls, annotate, export.
> > 
> > It is an attempt to get coverage of a bunch of commands with minimal
> > cost.  It is a stepping stone, not a long-term solution.
> 
> What is the long term solution? http://bazaar-vcs.org/NestedTreesDesign
> doesn't talk about other options.
> 
> 
> > >  * Because of that we're going to end up using CT everywhere, always.
> > 
> > No, that would be a disaster.  Many commands need to know where the
> > subtrees are, and CT masks this.
> 
> What commands don't?
> 
> > >  * Because CT is separate to the tree objects, we appear to need a new
> > >    index of nested trees for performance, which is redundant data. Its
> > >    not necessarily wrong to have it; but having to have it to have the
> > >    feature work at all is rather concerning.
> > 
> > As you yourself note, it's for performance.  The code works perfectly
> > without such an index.  For small trees, it probably doesn't need an index.
> > 
> > I believe that this index is fundamentally necessary.  When performing
> > an operation that recurses into subtrees, we should lock those subtrees
> > before beginning the operation.  Operations should not fail due to lock
> > errors when partially complete.
> 
> Operations can run into locking problems at any point - see for instance
> files open in a text editor on win32. I think its nice to avoid that
> where we can, but as its something that can't be avoided completely, we
> should consider carefully the cost of avoiding it. And note that
> repositories do their physical locking very late in tasks these days -
> and that has worked well, with less resource contention on shared
> repositories.

Working with Continuus as a build manager, I have run up against this
kind of locking problem on several occasions.

What I really object to is that (taking Continuus again as an example)
it is probably difficult to execute one operation on a whole tree as a
transaction with commit or rollback options. We have a source tree of
250 nested trees (in bzr parlance), and if something goes wrong the
system is left in an inconsistent state which has to be cleaned up with
a mix of manual and command-line operations.

Nested trees are a good, and sometimes even necessary, thing (I would
not want to use a system that does not support something similar for
large projects). But I think this is the criterion to uphold: if
something goes wrong in a tree operation and it can leave the tree in an
inconsistent state, make it easy to clean up or restart.
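
To make the criterion concrete, here is a minimal sketch in Python of an
all-or-nothing operation over a set of nested trees. It is only an
illustration under stated assumptions: run_on_all_trees, lock, apply and
undo are hypothetical names, not bzr or Continuus APIs, and each apply
is assumed either to complete or to leave its own tree unchanged.

```python
from contextlib import ExitStack


def run_on_all_trees(trees, lock, apply, undo):
    """Apply an operation to every tree, or roll back the ones already done.

    Hypothetical helpers supplied by the caller (assumptions, not real APIs):
      lock(tree)  -> a context manager holding that tree's lock
      apply(tree) -> performs the operation on one tree
      undo(tree)  -> reverses a completed apply(tree)
    """
    done = []
    with ExitStack() as locks:
        # Take every lock first, so a lock failure happens before any
        # tree has been modified. ExitStack releases them all on exit.
        for tree in trees:
            locks.enter_context(lock(tree))
        try:
            for tree in trees:
                apply(tree)
                done.append(tree)
        except Exception:
            # Roll back in reverse order, then report the original error.
            for tree in reversed(done):
                undo(tree)
            raise
    return done
```

Locking up front costs some contention, as Robert notes, so whether it
is worth it depends on how expensive a half-finished operation is to
clean up. The rollback part is useful even if locks are taken late.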

Regards,

Jurgen