nested trees design-approach : composite trees vs iter_changes

Jurgen Defurne jurgen.defurne at pandora.be
Wed May 6 18:28:49 BST 2009


On Wed, 06 May 2009 14:29:45 +1000
Robert Collins <robert.collins at canonical.com> wrote:

> On Tue, 2009-05-05 at 10:19 -0400, Aaron Bentley wrote:
> 
> > I've heard your objections.  They don't resonate with me.  You haven't
> > vetoed, so I believed that you would let me get on with it, even though
> > it's not your preferred approach.
> > 
> > Instead, you have been undermining me.  I don't like being vetoed, but
> > it is better than being undermined.  It is at least direct.  It
> > demonstrates that your objections are strong enough that you're willing
> > to derail the current development over them, and that's important data.
> 
> Concerns that aren't addressed rarely just go away; trying to get it
> discussed wasn't intended to be undermining, but an attempt to get it
> discussed!
> 
> > .. so I've rejected in BB, and this mail is
> > > hopefully sufficient to get discussion going.
> > > 
> > > CompositeTree worries me for a number of reasons. 
> > > 
> > >  * Our behaviour will change in unexpected ways depending on whether
> > >    a CompositeTree is being used.
> > 
> > This seems entirely spurious to me.  When you get exactly what you ask
> > for, it cannot be unexpected behaviour.  CompositeTree is a shim for
> > supporting certain commands which have a shallow view of Trees, and
> > therefore don't need to be aware of tree references.  Commands like
> > diff, status, ls, annotate, export.
> > 
> > It is an attempt to get coverage of a bunch of commands with minimal
> > cost.  It is a stepping stone, not a long-term solution.
> 
> What is the long term solution? http://bazaar-vcs.org/NestedTreesDesign
> doesn't talk about other options.
> 
> 
> > >  * Because of that we're going to end up using CT everywhere, always.
> > 
> > No, that would be a disaster.  Many commands need to know where the
> > subtrees are, and CT masks this.
> 
> What commands don't?
> 
> > >  * Because CT is separate to the tree objects, we appear to need a new
> > >    index of nested trees for performance, which is redundant data. It's
> > >    not necessarily wrong to have it; but having to have it to have the
> > >    feature work at all is rather concerning.
> > 
> > As you yourself note, it's for performance.  The code works perfectly
> > without such an index.  For small trees, it probably doesn't need an index.
> > 
> > I believe that this index is fundamentally necessary.  When performing
> > an operation that recurses into subtrees, we should lock those subtrees
> > before beginning the operation.  Operations should not fail due to lock
> > errors when partially complete.
> 
> Operations can run into locking problems at any point - see for instance
> files open in a text editor on win32. I think it's nice to avoid that
> where we can, but as it's something that can't be avoided completely, we
> should consider carefully the cost of avoiding it. And note that
> repositories do their physical locking very late in tasks these days -
> and that has worked well, with less resource contention on shared
> repositories.

Working with Continuus as a build manager, I have run into this on
several occasions.

What I really object to is that (taking Continuus again as an example)
it is probably difficult to execute one operation on a whole tree as a
transaction with commit or rollback options. We have a source tree of
250 nested trees (in bzr parlance), and if something goes wrong the
system is left in an inconsistent state, which has to be cleaned up
with a mix of manual and command-line operations.

Nested trees are a good, and sometimes even necessary, thing (I would
not want to use a system that does not support something similar for
large projects), but I think this is the criterion to uphold: if
something goes wrong in a tree operation and can leave the tree in an
inconsistent state, make it easy to clean up or restart.
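
To make that criterion concrete, here is a rough sketch in Python of
what I mean by an all-or-nothing operation over many nested trees. It
is only an illustration under my own assumptions: Tree, apply_to_all
and update_to are hypothetical names, not bzr or Continuus APIs, and a
real tree operation would have far more state to undo than a revision
label.

```python
# Hypothetical sketch: none of these names come from bzr or Continuus.
from contextlib import ExitStack


class Tree:
    """Stand-in for one nested tree; it only tracks a revision label."""

    def __init__(self, name, revision):
        self.name = name
        self.revision = revision


def apply_to_all(trees, operation):
    """Run operation(tree) on every tree, or undo the completed steps.

    operation must return a zero-argument callable that reverts its
    own change. If any step raises, the reverts registered so far run
    in reverse order and the exception propagates to the caller.
    """
    with ExitStack() as undo:
        for tree in trees:
            undo.callback(operation(tree))
        # Every tree succeeded, so drop the undo callbacks unrun.
        undo.pop_all()


def update_to(new_revision, fail_on=None):
    """Build an operation that moves a tree to new_revision.

    fail_on names a tree that should fail, to simulate an error
    partway through the run.
    """

    def operation(tree):
        if tree.name == fail_on:
            raise RuntimeError("update failed in " + tree.name)
        old_revision = tree.revision
        tree.revision = new_revision

        def revert():
            tree.revision = old_revision

        return revert

    return operation
```

Calling apply_to_all(trees, update_to("r2", fail_on="c")) on three
trees named a, b and c raises the error from c, but a and b are put
back at their old revision first, so the set as a whole is never left
half updated.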

Regards,

Jurgen


