nested trees design-approach : composite trees vs iter_changes

Tue May 5 00:32:54 BST 2009

I've been concerned about CompositeTree as an approach for a while, but
we haven't been able to actually get anywhere in discussing it; today
Aaron asked broadly that I either veto, or stop complaining. I hadn't
wanted to veto the vote because that seems to block people vs trying to
change their mind; however Aaron feels that this will force some
discussion to resolve it... so I've rejected in BB, and this mail is
hopefully sufficient to get discussion going.

CompositeTree worries me for a number of reasons. 

 * Our behaviour will change in unexpected ways depending on whether
   a CompositeTree is being used. For instance 'tree.add' with a
   specific fileid will error on duplicates when a CompositeTree is
   used, and not (with the same tree, same fileid) if the programmer
   forgets to use CompositeTree (CT from here on in).
 * Because of that we're going to end up using CT everywhere, always.
   Perhaps this is good, but if so we should change the factory objects
   that create working trees and revision trees, rather than having
   commands know to start using CT. As it stands people writing to and
   in our API will get it wrong all the time. (e.g. loggerhead, PQM,
   launchpad).
 * Because CT is separate to the tree objects, we appear to need a new
   index of nested trees for performance, which is redundant data. Its
   not necessarily wrong to have it; but having to have it to have the
   feature work at all is rather concerning.
 * Because trees won't be responsible for their own nested items, it 
   may/will be very tricky for bzr-svn to do the right thing with
   externals [it may need to represent them as something different
   again, though I'm not 100% sure about that].

There is another concern, which is more weakly coupled to CT itself. I
think that CT makes this harder to change in the future, which is why I
mention it now:

 * The CT design enforces 'file ids are unique' across all the nested 
   trees; this perhaps not meant to be a long term constraint, but it 
   is one that we can't enforce - unlike our normal behaviour users
   will be able to make bzr break, and it won't be clear why, or how
   they should fix it (and arguably they won't have done anything
   wrong that would need fixing).

Aaron has expressed concerns about supported nested trees inside our
tree code. As most recently expressed to me:

 * it makes everyone pay the cost of the feature
 * it makes recursion mandatory
 * it hides what's really going on/it makes the code trickier to debug.
 * It's an all-or-nothing sort of solution.  We need an incremental one.

Now, I think these are all addressable. Specifically:

 *  we want all commands and features to be nested tree capable, so
everyone should be paying the cost - and the cost should be zero when a
tree has no nested children.

 * Its not hard to add 'recursive=True' default parameters to things
   like changes_from and iter_changes, which would make recursion
   optional. Having to use a different class to enable/disable recursion
   is a pretty big hammer.
 * I don't think it hides whats going on any more or less than having a 
   separate class which pulls data out of all the children. If anything,
   having a call stack that shows iter_changes from each tree nesting
   down would be clearer, rather than having one composite that has to
   manually implement that same structure, is less clear.
 * We can address the all-or-nothing issue by making recursion
   controlling parameters default to False initially, with a plan to 
   switch them to True as soon as that doesn't break the test suite.

-Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090505/47d000e2/attachment.pgp