nested trees design-approach : composite trees vs iter_changes

Wed May 6 14:48:38 BST 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Tue, 2009-05-05 at 10:19 -0400, Aaron Bentley wrote:
> 
>> I've heard your objections.  They don't resonate with me.  You haven't
>> vetoed, so I believed that you would let me get on with it, even though
>> it's not your preferred approach.
>>
>> Instead, you have been undermining me.  I don't like being vetoed, but
>> it is better than being undermined.  It is at least direct.  It
>> demonstrates that your objections are strong enough that you're willing
>> to derail the current development over them, and that's important data.
> 
> Concerns that aren't addressed rarely just go away;

True, and I still think that the pack repository has way too many
layers.  I still think it's a bad idea for the working tree's
last_revision to be disjoint from the branch's last revision.  Decisions
get made and we move on.

> trying to get it
> discussed wasn't intended to be undermining, but an attempt to get it
> discussed!

This is way too late for that discussion.  I was asked to resurrect my
nested trees code, get it merged, and polish it up.  That's what I'm
trying to do, and I think I've been very clear about that, e.g. at our
last sprint.

>> It is an attempt to get coverage of a bunch of commands with minimal
>> cost.  It is a stepping stone, not a long-term solution.
> 
> What is the long term solution? http://bazaar-vcs.org/NestedTreesDesign
> doesn't talk about other options.

IMO, the long-term solution is for us to implement the features we need,
discover the API those require, and refactor them into something coherent.

Probably for diff and status, we'll need to implement a recursive
variant of iter_changes, but one that makes subtree boundaries clear, so
that they can be properly reported on.

>>>  * Because of that we're going to end up using CT everywhere, always.
>> No, that would be a disaster.  Many commands need to know where the
>> subtrees are, and CT masks this.
> 
> What commands don't?

See my previous email.  I specifically called out diff, status, ls,
annotate, export.  We also might include log in that list, at least
inasmuch as log provides status-style and diff-style output.

>> I believe that this index is fundamentally necessary.  When performing
>> an operation that recurses into subtrees, we should lock those subtrees
>> before beginning the operation.

> Operations can run into locking problems at any point - see for instance
> files open in a text editor on win32.

At least that's not one of our locks causing a locking problem.  I think
failing to lock subtrees in advance would introduce new failure modes.
Our old "diff and commit" bug becomes worse if the diff command is run
in a subtree and the commit is run in the containing tree, because some
subtrees might be committed, but others would fail.

> I think its nice to avoid that
> where we can, but as its something that can't be avoided completely, we
> should consider carefully the cost of avoiding it.

The cost of avoiding it is very small, IMO.  Do you disagree?

> And note that
> repositories do their physical locking very late in tasks these days -
> and that has worked well, with less resource contention on shared
> repositories.

I'm sure you'd agree that working trees are used very differently from
repositories.  We don't currently have a plan to allow the same working
tree to be updated by bzr processes at the same time.  I'm not sure what
we'd use such functionality for.

>> The need for an index is not exclusive to CompositeTrees.
>> Branch.create_checkout, BzrDir.sprout, Merger._do_merge_to, and
>> WorkingTree.revert all use it.  So this is not an objection to
>> CompositeTree-- it's an objection to my entire implementation strategy.
> 
> We have a lot of code which was designed or written with O(things) years
> ago - and we've been updating our designs and concepts to avoid this
> where possible, because its costly. nested trees has been lying fallow,
> and hasn't been updated in this regard. I think its its entirely healthy
> to look at it and ask the question.

I don't.  It has been a killer having it bitrot, and then getting it
back up to date.  I don't want more of that, which is what more
discussion and examination would entail.

>>>  *  we want all commands and features to be nested tree capable, so
>>> everyone should be paying the cost - and the cost should be zero when a
>>> tree has no nested children.
>> That would be nice, but how do we get there in small steps?  You're
>> already objecting to the indexes that make it cheap to determine whether
>> a tree has nested children.
> 
> I'm objecting to the algorithms that require knowing that up-front.

I know, but I feel like it is the best answer, because it allows us to
get there in small steps, to make less-invasive changes, etc.

>> Having another class that masks tree references means you don't have to
>> rewrite 'diff' and 'status' on top of everything else.
> 
> As a transition strategy, sure. It was not at *all* clear that CT was a
> transition strategy.

Perhaps I could have been clearer about that.  I did say that
NestedTrees would become the preferred interface.

>>>  * I don't think it hides whats going on any more or less than having a 
>>>    separate class which pulls data out of all the children.
>> I usually find your code to be very, very unclear, so I do not place
>> much stock in your opinion here.
> 
> Bringing a technical discussion down to ad hominems is rarely helpful.
> At best it leaves people feeling defensive or offended.

Your statement wasn't a technical one.  It was a statement about what
you consider to be clear or unclear.  There are no absolutes in this
area, only intuition, so when you argue from the perspective of your
intuition, I think it's reasonable to show that there are flaws in your
intuition.

>>>  * We can address the all-or-nothing issue by making recursion
>>>    controlling parameters default to False initially
>> Perhaps I expressed myself poorly.  I believe that it makes sense to
>> start with relatively uninvasive changes.  They are faster to make and
>> bear less risk of causing problems for others.  They allow us to
>> experiment with approaches to nesting without committing fully to them
>> by enshrining them in our public API.
> 
> Regardless of whether we put thing in our public API, people will start
> using and depending on them as soon as the feature is vaguely usable in
> bzr.dev.

Or even earlier :-/

> I agree that uninvasive changes are easier to mutate. The
> problem that that has is that if an invasive change is whats actually
> needed to get a good outcome, the absence of that change can and does
> skew the code written until its made.

So why isn't the outcome I've proposed sufficiently good?

>> We will also need a TreeTransform that can apply to all subtrees, so
>> that merge and revert function perfectly.  It will need to handle cases
>> where subtrees are created and destroyed, and where files are moved
>> among subtrees.
> 
> I think most operations will want to recurse one way or another; you say
> that Tree operations shouldn't know how to recurse [yet some already
> do], and that CT is a transitional too.

I think high-level operations should recurse.  I think low-level ones
should not.  I realize this is hand-waving.  Perhaps some examples?

commit -> high-level
id2path -> low level
revert -> high-level
apply_inventory_delta -> low-level

> The only reference I see in
> bzr.dev to NestedTrees is in
> doc/en/user-guide/shared_repository_layouts.txt, which only links to the
> wiki.

It's in the patch you vetoed.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkoBlTIACgkQ0F+nu1YWqI0s1wCdFT7hcm/y3yVKtEIVx9caXPJW
vH0An07GEX41FD69I73bjhUIIKWKqacB
=Zvf4
-----END PGP SIGNATURE-----