nested trees design-approach : composite trees vs iter_changes
Aaron Bentley
aaron at aaronbentley.com
Tue May 5 15:19:32 BST 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> I've been concerned about CompositeTree as an approach for a while, but
> we haven't been able to actually get anywhere in discussing it; today
> Aaron asked broadly that I either veto, or stop complaining. I hadn't
> wanted to veto the vote because that seems to block people vs trying to
> change their mind;
If I change my mind now, I will have wasted weeks of effort. It would
have been far more productive to block me. Even if we made no progress
with nested trees, I could have worked on something else.
> however Aaron feels that this will force some
> discussion to resolve it.
I've heard your objections. They don't resonate with me. You haven't
vetoed, so I believed that you would let me get on with it, even though
it's not your preferred approach.
Instead, you have been undermining me. I don't like being vetoed, but
it is better than being undermined. It is at least direct. It
demonstrates that your objections are strong enough that you're willing
to derail the current development over them, and that's important data.
.. so I've rejected in BB, and this mail is
> hopefully sufficient to get discussion going.
>
> CompositeTree worries me for a number of reasons.
>
> * Our behaviour will change in unexpected ways depending on whether
> a CompositeTree is being used.
This seems entirely spurious to me. When you get exactly what you ask
for, it cannot be unexpected behaviour. CompositeTree is a shim for
supporting certain commands which have a shallow view of Trees, and
therefore don't need to be aware of tree references. Commands like
diff, status, ls, annotate, export.
It is an attempt to get coverage of a bunch of commands with minimal
cost. It is a stepping stone, not a long-term solution.
> For instance 'tree.add' with a
> specific fileid will error on duplicates when a CompositeTree is
> used, and not (with the same tree, same fileid) if the programmer
> forgets to use CompositeTree (CT from here on in).
CT.add does not exist and will probably never exist.
> * Because of that we're going to end up using CT everywhere, always.
No, that would be a disaster. Many commands need to know where the
subtrees are, and CT masks this.
> * Because CT is separate to the tree objects, we appear to need a new
> index of nested trees for performance, which is redundant data. Its
> not necessarily wrong to have it; but having to have it to have the
> feature work at all is rather concerning.
As you yourself note, it's for performance. The code works perfectly
without such an index. For small trees, it probably doesn't need an index.
I believe that this index is fundamentally necessary. When performing
an operation that recurses into subtrees, we should lock those subtrees
before beginning the operation. Operations should not fail due to lock
errors when partially complete.
The need for an index is not exclusive to CompositeTrees.
Branch.create_checkout, BzrDir.sprout, Merger._do_merge_to, and
WorkingTree.revert all use it. So this is not an objection to
CompositeTree-- it's an objection to my entire implementation strategy.
> * Because trees won't be responsible for their own nested items, it
> may/will be very tricky for bzr-svn to do the right thing with
> externals [it may need to represent them as something different
> again, though I'm not 100% sure about that].
I don't understand this. I've read the Subversion externals
documentation and consulted Karl Fogel. There are model compatibility
issues, but I know what they are. Jelmer also hasn't raised anything.
Can you explain the kind of problem you forsee?
> There is another concern, which is more weakly coupled to CT itself. I
> think that CT makes this harder to change in the future
I don't think so. CT is meant to ease the transition into a future
where nesting is a core feature. When we have achieved that, we won't
need CT anymore.
> which is why I
> mention it now:
>
> * The CT design enforces 'file ids are unique' across all the nested
> trees;
This is an objection to my nested trees design, not to CT. We already
have plenty of code in bzr core which assumes that nested trees have
unique file-ids across trees. Indeed, I implemented unique root ids to
support this. I don't think this is legitimate grounds for objecting to
CT, because if I were to discard CT and rewrite the code, it would still
contain that constraint.
> this perhaps not meant to be a long term constraint
It is meant to last until path tokens are implemented. I have always
described nested trees as functioning like a single big tree. When path
tokens permit multiple copies of a given tree to coexist in a by-value
nested tree, they will also permit it in a by-reference tree. We don't
need two ways of achieving the same thing.
>, but it
> is one that we can't enforce - unlike our normal behaviour users
> will be able to make bzr break, and it won't be clear why, or how
> they should fix it (and arguably they won't have done anything
> wrong that would need fixing).
I think that we can write our UI such that users understand why it's
broken and how to fix it.
> Aaron has expressed concerns about supported nested trees inside our
> tree code.
Your approach is not the only alternative to CompositeTrees. Another
approach is explicit high-level recursion, as I've done with revert,
merge and commit.
> * we want all commands and features to be nested tree capable, so
> everyone should be paying the cost - and the cost should be zero when a
> tree has no nested children.
That would be nice, but how do we get there in small steps? You're
already objecting to the indexes that make it cheap to determine whether
a tree has nested children.
> * Its not hard to add 'recursive=True' default parameters to things
> like changes_from and iter_changes
No, what's hard is implementing the recursive behavior.
> , which would make recursion
> optional. Having to use a different class to enable/disable recursion
> is a pretty big hammer.
If you change iter_changes so that it automatically recurses, you're
making a deep change to its behaviour. You need to expose tree
references, so that operations which care about tree references will be
able to show them. It might as well be a new API.
Having another class that masks tree references means you don't have to
rewrite 'diff' and 'status' on top of everything else.
> * I don't think it hides whats going on any more or less than having a
> separate class which pulls data out of all the children.
I usually find your code to be very, very unclear, so I do not place
much stock in your opinion here.
> * We can address the all-or-nothing issue by making recursion
> controlling parameters default to False initially
Perhaps I expressed myself poorly. I believe that it makes sense to
start with relatively uninvasive changes. They are faster to make and
bear less risk of causing problems for others. They allow us to
experiment with approaches to nesting without committing fully to them
by enshrining them in our public API.
I don't think that shoving recursion into Tree is a desirable outcome.
Many operations will need to know that they are dealing with multiple
trees. I think we will wind up using a new API that works naturally in
situations where there are nested subtrees. I think NestedTrees is the
start of such a thing. Once we've got more recursion-aware code, we'll
be able to refactor it.
We will also need a TreeTransform that can apply to all subtrees, so
that merge and revert function perfectly. It will need to handle cases
where subtrees are created and destroyed, and where files are moved
among subtrees.
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkoASvAACgkQ0F+nu1YWqI0rswCeM5glpSx9DhhWMLrUCKZPLPHk
bn0An3L51jiAy5maUzHFFZE92VJKb78K
=q37y
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list