[RFC] Should we rewrite nested-trees or our formats or punt?

Thu Mar 26 22:18:43 GMT 2009

On Thu, 2009-03-26 at 09:14 -0400, Aaron Bentley wrote:

> > I think that with that assertion removed, we could handle subtrees when
> > we encounter them rather than having to do up-front scans for them.
> 
> As I think you know, I think that would make it virtually impossible to
> make nested trees user-friendly.  Since they would violate a key rule of
> Trees, we would not be able to treat them, internally, as a large Tree.
>  Instead, we would have to spend extra effort on each operation to make
> it behave as though nested trees were large trees.  So not only would we
> need to make operation behave nicely in the nested case, we would need
> to make them behave nicely in the un-nested case.

I know we disagree on this. I'm fairly sure though that we're the dog
getting wagged by the tail enforcing this constraint. I've always
understood file-ids to be a tool for making the VCS more powerful, but
it really feels like we're going to be delivering a result with
significant caveats here. However you're doing the work, and I want to
help rather than hinder, so take my critique as questions I think you
should consider, not things I want you to do differently.

> >> Option 3:
> >> Lie about subtree support
> >>
> >> We can land a brisbane-core format that supports subtrees but claims not
> >>  to.  Or alternatively, we can have a config option to enable subtree
> >> support (i.e. in locations.conf).
> > 
> > I think this is the closest to the proposed plan from the sprint, that
> > is that the disk format does subtrees but the UI doesn't.
> 
> However, if as you say, we need an extra dict, then we will never want
> to enable subtree support in a variant of brisbane core that lacks it,
> right?

That makes sense to me; if we need Foo to make a feature work well,
having the feature off when Foo is missing is only sensible.

> >> Option 4:
> >> Change our code to match our storage
> >>
> >> Most of these operations are already walking the tree, so in theory, we
> >> can cache the walk and reuse it.  Unfortunately, this would have to be
> >> pushed through several layers.  For example, BzrDir.create_workingtree
> >> does not accept a Tree or iter_entries input.
> >>
> >> In my opinion, the resulting code will be less clear than our current
> >> code, harder to debug, etc.  It will also delay nested trees.
> > 
> > I'd like to know why it will be less clear;
> 
> I think that starting one commit in the middle of another commit is
> confusing, for example.  I think it presents progress reporting
> problems, error reporting problems, and debugging problems.
> 
> I think it's much clearer to commit to each subtree first, then commit
> to the containing tree.

From a user perspective, I don't think it makes much difference. We
prompt for commit messages bottom-up either way.

> > I would have naively thought
> > [given the assertion we can treat forests as 'big trees'] that doing
> > everything inline would be precisely the same clarity - iter_changes
> > would report changes from children etc.
> 
> It will be less efficient to perform a commit using that form of
> iter_changes, because iter_changes doesn't indicate which Tree it is
> emitting results for.  This means that when we encountered a file in the
> subtree, we would have to look up its tree and perform a new commit in
> the subtree.

Broadly we want to run different code in the subtrees to determine 'has
it changed' when we are doing a recursive-down commit from a
non-recursive-down commit, or a recursive status. And tree references
are special; we want to potentially run different commit, or status, or
other code at tree boundaries. Imagine for instance a git tree under a
bzr tree. Have a special iterator for tree references only is a little
odd though, even though they are special, in that we don't have a
directory-only iterator, or a files-only iterator.

I think perhaps a callback called on tree-references which takes the
same signature as iter_changes might work. The default could be to chain
into iter_changes again; commit would pass in a closure to commit
recursively [for 'down'] or to return unchanged [for 'non-recursive'].

> John has proposed instead that WT.iter_changes would always emit tree
> references, so that we could run commit in the subtrees.  This means
> that iter_changes is emitting things that it has no reason to believe
> have changed, and I think it further confuses that API.

I agree; its very special case and would require all code using
iter_changes to know explicitly about nested trees. At best thats
disruptive to our code today.

> I suppose you could have a CommitBuilder that was aware of composite
> trees and managed the process relatively seamlessly, but that's a lot of
> work and definitely won't be our first cut.

We definitely don't want to block on having it perfect.

-Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090327/2d251bef/attachment.pgp