[RFC] Should we rewrite nested-trees or our formats or punt?

Thu Mar 26 13:14:32 GMT 2009

Robert Collins wrote:
> I'm convinced that an extra dict is needed for us, because of our
> assertion that fileids are unique across a forest. But I think that that
> assertion is the problem.
> 
> E.g. 'bzr add' has to be taught to check that the new ids it's adding are
> not in any subtree.
> 'bzr commit' with a subtree has to inspect the subtree commit to make
> sure no new fileids are added that are not also deleted in the parent
> tree.
> and so on.
> 
> I think that with that assertion removed, we could handle subtrees when
> we encounter them rather than having to do up-front scans for them.

As I think you know, I believe that would make it virtually impossible to
make nested trees user-friendly.  Since they would violate a key rule of
Trees, we would not be able to treat them, internally, as one large Tree.
Instead, we would have to spend extra effort on each operation to make
it behave as though nested trees were one large tree.  So we would need
to make each operation behave nicely in the nested case as well as in
the un-nested case.
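
For concreteness, the up-front scan you describe for 'bzr add' amounts to
roughly the following.  This is only a sketch on toy classes, not
bzrlib's API: ToyTree, build_forest_index and check_new_ids are made-up
names.  The dict built by the walk is the "extra dict"; without keeping
it around, every add has to rescan all subtrees.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTree:
    """Hypothetical stand-in for a (possibly nested) tree, not bzrlib's Tree."""
    name: str
    file_ids: set = field(default_factory=set)
    subtrees: list = field(default_factory=list)


def build_forest_index(root):
    """Walk the whole forest once, mapping every file id to its owning tree.

    Raises ValueError if the forest already violates id uniqueness.
    """
    index = {}
    stack = [root]
    while stack:
        tree = stack.pop()
        for file_id in tree.file_ids:
            if file_id in index:
                raise ValueError('file id %r is in both %s and %s'
                                 % (file_id, index[file_id], tree.name))
            index[file_id] = tree.name
        stack.extend(tree.subtrees)
    return index


def check_new_ids(root, new_ids):
    """Reject ids an 'add' would create if they already exist in the forest."""
    index = build_forest_index(root)
    clashes = sorted(fid for fid in new_ids if fid in index)
    if clashes:
        raise ValueError('ids already used in forest: %s' % ', '.join(clashes))
```

With the uniqueness rule in place, this check happens once per operation
and the rest of the code can treat the forest as a single Tree.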

>> Option 3:
>> Lie about subtree support
>>
>> We can land a brisbane-core format that supports subtrees but claims not
>> to.  Or alternatively, we can have a config option to enable subtree
>> support (i.e. in locations.conf).
> 
> I think this is the closest to the proposed plan from the sprint, that
> is that the disk format does subtrees but the UI doesn't.

However, if, as you say, we need an extra dict, then we will never want
to enable subtree support in a variant of brisbane-core that lacks it,
right?

>> Option 4:
>> Change our code to match our storage
>>
>> Most of these operations are already walking the tree, so in theory, we
>> can cache the walk and reuse it.  Unfortunately, this would have to be
>> pushed through several layers.  For example, BzrDir.create_workingtree
>> does not accept a Tree or iter_entries input.
>>
>> In my opinion, the resulting code will be less clear than our current
>> code, harder to debug, etc.  It will also delay nested trees.
> 
> I'd like to know why it will be less clear;

I think that starting one commit in the middle of another commit is
confusing, for example.  I think it presents progress reporting
problems, error reporting problems, and debugging problems.

I think it's much clearer to commit to each subtree first, then commit
to the containing tree.
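
A minimal sketch of that ordering, assuming toy trees whose "commit"
just mints a fake revision id (ToyTree and commit_forest are
hypothetical, not bzrlib code).  Each subtree's commit finishes before
its parent's starts, so progress and errors are reported one tree at a
time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTree:
    """Hypothetical nested tree; commit() only records a fake revision id."""
    name: str
    subtrees: list = field(default_factory=list)
    revisions: list = field(default_factory=list)
    subtree_revs: dict = field(default_factory=dict)

    def commit(self):
        rev = '%s-r%d' % (self.name, len(self.revisions) + 1)
        self.revisions.append(rev)
        return rev


def commit_forest(tree, log):
    """Commit every subtree depth-first, then the containing tree."""
    for sub in tree.subtrees:
        # The subtree's commit completes before the parent's begins; the
        # parent records the new subtree revision (its tree reference).
        tree.subtree_revs[sub.name] = commit_forest(sub, log)
    rev = tree.commit()
    log.append(rev)
    return rev
```

For root -> lib -> inner, the log comes out as inner, lib, root, and no
commit is ever nested inside another.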

> I would have naively thought
> [given the assertion we can treat forests as 'big trees'] that doing
> everything inline would be precisely the same clarity - iter_changes
> would report changes from children etc.

It will be less efficient to perform a commit using that form of
iter_changes, because iter_changes doesn't indicate which Tree it is
emitting results for.  This means that whenever we encounter a file in a
subtree, we would have to look up its tree and perform a new commit in
that subtree.
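
To illustrate the extra work: with a flat, forest-wide stream of changes,
commit has to recover the owning tree for every entry, e.g. by matching
paths against the known subtree roots.  This sketch works on plain path
strings rather than real iter_changes tuples, and owning_tree and
group_changes_by_tree are hypothetical helpers.

```python
def owning_tree(path, subtree_roots):
    """Return the deepest subtree root containing path, or '' for the top tree.

    subtree_roots: relative paths where nested trees live (an assumption).
    The tree-reference entry itself (path == root) belongs to the parent.
    """
    best = ''
    for root in subtree_roots:
        if path.startswith(root + '/') and len(root) > len(best):
            best = root
    return best


def group_changes_by_tree(changed_paths, subtree_roots):
    """Bucket a flat stream of changed paths by the tree that owns each one."""
    groups = {}
    for path in changed_paths:
        groups.setdefault(owning_tree(path, subtree_roots), []).append(path)
    return groups
```

Each lookup scans every subtree root, and the grouping only becomes
available after the whole stream has been consumed.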

John has proposed instead that WT.iter_changes would always emit tree
references, so that we could run commit in the subtrees.  This means
that iter_changes would be emitting things that it has no reason to
believe have changed, and I think it further confuses that API.
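
Roughly, that proposal turns iter_changes into something like the
wrapper below.  It is a hypothetical sketch: iter_changes_with_refs is a
made-up name, and it uses a simplified (path, kind) pair where the real
iter_changes yields richer tuples.  The trailing loop is the part that
reports references regardless of whether anything in them changed.

```python
def iter_changes_with_refs(changes, tree_references):
    """Yield the real changes, then every tree reference not already reported.

    changes: iterable of (path, kind) pairs, a simplification.
    tree_references: paths of all nested trees in the working tree.
    """
    seen = set()
    for path, kind in changes:
        seen.add(path)
        yield path, kind
    for ref in tree_references:
        if ref not in seen:
            # Emitted unconditionally so commit can recurse into the
            # subtree, even though nothing says the subtree changed.
            yield ref, 'tree-reference'
```

Callers that only want genuine changes would then have to filter these
entries back out.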

I suppose you could have a CommitBuilder that was aware of composite
trees and managed the process relatively seamlessly, but that's a lot of
work and definitely won't be our first cut.

Aaron