[RFC] Should we rewrite nested-trees or our formats or punt?

Wed Mar 25 23:40:43 GMT 2009

On Wed, 2009-03-25 at 17:12 -0400, Aaron Bentley wrote:
> Hi all,
> 
> The current state of play is:
> - brisbane-core will land in a single version that supports nested trees
> - when a repository supports nested trees, some slower code paths are
>   activated that handle nested trees.
> - in particular, code paths use Tree.iter_references to determine which
>   subtrees need to be examined, which must walk the tree.
> 
> This means that if we land brisbane-core today, it will be slower than
> it should be.

This is indeed a problem. At a meta-level, comparing to git and hg:
 - AFAIK git does what we do today, in that it just has an inline node
   type that is 'subtree'.
 - hg has a top level list of all the trees [a 'forest'].

I'm convinced that an extra dict is needed for us, because of our
assertion that fileids are unique across a forest. But I think that that
assertion is the problem.

E.g. 'bzr add' has to be taught to check that the new ids it's adding
are not in any subtree.
'bzr commit' with a subtree has to inspect the subtree commit to make
sure no new fileids are added that are not also deleted in the parent
tree.
and so on.
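
To make the cost of that assertion concrete, here is a minimal sketch of
the check 'bzr add' would need. It uses toy dict-based trees, not bzrlib's
API; the names forest_file_ids and clashing_ids are hypothetical. The
point is only that the check has to visit every nested tree before a
single id can be accepted.

```python
def forest_file_ids(tree):
    """Collect every file id in `tree` and all of its nested subtrees.

    `tree` is a toy structure (an assumption, not a bzrlib type):
    {"file_ids": set of str, "subtrees": list of trees}.
    """
    ids = set(tree["file_ids"])
    for subtree in tree["subtrees"]:
        # The whole forest must be scanned up front.
        ids |= forest_file_ids(subtree)
    return ids


def clashing_ids(tree, new_ids):
    """Return the sorted new ids that already exist anywhere in the forest."""
    return sorted(set(new_ids) & forest_file_ids(tree))


forest = {
    "file_ids": {"a-id"},
    "subtrees": [
        {"file_ids": {"b-id"},
         "subtrees": [{"file_ids": {"c-id"}, "subtrees": []}]},
    ],
}
clashes = clashing_ids(forest, ["c-id", "new-id"])
```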

I think that with that assertion removed, we could handle subtrees when
we encounter them rather than having to do up-front scans for them.

Operations like fetch and branch could yield/return in some fashion
subtree references that are encountered for other layers to deal with.
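
A rough sketch of what "handle subtrees when we encounter them" could look
like, using toy classes rather than bzrlib's inventory entries. FileEntry,
TreeReference, Directory, walk and fetch are all hypothetical names for
illustration. The operation walks the tree once and hands back the
references it met, instead of scanning for them beforehand.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FileEntry:
    file_id: str


@dataclass
class TreeReference:
    file_id: str
    reference_revision: str


@dataclass
class Directory:
    file_id: str
    children: Dict[str, object] = field(default_factory=dict)


def walk(directory, prefix=""):
    """Yield (path, entry) for every entry below `directory`, in name order."""
    for name in sorted(directory.children):
        entry = directory.children[name]
        path = prefix + name
        yield path, entry
        if isinstance(entry, Directory):
            yield from walk(entry, path + "/")


def fetch(root):
    """Toy 'fetch': one walk, returning fetched paths and pending references.

    Subtree references are not followed here; they are returned so that
    another layer can decide what to do with them.
    """
    fetched, pending = [], []
    for path, entry in walk(root):
        if isinstance(entry, TreeReference):
            pending.append((path, entry.reference_revision))
        elif isinstance(entry, FileEntry):
            fetched.append(path)
    return fetched, pending


root = Directory("root-id", {
    "README": FileEntry("readme-id"),
    "lib": Directory("lib-id", {
        "a.py": FileEntry("a-id"),
        "vendor": TreeReference("vendor-id", "rev-42"),
    }),
})
fetched, pending = fetch(root)
```

The caller gets the pending list for free from the same walk, so no
separate pass over the tree is needed to find the references.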

> Option 1:
> Store the data harmoniously with our code paths.
> This feels too late in the release process.

I agree.

> Option 2:
> Use a non-subtree variant of brisbane-core
> 
> This is simply a stall.
> This means we keep carting around multiple variants, but at least one
> will be hidden.

On the plus side we don't have any change in behaviour of a disk format
when we do move to subtrees being supported. On the down side people have
to run an upgrader rather than it just working one day.

> Option 3:
> Lie about subtree support
> 
> We can land a brisbane-core format that supports subtrees but claims not
> to.  Or alternatively, we can have a config option to enable subtree
> support (i.e. in locations.conf).

I think this is the closest to the proposed plan from the sprint, namely
that the disk format does subtrees but the UI doesn't.

> Option 4:
> Change our code to match our storage
> 
> Most of these operations are already walking the tree, so in theory, we
> can cache the walk and reuse it.  Unfortunately, this would have to be
> pushed through several layers.  For example, BzrDir.create_workingtree
> does not accept a Tree or iter_entries input.
> 
> In my opinion, the resulting code will be less clear than our current
> code, harder to debug, etc.  It will also delay nested trees.

I'd like to know why it will be less clear; I would have naively thought
[given the assertion we can treat forests as 'big trees'] that doing
everything inline would be precisely the same clarity - iter_changes
would report changes from children etc.
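
As an illustration of the 'big tree' view, and only that: a toy sketch
with dict-based trees, where iter_forest and toy_iter_changes are
hypothetical names and not bzrlib's iter_changes. It assumes file ids are
unique across the forest, which is what lets a move between a child tree
and its parent show up as a single change.

```python
def iter_forest(tree, prefix=""):
    """Yield (path, file_id) for a tree and its nested trees as one big tree.

    `tree` is a toy structure (an assumption, not a bzrlib type):
    {"files": {name: file_id}, "subtrees": {name: tree}}.
    """
    for name, file_id in sorted(tree["files"].items()):
        yield prefix + name, file_id
    for name, subtree in sorted(tree["subtrees"].items()):
        # Children are walked inline, as if they were plain directories.
        yield from iter_forest(subtree, prefix + name + "/")


def toy_iter_changes(old, new):
    """Yield (file_id, old_path, new_path) for ids whose path differs."""
    old_paths = {file_id: path for path, file_id in iter_forest(old)}
    new_paths = {file_id: path for path, file_id in iter_forest(new)}
    for file_id in sorted(old_paths.keys() | new_paths.keys()):
        before = old_paths.get(file_id)
        after = new_paths.get(file_id)
        if before != after:
            yield file_id, before, after


old = {"files": {"a": "a-id"},
       "subtrees": {"sub": {"files": {"b": "b-id"}, "subtrees": {}}}}
new = {"files": {"a": "a-id", "b": "b-id"},
       "subtrees": {"sub": {"files": {}, "subtrees": {}}}}
changes = list(toy_iter_changes(old, new))
```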

-Rb