[RFC] Should we rewrite nested-trees or our formats or punt?
Robert Collins
robert.collins at canonical.com
Wed Mar 25 23:40:43 GMT 2009
On Wed, 2009-03-25 at 17:12 -0400, Aaron Bentley wrote:
> Hi all,
>
> The current state of play is:
> - brisbane-core will land in a single version that supports nested trees
> - when a repository supports nested trees, some slower code paths are
>   activated that handle nested trees.
> - in particular, code paths use Tree.iter_references to determine which
>   subtrees need to be examined, which must walk the tree.
>
> This means that if we land brisbane-core today, it will be slower than
> it should be.
This is indeed a problem. At a meta-level, comparing to git and hg:
- AFAIK git does what we do today, in that it just has an inline node type
  that is 'subtree'.
- hg has a top level list of all the trees [a 'forest'].
I'm convinced that an extra dict is needed for us, because of our
assertion that fileids are unique across a forest. But I think that that
assertion is the problem.
E.g. 'bzr add' has to be taught to check that the new ids it's adding are
not in any subtree.
'bzr commit' with a subtree has to inspect the subtree commit to make
sure no new fileids are added that are not also deleted in the parent
tree.
and so on.
I think that with that assertion removed, we could handle subtrees when
we encounter them rather than having to do up-front scans for them.
Operations like fetch and branch could yield/return in some fashion
subtree references that are encountered for other layers to deal with.
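To make the "handle them when we encounter them" idea concrete, here is a rough sketch. Everything in it (FileEntry, TreeReference, Directory, walk, fetch) is a made-up toy model for illustration, not bzrlib's API: the walker yields tree references as ordinary entries without descending into them, and the toy fetch hands them back for some other layer to deal with.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple, Union


@dataclass
class FileEntry:
    file_id: str


@dataclass
class TreeReference:
    # Hypothetical: identifies the nested tree by the revision it points at.
    reference_revision: str


@dataclass
class Directory:
    children: Dict[str, "Entry"] = field(default_factory=dict)


Entry = Union[FileEntry, TreeReference, Directory]


def walk(root: Directory, prefix: str = "") -> Iterator[Tuple[str, Entry]]:
    """Yield (path, entry) in sorted order; tree references are yielded
    like any other leaf and are never descended into."""
    for name in sorted(root.children):
        entry = root.children[name]
        path = prefix + name
        if isinstance(entry, Directory):
            yield from walk(entry, path + "/")
        else:
            yield path, entry


def fetch(root: Directory) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Toy 'fetch': record the files seen in a single walk and return the
    subtree references met along the way, with no up-front scan."""
    copied, pending = [], []
    for path, entry in walk(root):
        if isinstance(entry, TreeReference):
            pending.append((path, entry.reference_revision))
        else:
            copied.append(path)
    return copied, pending


tree = Directory({
    "README": FileEntry("readme-id"),
    "lib": Directory({"util.py": FileEntry("util-id")}),
    "vendor": TreeReference("rev-abc"),
})
copied, pending = fetch(tree)
```

The caller decides what to do with `pending` (fetch the nested tree, skip it, ask the user), so the walk itself stays a single pass.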
> Option 1:
> Store the data harmoniously with our code paths.
> This feels too late in the release process.
I agree.
> Option 2:
> Use a non-subtree variant of brisbane-core
>
> This is simply a stall.
> This means we keep carting around multiple variants, but at least one
> will be hidden.
On the plus side we don't have any change in behaviour of a disk format
when we do move to subtrees being supported. On the downside people have
to run an upgrader rather than it just working one day.
> Option 3:
> Lie about subtree support
>
> We can land a brisbane-core format that supports subtrees but claims not
> to. Or alternatively, we can have a config option to enable subtree
> support (i.e. in locations.conf).
I think this is the closest to the proposed plan from the sprint, that
is that the disk format does subtrees but the UI doesn't.
> Option 4:
> Change our code to match our storage
>
> Most of these operations are already walking the tree, so in theory, we
> can cache the walk and reuse it. Unfortunately, this would have to be
> pushed through several layers. For example, BzrDir.create_workingtree
> does not accept a Tree or iter_entries input.
>
> In my opinion, the resulting code will be less clear than our current
> code, harder to debug, etc. It will also delay nested trees.
I'd like to know why it will be less clear; I would have naively thought
[given the assertion we can treat forests as 'big trees'] that doing
everything inline would be precisely the same clarity - iter_changes
would report changes from children etc.
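Roughly what I have in mind, as a toy sketch only: trees here are plain dicts mapping a name to either file text (str) or a nested dict standing in for a subtree. This is an assumed model for illustration, not how bzrlib's iter_changes is implemented, and it ignores kind changes (a file becoming a subtree or the reverse).

```python
def iter_changes(old, new, prefix=""):
    """Yield (path, kind) for differences between two toy trees.

    Subtrees are handled inline when they are encountered, so the
    caller sees one flat stream of changes for the whole forest.
    """
    for name in sorted(set(old) | set(new)):
        path = prefix + name
        a, b = old.get(name), new.get(name)
        if isinstance(a, dict) or isinstance(b, dict):
            # Recurse into the subtree; a side that is missing (or is
            # not a subtree) is treated as an empty tree.
            yield from iter_changes(
                a if isinstance(a, dict) else {},
                b if isinstance(b, dict) else {},
                path + "/",
            )
        elif a is None:
            yield path, "added"
        elif b is None:
            yield path, "removed"
        elif a != b:
            yield path, "modified"


old = {"a.txt": "1", "sub": {"x": "1", "y": "2"}}
new = {"a.txt": "1", "b.txt": "new", "sub": {"x": "changed", "z": "3"}}
changes = list(iter_changes(old, new))
```

The caller never has to know where the subtree boundary is; changes under "sub/" arrive in the same stream as changes in the parent.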
-Rb