[RFC] proposed user doc for nested trees

Tue May 12 06:47:48 BST 2009

I liked the document a lot, both as something that can eventually
become user documentation, and as a way to clarify the conversation
about what we are going to merge into bzr.

To start with a small point about tone that someone else mentioned:

> Bazaar has good support for building and managing
> Bazaar is smart enough to ...

There's nothing really wrong with these sentences as they stand, but I
think generally we should avoid making value judgements in the
documentation, because they can cross the line into sounding smug,
even if the support really is extremely good.  If we say we have
support, present it clearly, and the feature actually is great, that
should be enough.

One thing we could usefully do at the start is give an overview of
the features as a bullet list, possibly linking through to the places
where they're described in detail; that will give an idea of how
complete the support is.  It will also ensure that questions like
'what if I want to nest the latest version of a branch?' are answered.

I realize this is draft documentation for a future feature, so I
mention this only because we've had feedback about it before, and this
document already looks so good.

I agree that calling them 'nested trees' would be clearer than 'nested
branches'.

I think this document needs a bit more detail about what's happening
behind the UI, and so does this conversation if it's to move forward
smoothly.  You cover it pretty well in some places by saying what is
or isn't stored, but I think for users to really understand this they
need a model of what's recorded in the committed
inventories/revisions, what's in the working tree, and what's in the
branch/es.

The behaviour of 'bzr branch' on containing branches isn't explained
on its own, only in passing in 'Virtual projects'.  From there we can
infer that when you branch, the committed revisions carry across
references to the other branches, so we know those nested trees exist,
and it seems that their working trees are created.  Are they branched
from the branches in the source, or are they pulled from the same
reference URL they originally came from?  (The second seems
problematic if you're pushing to a server, because in general we don't
assume that the server can make outgoing connections on your behalf.)

What happens if you have a branch with no working tree?  Presumably
the fact that the nested trees were there is still present.  Is the
data copied correctly in this case?  Do they all go into the same
repository?  Will running 'bzr checkout' then reconstruct all the
nested trees? (Perhaps obviously yes.)

One question possibly out of scope for this design: some other systems
(like configmanager?) let the top-level tree require the creation of
./a and ./a/b without ./a needing to know anything about it.  The only
case I've seen for that is when people have a top-level tree which
just does the assembly and nothing else.  It can probably be deferred,
but it may be worth noting as a restriction.

> bzr branch --nested

I think "remember this branch in the parent" is more of an operation
in its own right than something just done by 'bzr branch'.  For
example, you might want to init a new nested tree, or you might
already have an untracked nested branch constructed by some other
means.  So why not have 'bzr join' or 'bzr add --nested' (though the
second now seems to have been discarded)?

I'd like to see how this would be shown by 'bzr status', more for the
sake of this discussion, though it would also help users.  If it
recurses then it should obviously show you the changes in the nested
trees, but it also seems to need to show when the version of a nested
tree recorded in the parent is not what's in that branch.
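To make the status question concrete, here is a rough Python sketch
under assumptions of my own: `NestedTree`, `nested_status`, the
revision ids and the output format are all made up for illustration,
and none of it is bzr API.  It only shows the two things status would
need to report: uncommitted changes inside a nested tree, and a
mismatch between the revision the containing tree records and the
nested branch's current tip.

```python
from dataclasses import dataclass, field


@dataclass
class NestedTree:
    """Hypothetical model of one nested tree, for illustration only."""
    path: str               # location relative to the containing tree
    recorded_revision: str  # revision id recorded by the containing tree's last commit
    branch_tip: str         # current tip revision of the nested branch
    changed_files: list = field(default_factory=list)  # uncommitted changes in its working tree


def nested_status(nested_trees):
    """Return status lines for nested trees (made-up output format)."""
    lines = []
    for tree in nested_trees:
        # Changes in the nested working tree, as a recursive status would show.
        for name in tree.changed_files:
            lines.append(f"modified: {tree.path}/{name}")
        # The parent's recorded reference differs from the nested branch tip.
        if tree.recorded_revision != tree.branch_tip:
            lines.append(
                f"nested tree {tree.path}: recorded {tree.recorded_revision}, "
                f"branch tip {tree.branch_tip}"
            )
    return lines
```

For a nested tree at src/lib/sax recorded at rev-1 whose branch has
since been committed to as rev-2, this would report both the modified
files and the stale reference, even though each tree on its own is
fully committed.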

> bzr pull src/lib/sax

Presumably this should be 'pull -d src/lib/sax'.

> Bazaar is smart enough to recurse by default into nested branch, commit changes there, and commit the new nested branch tips in the current branch

I'm really not sure that recursing should be the default behaviour.

The biggest argument for recursing by default, it seems to me, is that
if you do have the nested trees in lockstep with the parent this will
keep them there more easily, and you won't get the situation of both
trees being properly committed but the reference to the subtree not
having been updated.  However, I think we do have to handle this
situation reasonably (through status etc) and someone who's keeping
multiple related trees up to date probably needs to know about it.

On the other hand:

Ideally we would have one consistent rule for all commands as regards
descent into nested trees.  If it's "they all recurse", that's great;
if it's "read-only commands recurse by default", that's not so great;
if it's "... they all recurse except uncommit, which recurses upwards",
I think people's mental model will be incomplete, so they may be
surprised.

Also, I think the code changes will be such that commands that aren't
explicitly updated won't recurse; that's probably the only sane
approach.  So that means that code in bzr that's not updated will
default to not recursing, and code in other places (like bzr-gtk or
qbzr) won't recurse either, at first.

I think I'd rather be in the situation where they're all consistent
(and not recursing) but some are lacking a useful option.

And there is the third option sjt alluded to: making some commands,
such as commit, warn or error if there are nested changes.
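As a rough illustration, here is a Python sketch of a commit policy
that is consistent (no recursion by default) but has an opt-in
option, plus the warn-or-error variant.  All the names here
(`plan_commit`, `NestedChangesError`, `recurse`, `on_nested_changes`)
are hypothetical and invented for illustration; they are not bzr APIs
or options.

```python
class NestedChangesError(Exception):
    """Hypothetical error: nested trees have uncommitted changes."""


def plan_commit(nested_changes, recurse=False, on_nested_changes="warn"):
    """Decide which trees a commit would touch (hypothetical policy sketch).

    nested_changes maps each nested tree path to True if it has
    uncommitted changes.  Returns (trees_to_commit, warnings), where "."
    stands for the containing tree.
    """
    dirty = sorted(path for path, changed in nested_changes.items() if changed)
    if recurse:
        # Opt-in recursion: commit the dirty nested trees first, so the
        # containing tree can then record their new tips.
        return dirty + ["."], []
    if dirty and on_nested_changes == "error":
        raise NestedChangesError(
            "uncommitted changes in nested trees: " + ", ".join(dirty))
    # Default: behave like a command that knows nothing about nesting,
    # optionally warning about what is being left behind.
    warnings = []
    if on_nested_changes == "warn":
        warnings = ["not committing changes in nested tree " + p for p in dirty]
    return ["."], warnings
```

Making no recursion the default means this behaves the same way as
code that hasn't been updated for nested trees (in bzr itself, bzr-gtk
or qbzr) would behave anyway.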

> Note however that log -v and log -p on he containing branch will show what files in nested branches were changed in each revision.

... presumably you mean in each revision of the containing branch,
which might skip over several nested-tree revisions.

> nested branch locations are not tracked over time

I think we should say here where they are stored.  Is it as
non-versioned data in the branch, like tags?

> bzr nested DIR LOCATION

This seems to me a lot like the issue of managing the push, pull,
etc. default locations, and I wonder if it should be unified with them,
meaning that it would also work for the containing tree or a tree with
nothing nested.  (In that case it would obviously need a different
name.)  It seems that for nested branches you want to control the
push and pull locations too.

This section is actually raising a bit of a conceptual question for
me: are you saying that the nested branches have their own tip
pointers, or that they're really checkouts of branches held somewhere
else?  In the latter case it seems more reasonable that there's only
one location for each of them (i.e. they're really just working trees,
not branches).

If they really are just checkouts, then I wonder whether this should
simply be an option to the checkout command: get a checkout, under
FOO, of this other branch.

> To delete the location of a nested branch: bzr nested --delete DIR

The text seems to imply that this does not delete the directory, just
forgets the location of its branch.  Are you then left with a checkout
with no branch, in which you can't do much until you essentially
rebind it?

> --no-recurse-nested

It's rather long; could we use --nested just for controlling
recursion, and something else to say whether or not to create the
nested trees?  (It's plausible that you might want 'branch --no-nested'
to mean not creating the nested trees in the result...)

> The remove command deletes a nested branch when required like this:
> bzr remove src/lib/ancientDB

Is this the same as 'bzr rm'?  Does it need to be different?

And should this perhaps check that the tree is clean before removing
it?  I guess it leaves any committed data behind in the repository?

(Maybe 'bzr rm' on the root of a branch or checkout should also just
check it's clean and then remove it, as a separate feature?  There was
a thread recently asking how to remove a tree...)

> Commands like commit and push need online access to the locations for nested branches which have updated their tip....  If you are working offline, you may want to ensure your have a local mirror location defined for nested branches you are likely to tweak.

s/your/you

By "which have updated their tip", I think the first sentence means
that if you change a nested branch, for example by committing to it,
you must be able to reach its location?

It sounds like 'mirror location' is a special concept but it's not
really explained.

> enforcing one common revision is the right way

I guess you mean just one copy of the library, rather than one
revision, since we'd also prohibit having two different but related
revisions?

I'm most concerned that this will come up when people have related but
distinct branches that share file ids, e.g. if they both started by
branching from a common template.  Or you might plausibly have
libfoo1.1 and libfoo2.0 that share history.

> If you require this feature

I'd just say feedback or questions about anything are welcome.

-- 
Martin <http://launchpad.net/~mbp/>