presenting the fundamental abstractions

Wed Sep 12 18:34:38 BST 2007

On (12/09/07 09:03), Andrew King wrote:
> To further my own understanding, and to help put off a few of those
> similar questions I have been asked by my colleagues, I thought I
> would try and write out a bit of an explanation of bzr concepts. In so
> doing, I came up with some questions as well. Anyway, I thought this
> might be a good way of fixing my misconceptions, or starting a
> concepts section on the bzr wiki. (As it is written by someone who
> isn't a bzr developer, hopefully I may have more idea of what people
> don't "get" (including me!)).

Thanks for doing this. I'm sure this will help a lot of people, and it
would be good to work this into a document on the wiki, as you say.

Matthew has already made some good points; I would like to add a few
more clarifications, if I may. Some of what I say is probably too
in-depth for user documentation, but if we all understand what is going
on then we will write better documentation. One reason I am writing this
is to find any holes in my own understanding.

> 2. Branch -
> 
> A branch is a set of revisions.

Yes, it is. However, it may be better to think of it as a pointer to a
single revision: the head revision of the branch. Thanks to the parent
relationships between revisions, this one pointer uniquely defines a
DAG (directed acyclic graph) of revisions, which forms the set of
revisions in this branch.

I prefer the pointer model, as it turns some operations into simply
moving the pointer, and it is how branches are now represented on disk
(.bzr/branch/ is pretty simple, and the only required part is the
pointer to the revision in the associated repository).
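To make the pointer model concrete, here is a toy sketch in Python. It
is not bzrlib's API: the ancestry function, the dict-based repository and
the revision ids are all made up for illustration. The repository maps
each revision id to its parents, a branch is nothing more than a head id,
and walking parent links from that head recovers the branch's revisions:

```python
from typing import Dict, Optional, Set, Tuple

# Toy model, not bzrlib's API: a repository maps each revision id
# to the tuple of its parent revision ids.
Repository = Dict[str, Tuple[str, ...]]


def ancestry(repo: Repository, head: Optional[str]) -> Set[str]:
    """Return every revision reachable from head via parent links.

    A head of None stands in for the null revision (an empty branch).
    """
    seen: Set[str] = set()
    stack = [head] if head is not None else []
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(repo[rev])
    return seen


# One repository shared by two branches; each branch is just a head id.
repo: Repository = {
    "r1": (),
    "r2": ("r1",),
    "r3": ("r2",),
    "side1": ("r1",),
    "merge": ("r3", "side1"),
}
trunk = "r3"
feature = "merge"

print(sorted(ancestry(repo, trunk)))    # ['r1', 'r2', 'r3']
print(sorted(ancestry(repo, feature)))  # ['merge', 'r1', 'r2', 'r3', 'side1']

# Dropping the last revision (as uncommit does) only moves the pointer;
# "r3" itself stays in the repository.
trunk = repo[trunk][0]
print(sorted(ancestry(repo, trunk)))    # ['r1', 'r2']
```

The two branches share most of their history without copying anything,
which is exactly what a shared repository buys you.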

> A revision exists in one or more
> branches. A branch itself is not editable, it is modified by means of
> a working tree (commit etc.) (see later), or via addition or deletion
> of revisions from another branch.

This is correct, although "deletion of revisions from another branch"
is slightly off: you don't need another branch to delete revisions, you
can simply drop them (e.g. with uncommit). The intent is right, though.

Describing the revisions as coming from the branch is also slightly
inaccurate: they come from a repository, but you usually refer to them
through another branch.

Another issue that comes up here is "master" branches, i.e. checkouts.
They don't change the nature of the object, and this may not be the
place to explain them, but they are definitely a branch related thing.

> 
> 3. Working Tree -
> 
> A working tree is a set of files and directories. It is connected to
> exactly one branch, and is in fact, the set of files that would be
> produced if you applied all the revisions in that branch in order.

As Matthew said, it is actually just the files in the last revision.

However, a working tree doesn't normally stay in that state: you edit
files to create a new revision. A working tree has a so-called basis
tree, which is the revision tree of the head of the branch at the time
the working tree was started (i.e. at checkout/branch time) or at the
last full-tree commit.

Another interesting case is revert. If I do revert -r-2, then I have a
working tree that represents an older, non-head revision of the branch,
but the basis tree is still the revision tree of the head.

The difference here, I guess, is that working trees are mutable in odd
ways. Branches merely move their pointer (or change the set of
revisions), revisions are immutable, and repositories accumulate
revisions.

> 
> 4. Repository -
> 
> A repository is a collection of revisions. It is effectively a tool
> for caching revisions for sharing between branches. You cannot get the
> current state of a repository. A repository cannot have a working
> tree. A repository could also be named a "revision cache". If you try
> and create a branch in a directory that does not have a repository as
> a parent, it will automatically create one.

If you go with the pointer definition of a branch, then the definition
of a repository becomes simpler, in my eyes: it is just a store of
revisions.

> bzr init
>    - branch
>    - working tree
> This creates a new branch at the specified location, with no revisions
> in it. This also implicitly creates a working tree with no files in
> it, since you can start creating files and add them.

It will also make a repository if there isn't one to use already. It
creates a branch that points to the null revision, and a working tree
with the null revision's tree as its basis tree, and hence no files.

> 
> bzr branch
>    - revisions
>    - branch
>    - working tree
> 
> This "copies" a branch from another location. This means it creates a
> new branch with all the revisions from the specified branch location.
> If the branch currently has no repository, it creates a working tree
> with the latest version of the files from that branch. If it has a
> repository, it uses the repository's default setting for working
> trees, ie. create a working tree, or don't create a working tree. This
> is slightly different to copying all the files manually (ie. with a
> file system command) because it verifies all the files etc. are
> correct.

It will also create a repository as necessary. It then copies all
revisions transitively referenced by the revision the other branch
points to (or by the specified revision) into this branch's repository,
and creates a new branch with that revision as its pointer. Finally, it
creates a working tree as you specify.
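The copying step can be sketched in the same toy model (again not
bzrlib's API; fetch, branch and the revision ids are hypothetical). It
assumes that a repository holding a revision also holds all of that
revision's ancestors, so the walk can stop at anything already present:

```python
from typing import Dict, Optional, Tuple

# Toy model: revision id -> parent revision ids (not bzrlib's API).
Repository = Dict[str, Tuple[str, ...]]


def fetch(source: Repository, target: Repository, head: Optional[str]) -> int:
    """Copy head and all of its ancestors from source into target.

    Assumes target already holds the full ancestry of anything it
    holds, so already-present revisions end the walk early.
    Returns the number of revisions copied.
    """
    copied = 0
    stack = [head] if head is not None else []
    while stack:
        rev = stack.pop()
        if rev in target:
            continue
        target[rev] = source[rev]
        copied += 1
        stack.extend(source[rev])
    return copied


def branch(source_repo: Repository, source_head: Optional[str],
           target_repo: Repository,
           revision: Optional[str] = None) -> Optional[str]:
    """Rough analogue of bzr branch: fetch ancestry, return new pointer."""
    head = revision if revision is not None else source_head
    fetch(source_repo, target_repo, head)
    return head  # the new branch is just this pointer into target_repo


upstream: Repository = {"r1": (), "r2": ("r1",), "r3": ("r2",)}
local: Repository = {}
new_head = branch(upstream, "r3", local, revision="r2")
print(new_head)       # r2
print(sorted(local))  # ['r1', 'r2']
```

Note that "r3" is never copied: only what the new pointer can reach
ends up in the new repository.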

> 
> # why does the repository determine whether or not there is a working
> tree? How do you do this with the branch command? Why does a branch
> share the "has a working tree" or "doesn't have a working tree" aspect
> of its repository (if it has one?).

This was intended for shared repositories on servers and the like,
where you usually don't want working trees. Perhaps the branch command
should have an option to create a working tree regardless of the
setting.

> bzr status
>    - revisions
>    - branch
>    - working tree
> This shows all the files that differ from the current branch (ie. that
> differ from the current state of files if you were to apply all the
> revisions in the branch in order). If you were to "commit", all these
> changes would be added as the next revision to the branch.

It shows the differences between the working tree and its basis tree,
which will be the revision tree of the head of the associated branch.
If you were to do a full-tree commit, the new revision's tree would be
the basis tree with these changes applied.
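In toy form, status is just a comparison of two trees. This sketch is
an assumption-laden simplification: trees are plain path-to-contents
dicts, and it ignores renames, directories, file ids and unknown
(unversioned) files, all of which real bzr status handles:

```python
from typing import Dict, List

# Toy model: a tree maps file paths to contents. Real bzr trees also
# track file ids, renames, directories and unversioned files.
Tree = Dict[str, str]


def status(basis: Tree, working: Tree) -> Dict[str, List[str]]:
    """Compare a working tree against its basis tree."""
    return {
        "added": sorted(p for p in working if p not in basis),
        "removed": sorted(p for p in basis if p not in working),
        "modified": sorted(
            p for p in working if p in basis and working[p] != basis[p]
        ),
    }


basis = {"README": "hello\n", "main.py": "print(1)\n"}
working = {"README": "hello world\n", "util.py": "x = 1\n"}
for kind, paths in status(basis, working).items():
    print(f"{kind}: {paths}")
# added: ['util.py']
# removed: ['main.py']
# modified: ['README']
```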

> 
> bzr commit
>    - revisions
>    - branch
>    - working tree
> This adds a revision to the current branch with all the changes that
> have been made to the working tree as compared to the current set of
> revisions in the branch.

This adds a revision to the repository consisting of the working tree's
basis tree (the tree of the current head of the branch), with the files
specified on the command line replaced by their working tree versions.
By default all files are taken, so the new revision tree is simply the
working tree. The branch is updated to point at the new revision, and
the working tree's basis tree is changed to the tree that was created.

If the branch has a master branch (i.e. it is a checkout), all of this
is done on the master branch first.
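Leaving master branches aside, the commit steps can be sketched in the
toy model. Everything here is hypothetical: the repository stores
(parents, tree) pairs, revision ids are passed in explicitly (bzr
generates its own), and the selected argument stands in for naming
files on the command line:

```python
from typing import Dict, Iterable, Optional, Tuple

Tree = Dict[str, str]
# Toy repository: revision id -> (parent ids, revision tree).
Repository = Dict[str, Tuple[Tuple[str, ...], Tree]]


def commit(repo: Repository, head: Optional[str], working: Tree,
           new_rev_id: str,
           selected: Optional[Iterable[str]] = None) -> str:
    """Record a new revision and return it as the new branch head.

    A head of None stands for the null revision (empty basis tree).
    selected mimics files named on the command line; None means all.
    """
    basis: Tree = repo[head][1] if head is not None else {}
    new_tree = dict(basis)  # start from the basis tree
    if selected is None:
        paths = set(basis) | set(working)
    else:
        paths = set(selected)
    for path in paths:
        if path in working:
            new_tree[path] = working[path]  # take the working version
        else:
            new_tree.pop(path, None)        # deleted in the working tree
    parents = (head,) if head is not None else ()
    repo[new_rev_id] = (parents, new_tree)
    # The caller moves the branch pointer, and the working tree's
    # basis, to the returned revision.
    return new_rev_id


repo: Repository = {}
head = commit(repo, None, {"a": "1", "b": "1"}, "rev-1")
working = {"a": "2", "b": "2"}
head = commit(repo, head, working, "rev-2", selected=["a"])
print(repo["rev-2"])  # (('rev-1',), {'a': '2', 'b': '1'})
```

Because only "a" was selected, the change to "b" stays uncommitted in
the working tree while the basis moves on to rev-2.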

> 
> bzr revert
>    - revisions
>    - branch
>    - working tree
> This resets the working tree to the latest version of the branch.

Or to a specified revision.

> 
> bzr pull
>    - revisions
>    - branch
> This is a way of adding revisions that are present in a foreign branch
> to the current branch.
> This adds all the revisions in the FROM branch to the TO branch that
> are not already in the TO branch. If the TO branch has additional
> revisions, this command fails (it will not do anything and print out a
> warning). Depending on the "transport" (see later), this may or may
> not update the working tree. (ie. the files the user sees may get
> updated or may not).

This moves the branch pointer to the specified revision (by default,
the head of the other branch). In doing so, it adds all ancestors of
that revision to its repository if they are not already present. If
there is a working tree, its basis tree is set to the new head and all
the files are set to the state in this basis tree.

The transport shouldn't have any effect on pull, as the branch being
updated is always local (pull -d sftp://... will throw an error about
remote working trees).

> 
> bzr push
>    - revisions
>    - branch
> This is a way of adding revisions that are present in the current
> branch to a foreign branch.
> This adds all the revisions in the current branch to the foreign
> branch that are not already in the foreign branch. If the foreign
> branch has additional revisions, this command fails (it will not do
> anything and print out a warning). Depending on the "transport" (see
> later), this may or may not update the working tree. (ie. the files
> the user sees may get updated or may not). If it can't update the
> working tree, it will print a warning. It is then necessary for
> someone to use the "bzr update" command on that branch/working tree.
> If no foreign branch is specified, it will create the branch at the
> foreign location.

This works similarly to pull above. However, you mention that the TO
branch must be a subset of the FROM branch, which applies to pull as
well. The actual check is that you can reach the head revision of the
TO branch by following parent links from the head of the FROM branch.
This amounts to a subset relation (assuming that revision ids are
always unique): once that condition is satisfied, the DAG starting at
the TO branch's head is exactly the part of the FROM branch's DAG that
starts at the revision you found, so every revision in the TO branch is
also in the FROM branch.
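A toy sketch of that check followed by the pointer move, in the same
dict-based model as before. The names is_ancestor, pull and
BranchesDiverged are hypothetical, not bzrlib's, and the fetch loop
again assumes each repository holds the full ancestry of what it holds:

```python
from typing import Dict, Optional, Tuple

# Toy model: revision id -> parent revision ids (not bzrlib's API).
Repository = Dict[str, Tuple[str, ...]]


class BranchesDiverged(Exception):
    """Hypothetical error: the TO head is not an ancestor of the FROM head."""


def is_ancestor(repo: Repository, candidate: Optional[str],
                head: Optional[str]) -> bool:
    """True if candidate is reachable from head via parent links."""
    if candidate is None:
        return True  # the null revision is an ancestor of everything
    seen = set()
    stack = [head] if head is not None else []
    while stack:
        rev = stack.pop()
        if rev == candidate:
            return True
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(repo[rev])
    return False


def pull(from_repo: Repository, from_head: Optional[str],
         to_repo: Repository, to_head: Optional[str]) -> Optional[str]:
    """Check ancestry, copy missing revisions, return the new TO head."""
    if not is_ancestor(from_repo, to_head, from_head):
        raise BranchesDiverged(f"{to_head!r} is not an ancestor of {from_head!r}")
    stack = [from_head] if from_head is not None else []
    while stack:  # copy revisions the TO repository lacks
        rev = stack.pop()
        if rev in to_repo:
            continue
        to_repo[rev] = from_repo[rev]
        stack.extend(from_repo[rev])
    return from_head  # the TO branch pointer now moves here


upstream: Repository = {"r1": (), "r2": ("r1",), "r3": ("r2",)}
mine: Repository = {"r1": (), "r2": ("r1",)}
new_head = pull(upstream, "r3", mine, "r2")
print(new_head, sorted(mine))  # r3 ['r1', 'r2', 'r3']

diverged: Repository = {"r1": (), "r2": ("r1",), "x3": ("r2",)}
try:
    pull(upstream, "r3", diverged, "x3")
except BranchesDiverged as err:
    print("refused:", err)  # refused: 'x3' is not an ancestor of 'r3'
```

Push is the same operation with the roles of the two branches swapped.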

> 
> # Why does checkout do 2 seemingly very different things? This seems
> to cause a lot of confusion. Am I misunderstanding the components
> here?

Indeed, this surprised me at first.

You could see it as "create a working tree for this branch, and if the
branch is not local then get me a branch as well, unless I specify
--lightweight".

There has been talk of a "create-tree" command, which would make adding
a tree to an existing branch more discoverable.

> # Why can't annotate work on a remote branch?

It should be able to, I believe, so this may well be a bug. There may be
a reason I am missing, though; otherwise it should be quite easy to fix.

> # Other questions?
> # How do I create a repository after I have already created a branch?

You already have a repository, in the same .bzr directory as the
branch. If you mean a shared repository, you can just touch one file in
there. If you want a shared repository in the parent directory, you can
move the repository out into the parent and make it shared. There is no
command to do this, however. Again, adding a command to promote a
branch to a shared repository plus branch has been discussed.

Thanks,

James

-- 
  James Westby   --    GPG Key ID: B577FE13    --     http://jameswestby.net/
  seccure key - (3+)k7|M*edCX/.A:n*N!>|&7U.L#9E)Tu)T0>AM - secp256r1/nistp256