What constitutes the "identity" of a changeset?

Fri Mar 28 20:41:00 GMT 2008

On 28/03/2008, James Westby <jw+debian at jameswestby.net> wrote:
> First, changeset is a loaded term, as a lot of people see it more like
>  a diff, whereas what bzr and mercurial deal in is snapshots, so
>  we usually prefer the term "revision".

Sorry, this goes back to my "confused over terminology" comments
elsewhere. I'll try to keep my terms straight. (And I think I'll start
keeping a glossary, which might end up being useful for the docs).

> A revision is uniquely identified by its revision id. Two revisions
>  with the same id must be identical, two revisions with different
>  ids are considered to be different.

That's fine. Given 2 revision IDs, I can say whether the revisions are
identical.

>  The revision then ties this together with the tree state, and the
>  revision metadata, including the parents, just as in Mercurial.

"Ties this together" how? In the case of Mercurial, the ID is a hash
of the state/metadata - so I know that if the state/metadata are the
same, the revision ID will be (and vice versa).

>  The difference is that Mercurial derives the name from the data,
>  bzr uses an arbitrary name and just associates it with the data.

There's the point, though. If revision identity is encapsulated in the
ID, and the ID is arbitrary, how can I say whether 2 revisions are
identical? In reality, the ID *isn't* arbitrary, precisely because you
can reason sanely about revision identity, but the rules aren't
written up anywhere.
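
To make the distinction concrete, here is a toy sketch in Python. It
is not how either tool actually computes its IDs; derived_id and
arbitrary_id are names I've made up for illustration, and the use of
SHA-1 and a random UUID is an assumption.

```python
import hashlib
import uuid


def derived_id(data: bytes) -> str:
    """Toy content-derived name: the ID is a hash of the data itself."""
    return hashlib.sha1(data).hexdigest()


def arbitrary_id() -> str:
    """Toy arbitrary name: unique, but unrelated to the data it labels."""
    return uuid.uuid4().hex


data = b"tree state + metadata"

# Same data always yields the same derived ID, so identity can be
# deduced from the data alone.
same_derived = derived_id(data) == derived_id(data)

# Two arbitrary IDs minted for the same data differ, so identity depends
# entirely on the (unwritten) rules for when a new ID gets minted.
same_arbitrary = arbitrary_id() == arbitrary_id()
```

With the derived scheme I can predict the ID from the data; with the
arbitrary scheme I need to know who mints IDs, and when.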

Let me give a concrete example, from a discussion that came up on the
Mercurial list.

Take a branch, with a simple revision tree a -> b -> c -> d. Now, I
want to modify revision c in the history (yes, this isn't possible as
such, bear with me). Suppose I roll back to b, then reapply c with
changes. The new revision *is not c*, because of the changes; call it
c'. Reapplying d gives a *new* revision d', because revision identity
incorporates the parent, and the parent of d' is c' where the parent
of d is c (assume there is no other change in d/d').

Now, if someone cloned my branch before I did this, they would have a
-> b -> c -> d. If they pull from me, they get

a -> b -> c -> d
       \> c' -> d'

and have to merge.

This isn't complicated logic, but the point is that I can reason like
this, precisely because I know what affects the revision ID (and hence
revision identity).
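
The reasoning above can be sketched in a few lines of Python. This is
a simplified model, assuming the ID is a hash of the parent's ID plus
the revision's own content; rev_id and build_history are hypothetical
helpers, not Mercurial's real algorithm or format.

```python
import hashlib


def rev_id(parent_id: str, content: str) -> str:
    """Toy revision ID: a hash over the parent's ID and this revision's content."""
    payload = f"{parent_id}\n{content}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]


def build_history(contents):
    """Return the list of IDs for a linear history with the given contents."""
    ids = []
    parent = ""  # the first revision has no parent
    for content in contents:
        parent = rev_id(parent, content)
        ids.append(parent)
    return ids


original = build_history(["a", "b", "c", "d"])
rewritten = build_history(["a", "b", "c (changed)", "d"])

# Compare the two histories position by position.
shared = [o == r for o, r in zip(original, rewritten)]
# shared == [True, True, False, False]
```

a and b keep their IDs, c' differs from c because its content changed,
and d' differs from d purely because its parent did. That is exactly
why the other clone ends up with two heads after a pull.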

What I'm asking for is an explanation of how Bazaar handles revision
IDs, so that I can make deductions like this.

It's not clear to me, for example, how bzr-svn would assign revision
IDs for changes in (a) a remote Subversion repository and (b) a
svncloned local mirror of the same repository. If bzr-svn can know
that these 2 are "the same", then it can assign the same revision ID
to subversion revision NNN in each. And that means that I can bzr
branch from the local repository (for speed of the initial conversion)
and then change to bzr pull-ing from the remote repository.
(Experimentally, it appears that bzr-svn might not be able to match up
like that).

Does this help explain what I'm trying to do?

Paul.