Alternative dotted revision numbering

Mon Jan 7 22:47:28 GMT 2008

I've been thinking about our current dotted revision number scheme. It
has some nice properties, and some negatives. The important properties
are:

a) Stable between repositories. It doesn't matter when you found out
   about revisions, you always number the same when viewed from the same
   tip.

b) Future proof. By deciding the number based on when it was merged,
   rather than when it was created, a future merge cannot disrupt the
   numbering. (In the future you can branch from an old revision, but
   you can't sneak in a merge above the point you are currently at.)

c) We only have to evaluate each node once. This means that generating
   the numbering is O(N) rather than O(N log N) or O(N^2).

The bad properties are:

a) It generally requires accessing the whole ancestry. It is probably
   possible to compute a subset if you know that it is "closed". But
   that is still left as an exercise for the reader.

b) If you have 2 long-lived branches, and short-lived feature branches
   will be created from either side, your dotted revnos grow without
   bound.  On one project I've worked with, this led to dotted revnos
   >150 characters long.  Which is longer than the revision-ids they are
   supposed to simplify.

c) (optional) The numbers you see show you when the branch was created,
   but don't give you a feeling for when it was merged. Often I find the
   latter more important (figuring out when a feature was merged into
   mainline.)

Here is a graph which exposes the problem with our current numbering
scheme:

   A
   |\
   B C-,
   | |\ \
   D E F M
   |/ /| |
   G H I N
   |/ /| |
   J K : O
   |/   /
   L ,-'
   |/
   P

Consider ACFI as the alternative long-lived branch, where EHK are all
features that were split off and then merged into trunk. Every time this
happens you get a .1.1 attached onto the end of the dotted revno, and it
just keeps getting worse. (In bzr.dev you can see this with Aaron's
integration branch, and a bit with some of Andrew's changes.)

A) Current

   A1
    |\
   B2 C1.1.1 --------------------.
    | |      \                    \
   D3 E1.1.2  F1.1.1.1.1          M1.1.1.2.1
    |/        /   \                |
   G4 H1.1.1.1.2  I1.1.1.1.1.1.1  N1.1.1.2.2
    |/            /                |
   J5 K1.1.1.1.1.1.2              O1.1.1.2.3
    |/                            /
   L6 ,--------------------------'
    |/
   P7

With only 3 merges, we are already at 7 digit groups.

B) Simple counter from when branch is merged

   A1
    |\
   B2 C1.1.1 ---------.
    | |      \         \
   D3 E1.1.2  F1.2.2    M1.4.2
    |/       /  |       |
   G4 H1.2.3  I1.3.3    N1.4.3
    |/       /          |
   J5 K1.3.4            O1.5.4
    |/                 /
   L6 ,---------------'
    |/
   P7

This is probably my favorite. Rather than tacking on '.1.1' when we get
a new branch, we just increment the "branch counter". So the dotted
revisions become

 BASE_REVNO . BRANCH_COUNTER . DISTANCE_TO_BASE

C) A simple update to the above is to make the last digit the revno if
that revision where the branch tip. That gives:

   A1
    |\
   B2 C1.1.2 -------.
    | |      \       \
   D3 E1.1.3  F1.2.3  M1.4.3
    |/       /  |     |
   G4 H1.2.4  I1.3.4  N1.4.4
    |/       /        |
   J5 K1.3.5          O1.5.5
    |/               /
   L6 ,-------------'
    |/
   P7

It means that in bzr.dev you would see numbers like 3162.4.3158 rather
than 3162.4.12. I kind of like the small numbers (B), because it gives
you the "this is the 12th commit on the branch" feeling. But once you
have branches of branches, it starts to lose meaning.
Another alternative is to reset the counter any time you get a new
branch, which is also reasonable:

D)
   A1
    |\
   B2 C1.1.1 -------.
    | |      \       \
   D3 E1.1.2  F1.2.1  M1.4.1
    |/       /  |     |
   G4 H1.2.2  I1.3.1  N1.4.2
    |/       /        |
   J5 K1.3.2          O1.5.3
    |/               /
   L6 ,-------------'
    |/
   P7

E) Reverse numbering from destination

   A1
    |\
   B2 C4.1.1 -----.
    | |     \      \
   D3 E4.1.2 F5.1.2 M7.1.2
    |/      /|      |
   G4 H5.1.3 I6.1.3 N7.1.3
    |/      /       |
   J5 K6.1.4        O7.1.4
    |/             /
   L6 ,-----------'
    |/
   P7

This picks the first digit from the merge revision, the second digit
based on what parent it is, and the third based on its distance to the
original revision. Obviously you have the same options for the third
digit (distance to origin, distance from base, distance from branch).

These are all 'X.1.Y' because I don't have any >2 parent merges. Which
is fairly common. We might consider using "X.Y" for it, and only using
"X.Z.Y" when there are more than 2 parents. (Sort of an omitted .0.)

F) Count-down reverse numbering

   A1
    |\
   B2 C4.1.2 -----.
    | |     \      \
   D3 E4.1.1 F5.1.2 M7.1.3
    |/      /|      |
   G4 H5.1.1 I6.1.2 N7.1.2
    |/      /       |
   J5 K6.1.1        O7.1.1
    |/             /
   L6 ,-----------'
    |/
   P7

Like (E) only counting backwards in the merge revisions.
At first glance this one seems fairly straightforward to number without
looking at all of history, but because of overlap, you still have to go
back until you hit mainline again. (Otherwise C could be 4-7 depending
on which branch got there first.)

So my favorite is one of B,C,C',D. I would be willing to implement one
so that people could experiment with it. I think any of them would be an
improvement over the current scheme, and should have approximately the
same complexity to compute them.

John
=:->