[RFC] Changing dotted revno numbering (was Re: Echoing a post: bzr vs. git)
Vincent Ladeuil
v.ladeuil+lp at free.fr
Thu Nov 6 09:36:36 GMT 2008
>>>>> "fullermd" == Matthew D Fuller <fullermd at over-yonder.net> writes:
fullermd> On Fri, Oct 31, 2008 at 06:00:54PM +0900 I heard the voice of
fullermd> David Cournapeau, and lo! it spake thus:
>>
>> (the example of commit 299.1.4 being after 300 in one branch).
fullermd> At the time the dotted revnos came in, I preferred
fullermd> the idea of numbering them after the merge rev when
fullermd> they came in, rather than the mainline rev in their
fullermd> ancestry. Still do, actually; it has a number of
fullermd> properties I think are more useful (and requires
fullermd> less history walking to enumerate, as a bonus).
Glad to hear that since this is an idea I'm thinking about[1] :)
Indeed it will help help a lot to reduce the history walking,
dramatically for some cases (most ?).
One fear I have is that changing the numbering may encounter
resistance from users, so since you raised the subject, here are
some thoughts I'd like to discuss:
The actual numbering has one fundamental property we must keep:
it's stable in the branch context (except when ghosts are
involved, but little can be done against that, when ghosts
incarnate, some things have to change anyway :)
Roughly speaking revnos are calculated this way: there are two
kinds of revnos, simple ones and dotted ones.
Simple revnos are for mainline revisions, start at one and are
incremented by 1. To simplify the discussion below I called them
trunk revnos.
Dotted revnos are numbered as (MAINREVNO, BRANCHNUM,
BRANCHREVNO).
MAINREVNO is where the branching occurred from the trunk.
BRANCHNUM is defined when you know the complete subgraph from
branching point to the merging point.
The first branch which is merged back to trunk receives 1 as its
BRANCHNUM, second gets 2, etc.
When a merge involved several branches, they are numbered in the
order they were merged.
BRANCHREVNO starts at one for the first revision created after
branching and get incremented by 1.
Now, we can tweak the dotted revnos by saying: MAINREVNO is where
the merging occurred into the trunk.
We can also reverse BRANCHREVNO as: BRANCHREVNO starts at one for
the last revision created before merging and get incremented by 1.
But this is quite counter-intuitive and will not play well to
sort the dotted revnos (feedback ?).
On the other hand keeping the actual definition mean that we will
still need to walk the complete subgraph to be able to start at 1
at the merging point.
Or we could magically find the a starting number that will reach
1 at the branching point (more on this later).
Another point is that if a branch is repeatedly merged, the
dotted revnos related to that branch will get different
MAINREVNOs instead of a single one.
Obviously we miss some additional info for a revision to change
that numbering scheme *and* walking less history (I'd argue that
there is no point in changing the numbering if we end up with the
same O(history) bad property).
A strong constraint on that additional info is that it should be
branch neutral since revisions are shared between branches (so we
can't just cache the dotted revno however stable it may be since
it's a branch property and we get rid of that cache for
performance reasons).
There is however a property that is branch neutral yet gives
useful info in that case: the Greatest Distance From Origin
(gdfo). This is the longest path length in the ancestry graph
from a revision to the NULL_REVISION (the mother (or father) of
all the revisions).
It can be calculated on commit by taking the highest value
between parents and adding 1.
It provides a partial ordering of the revisions that may solve
some of the horizon problems (i.e. by knowing that we don't need
to continue walking some subgraph parts).
It still requires to walk the graph in parallel for the different
paths but it helps to decided whether some path will reach the
trunk sooner that an other (when a path reaches the trunk its
gdfo is the same as the trunk revision one).
It should also help to find a starting number for BRANCHREVNO
(but this more an intuition than a proven theory at this point).
There is still a problem with gdfo in the presence of ghosts
since filling ghosts may change the path length. So either that
property should be stored along the revisions so we can change it
without altering the revisions themselves or we should add some
arbitrary value to take future ghost filling into account (but
I'm pretty sure it's a bad solution that will void many good
properties, intuition again).
Vincent
[1]: Based on some previous (and consequent) work by John (always
gives credit :)
More information about the bazaar
mailing list