[BUG?] revisiontree's root node has no revision attribute set

Robert Collins robertc at robertcollins.net
Thu Jul 27 15:21:52 BST 2006


On Thu, 2006-07-27 at 09:11 -0400, Aaron Bentley wrote:
> Robert Collins wrote:
> > For a RevisionTree, for any entry you can use 'entry.revision' to get
> > the last changed revision.
> > 
> > We don't currently set that for the root node.
> > 
> > I think a reasonable change for this is to start by setting it to the
> > revision_id of the revision tree.
> 
> Isn't the revision_id for directories only changed when we move or
> rename them?  And therefore, isn't it impossible for it to have changed
> since the first commit?  So shouldn't we use the revision-id of the
> first commit?
> 
> The lack of a revision_id was the reason we were talking about upgrading
> repositories for nested-trees.

Yes. When we upgrade we generally synthesise data though, so I guess the
question is what data to synthesise here.

> > We could also change commit to start recording changes to the root the
> > same way it records changes to other directories.
> 
> I am definitely in favour of that, but I believe it's an inventory (and
> therefore repository) format change.

Yes it is. So there are four decisions to take IMO:
 - what data to present when reading an old format repository
 - what data to set when upgrading a repository
 - what the impact on testaments should be
 - what the root revision value should be set to.

> > I'm seeking a +1 on the former reasonably urgently, so that the dirstate
> > serialiser can do something sane.
> 
> One option would be to use NOT_REVISION until such data is available.

Yes. Another is to synthesise the correct value on the fly.

John raises an interesting point: whether we should set the revision
value to the last change of the tree, or to that of the specific node.

I think there are *three* interesting revision values for each node:
 A the revision in which the node itself was last changed,
 B the revision in which its contents were last changed, and
 C the most recent revision in which any of its descendants, all the way
down, was changed.

Defining those more precisely, a node is changed if any of its:
 - parent
 - name
 - type
 - parent-revisions list (which we don't store explicitly at the moment,
but do calculate during commit)
is altered.
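A minimal sketch of that definition in Python, assuming a hypothetical, simplified Node record (not bzrlib's real inventory entry class) that carries the parent-revisions list explicitly, which current inventories do not store:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical, simplified inventory entry; not bzrlib's InventoryEntry."""
    file_id: str
    parent_id: Optional[str]
    name: str
    kind: str                                 # 'file', 'directory' or 'symlink'
    parent_revisions: Tuple[str, ...] = ()    # assumed stored; today it is only computed at commit


def node_changed(old: Node, new: Node) -> bool:
    """True if the node itself changed between two trees, i.e. a type-A change."""
    return (old.parent_id != new.parent_id
            or old.name != new.name
            or old.kind != new.kind
            or old.parent_revisions != new.parent_revisions)


before = Node("f-id", "root-id", "hello.txt", "file")
renamed = Node("f-id", "root-id", "greeting.txt", "file")
print(node_changed(before, renamed))   # True
print(node_changed(before, before))    # False
```

Text or exec-bit edits deliberately do not show up here; those are content (type B) changes.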

A node's contents are altered:
 for a file - when its text changes, or its exec bit changes
 for a symlink - when its symlink target changes
 for a directory - when the list of nodes within the directory changes.
This list could be defined in numerous ways; I suggest
'fileid:node-last-change:content-last-changed'. This definition has the
result that the 'content-last-changed' value propagates up a tree, and
makes my definition C above redundant with B.
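To illustrate the propagation, here is a rough sketch using a hypothetical nested Entry model (not bzrlib's API). It builds each directory's list from the suggested 'fileid:node-last-change:content-last-changed' strings and, working bottom-up, sets a directory's B to the new revision whenever that list differs from the basis tree's:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """Hypothetical, simplified inventory entry; not bzrlib's InventoryEntry."""
    file_id: str
    kind: str                  # 'file', 'directory' or 'symlink'
    last_changed: str          # value A: revision the node itself last changed in
    content_changed: str       # value B: revision its contents last changed in
    children: List["Entry"] = field(default_factory=list)


def content_key(directory: Entry) -> List[str]:
    """The suggested directory content list, one string per child."""
    return sorted(f"{c.file_id}:{c.last_changed}:{c.content_changed}"
                  for c in directory.children)


def snapshot_keys(node: Entry, out: Optional[Dict[str, List[str]]] = None) -> Dict[str, List[str]]:
    """Record the content list of every directory in the basis tree."""
    if out is None:
        out = {}
    if node.kind == "directory":
        out[node.file_id] = content_key(node)
        for child in node.children:
            snapshot_keys(child, out)
    return out


def recompute_b(node: Entry, basis_keys: Dict[str, List[str]], revision_id: str) -> None:
    """Bottom-up: a directory's B becomes revision_id if its content list changed."""
    if node.kind != "directory":
        return  # file/symlink B values are set from text, exec bit or target
    for child in node.children:
        recompute_b(child, basis_keys, revision_id)  # children first, so changes bubble up
    if content_key(node) != basis_keys.get(node.file_id):
        node.content_changed = revision_id


readme = Entry("readme-id", "file", "rev-1", "rev-1")
a_py = Entry("a-id", "file", "rev-1", "rev-1")
docs = Entry("docs-id", "directory", "rev-1", "rev-1", [readme])
src = Entry("src-id", "directory", "rev-1", "rev-1", [a_py])
root = Entry("root-id", "directory", "rev-1", "rev-1", [docs, src])

basis = snapshot_keys(root)
a_py.content_changed = "rev-2"   # rev-2 edits only the text of a.py
recompute_b(root, basis, "rev-2")
print(src.content_changed, docs.content_changed, root.content_changed, root.last_changed)
# rev-2 rev-1 rev-2 rev-1
```

A deep file edit moves B on every ancestor up to the root, while the untouched docs directory and the root's own A stay at rev-1. That is why C adds nothing beyond B.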

Information of type A is useful for determining how a single node has
changed between different trees (you could go further and record the
revision in which each individual node attribute changed). These changes
are generally tree shape changes: add all the changes of type A together
and you have a new tree shape.

Information of type B is useful for determining when an entire subtree
has changed. For instance, if you have the value of B for a directory
during a merge, and it's the same in both trees, you can skip all
comparisons within that directory, even if the directory node itself has
been altered.
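A sketch of that pruning, again over a hypothetical Entry model rather than bzrlib's real tree API. The node's own A value is always compared, but recursion stops as soon as the two B values match:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entry:
    """Hypothetical, simplified inventory entry; not bzrlib's InventoryEntry."""
    file_id: str
    kind: str
    last_changed: str      # value A
    content_changed: str   # value B
    children: List["Entry"] = field(default_factory=list)


def diff_trees(this: Entry, other: Entry) -> List[Tuple[str, str]]:
    """Compare two versions of a node, pruning subtrees whose B values match."""
    changes: List[Tuple[str, str]] = []
    if this.last_changed != other.last_changed:
        changes.append(("node", this.file_id))    # the node itself was altered (A)
    if this.content_changed == other.content_changed:
        return changes                            # same B: skip everything beneath
    changes.append(("content", this.file_id))
    if this.kind == "directory" and other.kind == "directory":
        theirs = {c.file_id: c for c in other.children}
        mine = {c.file_id for c in this.children}
        for child in this.children:
            if child.file_id in theirs:
                changes.extend(diff_trees(child, theirs[child.file_id]))
            else:
                changes.append(("only-this", child.file_id))
        for file_id in theirs:
            if file_id not in mine:
                changes.append(("only-other", file_id))
    return changes


files = [Entry(f"f{i}", "file", "r1", "r1") for i in range(1000)]
this_root = Entry("root", "directory", "r1", "r3",
                  [Entry("lib", "directory", "r3", "r1", files)])  # lib renamed in r3
other_root = Entry("root", "directory", "r1", "r1",
                   [Entry("lib", "directory", "r1", "r1", files)])
print(diff_trees(this_root, other_root))
# [('content', 'root'), ('node', 'lib')]
```

The renamed lib directory is reported once, and none of its 1000 files are visited because its B value is the same on both sides.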

Currently we don't calculate the value of B for directories - we only
calculate the value of A. For files we calculate both A and B and take
the newest value.


So one answer to 'what the root revision value should be set to?' is
'the value of A', which is consistent with directories but of not much
use for nested trees: we don't need A for nested trees, we need B. OTOH,
there is no requirement for nested trees that the value stored in the
parent be the same revision stored in the child inventory - they are
decoupled.

Another answer is 'the value of B' which is *still* not useful for
nested trees, because a commit that does not alter the tree will still
need to be recorded in the parent.


So given that none of the useful values for node.revision on a root
matter in terms of what the parent tree records, I think that we should
treat the root like any other directory, and set it to its 'A' value.


Now, for existing trees we can synthesise a value for this, as Aaron
observed: the first commit. (What to do when there isn't a 'first'
commit?) NOT_A_REVISION is an option too.
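One possible way to synthesise that value, sketched under assumptions: the ancestry is given as a plain {revision_id: parent_ids} mapping, NOT_A_REVISION is a hypothetical stand-in constant, and "no first commit" is taken to mean either an empty branch or an ancestry with more than one parentless revision (unrelated histories merged together). That last interpretation is a guess at the open question, not a settled answer:

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical sentinel standing in for the NOT_A_REVISION marker discussed
# in this thread; the string value is an assumption.
NOT_A_REVISION = "not-a-revision:"


def synthesise_root_revision(tip: Optional[str],
                             parents: Dict[str, Tuple[str, ...]]) -> str:
    """Return the first commit in tip's ancestry, or NOT_A_REVISION if there
    is no single first commit."""
    if tip is None:
        return NOT_A_REVISION            # empty branch: nothing committed yet
    seen: Set[str] = set()
    origins: Set[str] = set()
    todo = [tip]
    while todo:
        rev = todo.pop()
        if rev in seen:
            continue
        seen.add(rev)
        rev_parents = parents.get(rev, ())   # unknown revisions (ghosts) count as parentless
        if not rev_parents:
            origins.add(rev)             # a parentless revision starts a line of history
        todo.extend(rev_parents)
    if len(origins) == 1:
        return origins.pop()
    return NOT_A_REVISION                # unrelated histories merged: no unique first commit


linear = {"r1": (), "r2": ("r1",), "r3": ("r2",)}
print(synthesise_root_revision("r3", linear))   # r1
merged = {"a1": (), "b1": (), "m1": ("a1", "b1")}
print(synthesise_root_revision("m1", merged))   # not-a-revision:
```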

Hope this helps. In summary, I think we should make the root dir's
.revision be treated like any other directory, and if/when we want to
start tracking the change-in-content revision marker separately, we can
do it across the board. Neither way affects what the parent tree of a
nested-by-reference tree will record.

Until we do the format bump, I think synthesising on the fly is a good
strategy, because it will let code be written without special cases
today - except for the one in commit to not record that value to disk.
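The shape of that strategy might look roughly like this, with a hypothetical RootEntry record and hypothetical read_root/serialise_root helpers (not the real dirstate or inventory serialiser API). Readers always see a populated root revision; the one special case is the serialiser refusing to write it for formats that cannot record it:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RootEntry:
    """Hypothetical stand-in for a tree's root inventory entry."""
    file_id: str
    revision: Optional[str] = None


def read_root(stored: Dict[str, str], synthesise: Callable[[], str]) -> RootEntry:
    """Load a root entry; old formats store no revision, so compute one on demand."""
    root = RootEntry(stored["file_id"], stored.get("revision"))
    if root.revision is None:
        root.revision = synthesise()   # e.g. the first commit, per the discussion above
    return root


def serialise_root(root: RootEntry, format_records_root_revision: bool) -> Dict[str, str]:
    """The single special case: old formats must not write the synthesised value."""
    data = {"file_id": root.file_id}
    if format_records_root_revision and root.revision is not None:
        data["revision"] = root.revision
    return data


old_record = {"file_id": "root-id"}
root = read_root(old_record, synthesise=lambda: "rev-1")
print(root.revision)                   # rev-1: callers need no special case
print(serialise_root(root, False))     # {'file_id': 'root-id'}
print(serialise_root(root, True))      # {'file_id': 'root-id', 'revision': 'rev-1'}
```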

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.