[MERGE] deprecated EmptyTree

Tue Jul 25 15:00:28 BST 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Sun, 2006-07-23 at 21:48 -0400, Aaron Bentley wrote:

>>I don't have a problem conceiving of a tree without a root.  The empty
>>tree is meant to represent nothing, and I find it hard to conceive of
>>nothing having a root.
> 
> 
> I disagree - the empty tree represents *an empty tree*, not nothing. Say
> we have an empty tree '/tmp/foo'. That directory exists. If /tmp/foo
> does not exist, we don't even have a tree at all.

Depends on your definition of tree.  The tree interface in bzr is a
container object, and when there's no root revision, it's empty.  If you
look at how EmptyTree was defined in revno 1, it was an implementation
of the Tree interface that contained nothing, making it an "empty Tree".

I think that when you ask for the tree associated with the origin
revision, you should get a representation of nothing.  If EmptyTree is
contentious, let's call that NoTree.
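
To make the "container" view concrete, here is a minimal sketch in
Python.  The names (NoTree, DictTree, iter_ids, root_id, added_ids) are
made up for illustration and are not bzrlib's actual Tree API.

```python
class NoTree:
    """Hypothetical tree for the origin revision: it contains nothing."""

    def root_id(self):
        # Nothing exists yet, so there is no root to report.
        return None

    def iter_ids(self):
        return iter(())

    def __contains__(self, file_id):
        return False


class DictTree:
    """Hypothetical non-empty tree backed by a {file_id: path} mapping."""

    def __init__(self, root_id, entries):
        self._root_id = root_id
        self._entries = dict(entries)
        # The root is stored like any other entry, at the empty path.
        self._entries[root_id] = ""

    def root_id(self):
        return self._root_id

    def iter_ids(self):
        return iter(self._entries)

    def __contains__(self, file_id):
        return file_id in self._entries


def added_ids(old, new):
    """Return the ids present in `new` but not in `old`, sorted."""
    return sorted(i for i in new.iter_ids() if i not in old)
```

Under these assumptions, comparing NoTree with the first committed tree
reports the root as an ordinary add, with no root-specific code path.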

>>The effort in nested trees is toward making root as un-special as
>>possible.  It's just a directory that doesn't have any parents.  And
>>until you init, that directory has no id, so you shouldn't pretend it did.
> 
> 
> So before you init we don't even have an empty tree, we've got nothing.

Right, which is why I'm saying the origin revision's Tree should be NoTree.

>>I think that representing a change in root id value would mean another
>>level of indirection, and I would strongly oppose that.
>>
>>Representing it as a delete+add would be possible, but if we must
>>special-case delete+add, it's better to just special-case add.
> 
> 
> I don't think what I'm proposing adds a *new* special case at all. (I'm
> not arguing that it is a special case, just that it's not a new one).
> Existing trees have a ROOT_ID root node id. They will want to change
> that ROOT_ID to be a unique id, if they want to be nestable. 

I think it's good enough to do add+delete, so technically, I don't think
they want to change their ROOT_ID into something else.  Add+delete is
sufficient.

> And they'll
> want merges of this to Do The Right Thing with files in /. 

Doing add+delete isn't bad.  It causes conflicts where people with the
old root added files, and then merged, but those conflicts will be
recoverable.

I don't think the 'upgrade' case is a big enough deal to justify
sticking special cases in our code for it.  I would rather have the code
be simpler and predictable.

> This is
> exactly the same special case as assigning a unique root to the first
> commit if the empty working tree starts with ROOT_ID.

Disagree.  This case has to handle children, while the other did not.

>>I don't think we can conclude anything about the contents of
>>directories, based on their filenames, any more than we could with files.
>> Some directory names are common.  Others are rare.  Some appear
>>multiple times in a source tree.
> 
> 
> If we compare both file-id and filename, we can do so I think, for some
> very common cases. We may want it to be able to be 'turned on' with a
> flag if it's not considered robust enough to be on by default.

I would be happy to support a variant of this behaviour with a flag.
(see below)

>>By your proposal, we would wind up with the shelf tests intermixed with
>>the bzrtools tests.  We would also be unable to fix the situation,
>>because one of the 'test' file ids would be lost.
> 
> 
> Well, here's how I would like to see that work:
> $ bzr merge --force SHELFURL
> ... some output ...
> $ bzr st
> ... realise that ./test is incorrectly mingled
> $ bzr revert
> $ bzr merge --merge-into=shelf SHELFURL
> ... some output ...
> $ bzr st
> ... shows the desired output ...

We already support the 'merging Shelf' case.  I don't think we need to
break that support and then fix it with a new parameter.

>>>I don't see the
>>>root node needing root-specific-special casing if this is done.
>>
>>The handling you describe only handles duplicate adds.  But the merge
>>scenario I describe has a spurious tree-root deletion, and this doesn't
>>handle that.
> 
> 
> I think these are different use cases - my description was not intended
> to handle joining of trees at roots.

I wasn't referring to the shelf scenario.  I was referring to this one:

>> If we do a merge of unrelated trees, and BASE has a TREE_ROOT for a
>> root, and OTHER has UNIQUE_ROOT-asdf for a root, and THIS has TREE_ROOT
>> for a root, the merge will attempt to delete TREE_ROOT.  It won't
>> succeed of course, but it will produce a conflict.  Preventing that
>> conflict would require special-casing.
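
A rough sketch of why that deletion shows up, treating each tree as a
plain set of file ids.  This is a simplification for illustration, not
how bzr's merge code is written, and the two file ids are invented.

```python
def id_changes(base_ids, other_ids):
    """Return (added, deleted): what OTHER changed relative to BASE."""
    added = other_ids - base_ids
    deleted = base_ids - other_ids
    return added, deleted


# Unrelated trees: BASE is effectively empty apart from the default root.
base = {"TREE_ROOT"}
other = {"UNIQUE_ROOT-asdf", "other-file-id"}
this = {"TREE_ROOT", "this-file-id"}

added, deleted = id_changes(base, other)

# Any deleted id that THIS still uses is where the conflict comes from.
contested = deleted & this
```

Here `contested` contains TREE_ROOT: OTHER never had it, BASE did, so the
merge reads that as a deletion of the root THIS still depends on.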

> Here's a use case I have in mind:
> 
> user 1 does a tailor conversion of $PROJ to bzr
> user 2 does a tailor conversion of $PROJ to bzr
> 
> now user 1 wants to merge from user 2.
> 
> There are a number of things we need to implement for this to be doable
> - but please don't let that distract us :).
> 
> During this merge, the directories will have identical paths, and it's
> here that merging the content is IMO the right *default* behaviour.
> Remembering that the user will have had to do --force or some such to
> make it merge at all.

I think your proposal doesn't go far enough to support this scenario.
You wrote "I think there is a general-special-case for handling
directory nodes that is different to handling file nodes".  I think to
handle this scenario best, we should also treat files with the same path
as the same.  Otherwise, virtually every file will have an add conflict.
 But many of them will have the same contents, so if you use paths for
identity, you won't even get contents conflicts.

So yes, the user should have to specify "merge --by-filename" or
something.  But because that would be opt-in, it doesn't answer the
scenario (with UNIQUE_ROOT-asdf) above.
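
To illustrate the difference between the two notions of identity, here is
a small sketch.  Each tree is modelled as {file_id: (path, text)}; the
function names and the sample ids are hypothetical, and "--by-filename"
is only the placeholder flag name used above.

```python
def add_conflicts_by_id(this, other):
    """Paths that both trees occupy with different file ids."""
    this_paths = {path: fid for fid, (path, _text) in this.items()}
    return sorted(
        path
        for fid, (path, _text) in other.items()
        if path in this_paths and this_paths[path] != fid
    )


def content_conflicts_by_path(this, other):
    """Paths present in both trees whose text differs."""
    this_text = {path: text for path, text in this.values()}
    return sorted(
        path
        for path, text in other.values()
        if path in this_text and this_text[path] != text
    )


# Two independent conversions of the same project: same paths, new ids.
user1 = {"a1": ("README", "hello"), "a2": ("setup.py", "v1")}
user2 = {"b1": ("README", "hello"), "b2": ("setup.py", "v2")}
```

With file ids as identity, every shared path is an add conflict.  With
paths as identity, only setup.py (whose text differs) needs attention.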

>>I don't believe you will achieve that.  I think this will produce more
>>special cases, because we'll have to deal with a delete/add pair,
>>instead of just an add.
> 
> 
> We could implement fileid aliases first. Though I'd rather not. I do
> think that handling delete+add here is easy though: in terms of
> snapshots if we change the root id and add a file, we will see:
> OLD:
> ROOT-ID
> ROOT-ID/foo
> ROOT-ID/bar
> 
> NEW:
> UNIQUE-ROOT/
> UNIQUE-ROOT/foo
> UNIQUE-ROOT/bar
> UNIQUE-ROOT/gam
> 
> Now, a merge that covers OLD-NEW into a tree with ROOT-ID and a fourth
> file 'quux' will see:
> foo is reparented - no problem
> bar is reparented - no problem
> gam is new - no problem
> quux is not deleted, but its parent is deleted, and replaced with a new
> node at the same path.

When a parent is deleted, but has living children, our conflict handling
will cancel the deletion.  If that worked under this scenario, we'd
probably get:

UNIQUE-ROOT/foo
UNIQUE-ROOT/bar
UNIQUE-ROOT/gam
UNIQUE-ROOT/.moved/quux

> so for handling quux, this is one of those things we still don't do
> *really well* - handling tree shape conflicts in the nicest possible
> way. (We're a lot better than a lot of systems, don't get me wrong). I
> think handling quux here would be done nicely by:
>  - mark quux as being conflicted
>  - put quux on disk inside the new directory (we have no home for it as
> its parent was removed, so this is no worse than any other path).

We currently don't permit parents of living children to be removed, and
I think this is a better approach in general.  It retains the names of
the parents, and it avoids mixing files with different parents.
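
The rule can be sketched roughly like this.  It is a simplified model
under my own assumptions (a flat {file_id: parent_id} map, a made-up
function name), not the actual conflict-resolution code.

```python
def apply_deletes(parents, deletes):
    """Apply deletions, cancelling any that would orphan a living child.

    `parents` maps file_id -> parent_id (None for a root).
    Returns (surviving parents map, set of cancelled deletions).
    """
    pending = set(deletes)
    cancelled = set()
    changed = True
    while changed:
        changed = False
        for file_id, parent_id in parents.items():
            # A child that survives keeps its parent alive.
            if file_id not in pending and parent_id in pending:
                pending.discard(parent_id)
                cancelled.add(parent_id)
                changed = True
    surviving = {i: p for i, p in parents.items() if i not in pending}
    return surviving, cancelled


# After the reparenting in the example, only quux still lives under ROOT-ID.
parents = {
    "ROOT-ID": None,
    "UNIQUE-ROOT": None,
    "foo-id": "UNIQUE-ROOT",
    "bar-id": "UNIQUE-ROOT",
    "gam-id": "UNIQUE-ROOT",
    "quux-id": "ROOT-ID",
}
surviving, cancelled = apply_deletes(parents, {"ROOT-ID"})
```

In this model the deletion of ROOT-ID is cancelled because quux still
lives under it, so quux keeps its original, named parent.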

I'm not convinced that supporting the upgrade from TREE_ROOT to
UNIQUE_ROOT justifies all this.  Many projects won't upgrade at all.  Of
those that do, many will not need to merge adds before all contributors
upgrade.  Those that need to merge adds can handle some conflicts.

>>I don't think I understand this concept of having a root be a tree.  If
>>the tree == the root, then two different trees must have different roots.
> 
> 
> Well we identify a node by path + id. If the root node and the tree are
> synonymous, then two trees which have the same root id are completely
> possible. (because the path of the root is '/' always, and the id is
> able to be the same). The physical location of the tree - the base, or
> url - would be a separate attribute on the tree.

To me, the tree includes the children of the root.  Saying the tree is
the root means that two trees with different contents must have
different roots.

>>>In fact, my tree interface tests will
>>>assure that it is. I think that an empty working tree *may* be different
>>to an empty tree, once it's had a root id assigned to it, but only after
>>>that.
>>
>>But root ids are assigned when working trees are initialized.
> 
> 
> 
> I get that. I don't see *why* we have to assign the root id at 'init'
> time, rather than at first commit time.

There must be a root ID in order to add files to the root.  Are you
proposing that should be TREE_ROOT, and then we should rewrite it at
commit time?  I think that would be really hairy.  Plus it would make it
harder for importers to set a unique root id.

I think 'add' or 'init' are sensible places to assign an id to root.
There's no reason to delay it 'till commit time.  I chose init because
it most closely matched our current behaviour.
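
A minimal sketch of what assigning the id at init looks like, assuming a
hypothetical WorkingTreeSketch class and an arbitrary "root-" + UUID id
format (neither is bzrlib's real API or id scheme):

```python
import uuid


class WorkingTreeSketch:
    """Hypothetical working tree; not bzrlib's WorkingTree."""

    def __init__(self, root_id=None):
        # The root gets its id at init time.  An importer can pass its own.
        self.root_id = root_id or "root-" + uuid.uuid4().hex
        self.parents = {self.root_id: None}

    def add(self, file_id, parent_id=None):
        # Every added entry must name an existing parent; default to the root.
        parent_id = self.root_id if parent_id is None else parent_id
        if parent_id not in self.parents:
            raise KeyError("unknown parent id: %s" % parent_id)
        self.parents[file_id] = parent_id
```

Because the root id exists from the start, add() never has to record a
placeholder parent that commit would later need to rewrite.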

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFExiP70F+nu1YWqI0RArD+AJ47Yz+GRHyUZG7zZ3e69Ol9Mkez1gCfckOe
VYAjkuBKUtEXJZiMWykGr/M=
=8w8E
-----END PGP SIGNATURE-----