Tree transactions (was: the inventory must be updated as merge proceeds, not at the end)

Mon Jan 2 15:17:48 GMT 2006

Agreed.  I would argue, though, that the surgical operations have gone
into a more appropriate place.  The code that generates a merged tree
doesn't and shouldn't care how the operations are juggled in order to
satisfy POSIX requirements.  And the API is imperative rather than
callback-based, so the conflict handling is easier to maintain.  Also,
since conflicts are detected in advance, a merge should either succeed
or fail before attempting to apply any changes.  You should never wind
up with the tree in a transitional state.

>  With the 2-phase (limbo-in, limbo-out) algorithm, what
> surgical operations are needed is trivially determined by each
> changeset entry.

With this approach, you don't have changeset entries per se, but you
have equivalent information.

> With the shadow FS approach, these operations are
> applied on the shadow and recorded so that they can simply be played
> back at commit-time.

Right.  It doesn't address improved conflict resolution or simplifying
the tree transform API, which is what I was trying to do with this.

>  With your approach, you need to somehow
> rediscover a sequence of surgical operations that will do the trick.
> At first sight, that seems harder.

I don't think it's very hard.  Once you know what your output tree is
supposed to look like, it's just a matter of sorting by path length, and
doing the two-phase algorithm.
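
As a rough sketch of what I mean (not the actual bzrlib code), assume
the output tree is described by a hypothetical dict mapping old relative
paths to new ones, with "/" as the separator.  The function name, the
"limbo" directory name and the use of path depth as the sort key are all
assumptions made for illustration:

```python
import os


def apply_renames(root, renames, limbo_name="limbo"):
    """Apply {old_relpath: new_relpath} under root in two phases.

    Hypothetical helper: every renamed entry is parked in a scratch
    "limbo" directory first, then moved to its final location.
    """
    limbo = os.path.join(root, limbo_name)
    os.mkdir(limbo)

    # Phase 1 (limbo-in): deepest paths first, so a child leaves its
    # old parent before that parent is itself moved away.
    staged = []
    by_depth = sorted(renames, key=lambda p: p.count("/"), reverse=True)
    for i, old in enumerate(by_depth):
        temp = os.path.join(limbo, str(i))
        os.rename(os.path.join(root, old), temp)
        staged.append((temp, renames[old]))

    # Phase 2 (limbo-out): shallowest targets first, so a new parent
    # directory exists before its children are moved into it.
    for temp, new in sorted(staged, key=lambda item: item[1].count("/")):
        os.rename(temp, os.path.join(root, new))

    os.rmdir(limbo)
```

Because nothing is moved to its final name until every source has been
parked, swaps such as a -> b and b -> a need no special casing.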

>>Conflicts are the big dragon I'm trying to slay.  Conflict resolution
>>can itself cause conflicts, and that is very hard to handle in the
>>current model.
> 
> 
> I have only started to look into conflict resolution, so I am not very
> confident in my intuitions, but I don't think you're right.  If you
> keep the (virtualized) tree and the inventory (object) in synch, it
> seems pretty straightforward.

Here are some of the conflict scenarios I'd like to handle:

Conflict in conflict files
==========================
foo.BASE is a versioned file, foo is a versioned file.  We get a
conflict applying a text change to foo.  This will attempt to produce
foo.BASE, which already exists, so we should have handling for that.
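
One possible way to handle it, purely as an illustration: pick the first
conflict-file name that is not already taken.  The numbered-suffix
scheme and the function name below are assumptions, not something bzr
does today:

```python
def free_conflict_name(path, suffix, existing):
    """Return path.suffix, or path.suffix.N if that name is taken.

    `existing` is the set of paths already present in the tree.
    The numbering scheme is hypothetical.
    """
    candidate = "%s.%s" % (path, suffix)
    n = 1
    while candidate in existing:
        candidate = "%s.%s.%d" % (path, suffix, n)
        n += 1
    return candidate
```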

Two new files with the same name
================================
Say in BASE, we have
foo
bar
baz

In OTHER, we rename
foo -> baz
we create
bar/baz

In THIS, we move
foo -> bar/foo

The result is that we try to get two files named bar/baz
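
Detecting this in advance is cheap if the merged tree is available as a
mapping from file id to final path.  A minimal sketch, assuming such a
mapping exists (the ids and the function name are made up):

```python
def find_duplicate_paths(final_paths):
    """Return {path: [file ids]} for every path claimed more than once.

    `final_paths` is a hypothetical {file_id: path} view of the
    merged output tree.
    """
    claimed = {}
    for file_id, path in final_paths.items():
        claimed.setdefault(path, []).append(file_id)
    return {path: sorted(ids) for path, ids in claimed.items()
            if len(ids) > 1}
```

For the example above, the renamed-and-moved foo and the newly created
file would both show up under bar/baz.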

Conflict files with correct names
=================================
Consider the above, where baz.BASE already exists, and both files have
text conflicts.

Filesystem loops
================
In BASE, we have
A
B

In THIS, we have
B/A

In OTHER, we have
A/B

Naive merging produces a loop in which B is a child of A, which is a
child of B, which is a child of A...
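
A loop like this can be found before anything touches the disk by
walking parent pointers in the merged tree.  A sketch, assuming the
merged result is available as a hypothetical {child id: parent id}
mapping with None for the root:

```python
def find_parent_loops(parents):
    """Return the set of ids that sit on a parent-child cycle.

    `parents` maps each id to its parent id, or None at the root.
    """
    looped = set()
    for start in parents:
        seen = []
        node = start
        # Walk upwards until we reach the root or revisit a node.
        while node is not None and node not in seen:
            seen.append(node)
            node = parents.get(node)
        if node is not None:
            # Everything from the first visit of `node` onwards
            # is part of the cycle.
            looped.update(seen[seen.index(node):])
    return looped
```

Here THIS contributes A's new parent (B) and OTHER contributes B's new
parent (A), so both ids are reported.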

File Type mismatches
====================
In BASE, we have
B/ (a directory)

In OTHER, we have
B/
B/C

In THIS, we have
B (a symlink)

AFAIK, this one isn't currently possible, because filetype changes are
forbidden.

I'm not trying to prove that these are impossible to handle in the
current regime.  But I do think that they would be far simpler to handle
correctly with the API I've proposed.

>>The benefits [of the shadow FS] are limited to
>>1. prediction
>>2. rollback
>>
>>right?
> I don't know if "rollback" is the right term, since there's nothing to
> undo: the filesystem is not modified until "commit", but otherwise,
> yes.  It's basically like operating on a copy of your actual FS.

No, I did mean rollback; if we encountered an error while applying the
changes, it would presumably be trivial to reverse-apply the
previously-applied changes and restore the working tree to the state it
was in before we tried to apply any changes.

Note that I'm talking about errors, like EACCES or EDQUOT, not
conflicts, per se.
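
To illustrate the kind of rollback I mean, here is a minimal sketch
that records each rename as it succeeds and reverse-applies the record
if a later one fails.  It only covers renames, and the function name is
made up:

```python
import os


def apply_with_rollback(renames):
    """Apply (src, dst) renames in order; undo them all on OSError."""
    done = []
    try:
        for src, dst in renames:
            os.rename(src, dst)
            done.append((src, dst))
    except OSError:
        # Reverse-apply what already succeeded, newest first, then
        # re-raise so the caller still sees the original error.
        for src, dst in reversed(done):
            os.rename(dst, src)
        raise
```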

>>that layer to be the WorkingTree.  So yes, it is "designed for the
>>working tree proper".

> The shadow FS idea would make it possible to virtualize what's under
> .bzr and trivially obtain --dry-run support and transactional
> semantics for that too.  I don't know if that's useful though.

I think we're talking past each other here.  I think you're talking
about delaying access to the bzr control files.  I'm also unsure whether
that's useful.

I'm talking about presenting a view of the filesystem to bzrlib that has
all files in their unexpanded form, despite the fact that the on-disk
form is expanded.  I don't insist that this layer be the WorkingTree,
but it did seem like a reasonable choice.

Aaron