[RFC] making TreeTransform more resistant to unexpected failures / writing problems

Aaron Bentley aaron.bentley at utoronto.ca
Thu Aug 9 17:16:55 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> On 8/9/07, John Arbash Meinel <john at arbash-meinel.com> wrote:
>> Anyway, what I would actually like to see is for TreeTransform to try to
>> rename-into-place and if that fails, it can rename it 'to the side' and
>> mark the file as conflicted.

We don't do atomic renames here; we always delete the original before
installing the new one.  This means that if the old file cannot be
deleted, it will fail on delete, not on rename.
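
To make the ordering concrete, here is a minimal sketch of delete-then-install. It is not the actual TreeTransform code; the helper name and the paths are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def install_by_delete_then_rename(limbo_path, target_path):
    """Hypothetical helper: remove the old file, then move the new one in."""
    if os.path.lexists(target_path):
        # A file that cannot be deleted (e.g. locked) raises OSError here...
        os.unlink(target_path)
    # ...so this rename is only reached once the delete has succeeded.
    os.rename(limbo_path, target_path)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "file.txt")
limbo = os.path.join(workdir, "limbo-new-1")
with open(target, "w") as f:
    f.write("old")
with open(limbo, "w") as f:
    f.write("new")

install_by_delete_then_rename(limbo, target)
with open(target) as f:
    result = f.read()
```

The point is only that any "cannot touch the old file" error surfaces at the unlink, before the new file is moved at all.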

Also, I'm not clear on which file you're renaming to the side.  It
sounds like you're renaming the existing file, but will that work if
the file is locked?

>> This probably has some purity issues for Aaron, as he has always wanted
>> to detect all conflicts in the beginning, so that by the time he gets
>> around to actually doing the updates, he knows what will happen.

I certainly do.  For one thing, it's nice to know what applying a
transform will actually do.  We don't say "change this tree in this
way...maybe", and I don't know why you'd want to.

But on a more practical level:
- renames frequently require moving file ids, also.
- renames can produce conflicts with other files.
- renames can produce conflicts with other file ids.
- the THIS, BASE, OTHER conflict code is at a higher level than
  TreeTransform.

>> However, it works for case conflicts

I think that's a really horrible way to handle case conflicts.  Our
standard code for detecting duplicate file names should be handling that
case.

>> ..., as well as locked files, or files
>> showing up while you are renaming into place. (I guess that means it
>> also needs to conflict if the '_apply_deletions' step fails).

Depends on the failure.  If the file disappears before I can remove it,
should that be a failure?

>> Ultimately, I would rather not rollback the entire transaction, instead,
>> just mark a few more files as conflicted, and put at least .THIS files
>> around.

I think if we did that, we'd need to do another round of conflict
resolution.  And to do that, we'd need to edit the transform to reflect
the operations that did succeed.  Which gets really gross and slow.

>  We could retry the replacement but doing it immediately after will
> probably find that it's still in use.  It might be better to do all
> the files that we can, then come back and retry the ones that were in
> use.

Well, "all we can" may be a small set.  TreeTransform is designed to
work around ordering issues, so removing the files out-of-order breaks
its design.

> The other thing to consider is that on Windows, it could be better to
> write over the files in place rather than renaming the new file into
> place.

On Windows, you can't do atomic rename.  You must delete the target file
first, and that's the step that's failing.

There are a few ways to interpret your suggestion of overwriting:

overwrite-on-TT.create_file
---------------------------
The most efficient way to overwrite files directly is when
TreeTransform.create_file is called.

This breaks everything.  Transforms are not supposed to have any effect
until they are applied.  We would need to make substantial changes to
every caller.

This doesn't address cases where we're trying to delete the locked file,
not modify it.

overwrite-on-TT.apply_insertions
--------------------------------
Instead of renaming the file from limbo into place, we can
unconditionally read the limbo file, and use its contents to overwrite
the target file.  This is slow.  It also doesn't address cases where
we're trying to delete the locked file, not modify it.

conditionally-overwrite-on-TT.apply_insertions
----------------------------------------------
We could try our current approach, and if it fails, try overwriting.
This would reduce the performance impact, but also reduce test
coverage, since the fallback path would only run when the normal path
fails.  It also doesn't address cases where we're trying to delete the
locked file, not modify it.
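
A rough sketch of the conditional variant, assuming nothing about the real apply_insertions internals.  All function names here are hypothetical, and the "lock" is simulated by a function that raises, because a real lock is platform-specific.

```python
import os
import shutil
import tempfile


def delete_then_rename(limbo_path, target_path):
    """Stand-in for the current approach: delete the target, then rename."""
    if os.path.lexists(target_path):
        os.unlink(target_path)
    os.rename(limbo_path, target_path)


def overwrite_in_place(limbo_path, target_path):
    """Fallback: copy the limbo file's bytes over the existing target."""
    with open(limbo_path, "rb") as src, open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.unlink(limbo_path)


def install(limbo_path, target_path, primary=delete_then_rename):
    """Try the primary strategy; overwrite only if it raises OSError."""
    try:
        primary(limbo_path, target_path)
        return "renamed"
    except OSError:
        overwrite_in_place(limbo_path, target_path)
        return "overwritten"


def simulated_lock(limbo_path, target_path):
    """Pretend the target is held open and cannot be deleted."""
    raise PermissionError("simulated: target is held open")


def make_pair(workdir, name):
    """Create an old target and a new limbo file; return their paths."""
    target = os.path.join(workdir, name)
    limbo = os.path.join(workdir, "limbo-" + name)
    with open(target, "w") as f:
        f.write("old")
    with open(limbo, "w") as f:
        f.write("new")
    return limbo, target


workdir = tempfile.mkdtemp()
limbo_a, target_a = make_pair(workdir, "a.txt")
limbo_b, target_b = make_pair(workdir, "b.txt")

normal = install(limbo_a, target_a)
fallback = install(limbo_b, target_b, primary=simulated_lock)
with open(target_b) as f:
    fallback_content = f.read()
```

Note that the except branch only runs when something is already wrong, which is exactly why it would see little test coverage in practice.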

> I think many programs that hold the file open are not denying
> writers, but just refusing to let it be renamed or deleted.

I'm a bit confused about whether renames are verboten when a file is
held open.

> That has
> some risk that we'll overwrite the only copy of the file, so we might
> like to make a backup first.

To summarize:

Rollback
--------
- simple change
- fails safe
- succeeds in everything except deleting the old file, if old file can
  be renamed.
- slightly slower, because it will need to rename deleted files out of
  the way, and delete them at the very end.
- may be implemented with journalling, so hard interruptions are
  recoverable.
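
A minimal sketch of what I mean by rollback, under simplifying assumptions: plain files only, and every target being replaced is also listed as a deletion.  The function name, the pending-deletion directory and the argument shapes are all hypothetical, not existing bzrlib API.

```python
import os
import tempfile


def apply_with_rollback(deletions, insertions, pending_dir):
    """Rename deletions aside, install insertions, undo everything on error.

    deletions: list of paths; insertions: list of (limbo_path, target_path).
    Returns the set-aside files that could not be deleted at the end.
    """
    moved_aside = []  # (original_path, aside_path)
    installed = []    # (limbo_path, target_path)
    try:
        for n, path in enumerate(deletions):
            aside = os.path.join(pending_dir, "deleted-%d" % n)
            os.rename(path, aside)
            moved_aside.append((path, aside))
        for limbo, target in insertions:
            os.rename(limbo, target)
            installed.append((limbo, target))
    except OSError:
        # Fail safe: undo in reverse order, then report the original error.
        for limbo, target in reversed(installed):
            os.rename(target, limbo)
        for path, aside in reversed(moved_aside):
            os.rename(aside, path)
        raise
    # Only now delete for real; a locked file simply stays behind.
    leftovers = []
    for _path, aside in moved_aside:
        try:
            os.unlink(aside)
        except OSError:
            leftovers.append(aside)
    return leftovers


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


def read(path):
    with open(path) as f:
        return f.read()


workdir = tempfile.mkdtemp()
pending = os.path.join(workdir, "pending-deletion")
os.mkdir(pending)
target = os.path.join(workdir, "a.txt")
limbo = os.path.join(workdir, "limbo-a")
write(target, "old")
write(limbo, "new")

# Failure case: the second insertion's limbo file does not exist.
missing = os.path.join(workdir, "limbo-missing")
other = os.path.join(workdir, "b.txt")
try:
    apply_with_rollback([target], [(limbo, target), (missing, other)], pending)
    rolled_back = False
except OSError:
    rolled_back = True
content_after_failure = read(target)

# Success case: the same transform without the broken insertion.
leftovers = apply_with_rollback([target], [(limbo, target)], pending)
content_after_success = read(target)
```

The extra renames into and out of the pending directory are where the "slightly slower" comes from.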

Rename-aside
------------
- changes programming model
- fails unsafe if a file cannot be renamed aside
- produces conflicts when locked files are encountered
- slower, due to repeated conflict handling

Overwrite-on-TT.create_file
---------------------------
- changes programming model
- succeeds if file can be overwritten, but not deleted
- fails unsafe if file cannot be overwritten
- fails if a locked file must be deleted (not modified)
- increases the window where interruptions/failures will leave tree in
  bad state

Overwrite-on-TT.apply_insertions
--------------------------------
- succeeds if file can be overwritten, but not deleted
- fails unsafe if file cannot be overwritten
- fails if a locked file must be deleted (not modified)
- slow

Conditional-overwrite-on-TT.apply_insertions
--------------------------------------------
- succeeds if file can be overwritten, but not deleted
- fails unsafe if file cannot be overwritten
- complex, and slightly slower

I still like rollback the best.  I could be convinced about
Conditional-overwrite-on-TT.apply_insertions, but it's not a complete
solution, because it doesn't address cases where we're trying to delete
the locked file.  But it could be combined with rollback.

Both rollback and rename-aside require the ability to rename locked
files.  Rollback will simply fail to delete the locked files at the end,
whereas rename-aside produces conflicts.  So I think rollback is better.

Rollback is unique in that it addresses other failure scenarios.  And if
rollback was based on journalling, it could even be done if Bazaar was
interrupted uncleanly.
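
As a sketch of the journalling idea (nothing like this exists yet; the journal format and both function names are made up for illustration): record each rename on disk before performing it, so a later run can undo whatever an interrupted run left behind.

```python
import json
import os
import tempfile


def journaled_rename(journal_path, src, dst):
    """Record the intended rename durably, then perform it."""
    with open(journal_path, "a") as journal:
        journal.write(json.dumps({"src": src, "dst": dst}) + "\n")
        journal.flush()
        os.fsync(journal.fileno())
    os.rename(src, dst)


def recover(journal_path):
    """Undo journaled renames in reverse order; return how many were undone."""
    if not os.path.exists(journal_path):
        return 0
    with open(journal_path) as journal:
        entries = [json.loads(line) for line in journal if line.strip()]
    undone = 0
    for entry in reversed(entries):
        # An entry may have been written without the rename ever happening.
        if os.path.lexists(entry["dst"]) and not os.path.lexists(entry["src"]):
            os.rename(entry["dst"], entry["src"])
            undone += 1
    os.unlink(journal_path)
    return undone


workdir = tempfile.mkdtemp()
journal = os.path.join(workdir, "journal")
first = os.path.join(workdir, "one.txt")
second = os.path.join(workdir, "two.txt")
for path in (first, second):
    with open(path, "w") as f:
        f.write("data")

# Two steps of a transform, then a simulated crash (journal never cleared).
journaled_rename(journal, first, first + ".aside")
journaled_rename(journal, second, second + ".aside")

undone = recover(journal)
restored = os.path.exists(first) and os.path.exists(second)
```

A transform that completes normally would just delete the journal at the end; only an interrupted one leaves it behind for recover() to find.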

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGuz330F+nu1YWqI0RAnCBAJ0Zrmae4m6lPs5sc/ScEYSH0S9hfQCdGBl7
PLYzH6uv0tDlNN14P7nMfNg=
=z1fo
-----END PGP SIGNATURE-----


