[RFC] TreeTransform fooled by os.rename on case insensitive filesystems

John Arbash Meinel john at arbash-meinel.com
Wed Sep 3 15:10:00 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
> I'm working with Guillermo on making the test suite pass on OSX.
> 

...

> 
> bzrlib.transform._FileMover.rename not raising errors.FileExists when:
> 

...
> OSX differs from windows here because OSError is not raised,
> instead 'foo' just replaces 'Foo'.

Yeah, that means it is mostly conforming to POSIX atomic rename semantics.

> 
> There are several problems here:
> 
> - the test assumes a Windows behavior while requiring only a
>   CaseInsensitiveFilesystemFeature,
> 
> - since the rename succeeds on OSX, there is no way for
>   _FileMover.rollback() to restore a correct state ('Foo' content
>   has vanished at this point).
> 
> I think this indicates a deeper problem than just a slight
> mismatch between Windows and CaseInsensitiveFilesystemFeature.
> 
> Any idea about where and how that should be fixed ?
> 
>     Vincent

So, what happens if on Linux you do:

bzr init
touch Foo foo
bzr add
bzr commit

And then do:

bzr branch LINUX

I would guess that randomly one of the files "wins" and the other is (possibly
silently) deleted. And then "bzr status" will show that one file is missing,
but "bzr commit" will think there is nothing to do. (Gotta love multiple code
paths.)

If that is the case, then I really would like to do something about it.

If we cleanly deleted a file, or marked a conflict, or a few other things, I
would be okay with it. But I'm guessing we just get into a semi-broken state.

One problem is that handling this sort of thing means we need to "Look Before
You Leap" for every file we are about to add, which has pretty severe
performance costs when you consider a fresh checkout of a project with 30k
files. So you start wanting to "only check the ones that have a chance to
conflict". I'll try to spell it out differently...

1) We have a nice property now in TT, that when building trees, if it can put
something in the "final" location in 'limbo' it does so. So when checking out
that mega project, with 30k files, 99% of them will be in a subdirectory. So
in limbo you get:

  .bzr/checkout/limbo/
    new-1/ # Will be renamed to 'src/'
      file1.c
      file2.c
      ...
    new-2/ # Will be renamed to 'include/'
      file1.h
      file2.h
      ....

So at *rename* time, you don't have to check 30k files, you only have to check
the files and directories in the top-level of your source tree.
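
To illustrate the point above: a minimal sketch, not bzrlib code, of how the
set of names to check at rename time shrinks to the top-level entries. The
helper name top_level_names and the sample paths are hypothetical.

```python
def top_level_names(paths):
    """Return the distinct first components of tree-relative paths.

    Only these names are renamed out of limbo into the working tree root,
    so only these would need an existence check at rename time.
    """
    return sorted({p.split("/", 1)[0] for p in paths})


# Hypothetical tree: everything under src/ and include/ is built inside
# limbo, so the check covers three names rather than five paths.
paths = [
    "src/file1.c",
    "src/file2.c",
    "include/file1.h",
    "include/file2.h",
    "README",
]
names = top_level_names(paths)
```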

1b) Caveat... you have to *create* the files in these directories in a safe
way. So you will probably need to stat the location in limbo even if you
aren't going to stat the location they will be renamed to.

1c) Further, when you are updating a lot of these files (1,000, let's say) they
may all be created in the top-level of limbo, because the directories
themselves weren't modified.

So... what I would like to see is an optional code path, enabled only on
platforms that require it, which does an "os.lstat()" of the location we are
about to create (or rename) a file into, and then invokes the conflict logic
if something already exists there.

I would like it to be optional, because it is unnecessary overhead for
platforms that are not case-insensitive.
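
A rough sketch of what such an optional check could look like. This is not
the actual _FileMover code: the names checked_rename, check_target and
TargetExists are made up for illustration, with TargetExists standing in for
errors.FileExists.

```python
import errno
import os


class TargetExists(Exception):
    """Hypothetical stand-in for bzrlib's errors.FileExists."""


def checked_rename(src, dst, check_target=True):
    """Rename src to dst, refusing to replace an existing dst.

    check_target is the optional switch: a caller would turn it on only
    for platforms where os.rename silently replaces a case-variant name.
    """
    if check_target:
        try:
            os.lstat(dst)
        except OSError as e:
            # A missing target is the expected, safe case.
            if e.errno != errno.ENOENT:
                raise
        else:
            # Something already answers to this name (possibly under a
            # different case), so let the caller run its conflict logic.
            raise TargetExists(dst)
    os.rename(src, dst)
```

Two caveats with this sketch: the stat and the rename are separate calls, so
another process can still create the target in between, and a rename that
only changes the case of a name ('foo' to 'Foo') would be reported as a
conflict and needs separate handling.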

What I would also *like* for Mac, is a way to get back the final filename,
rather than just a name we can access it with. And I don't know how to do
that. (Very weird result in testing... It seems that NTFS under Linux *is*
case-sensitive.)

Specifically, if we do:

f = open(u'B\xe5r', 'wb')
Under Mac it will be created as u'Ba\u030ar'.
Now, I see that there is an "f.name" attribute, but is that the name on disk,
or the name that you opened the file with? (I have a strong guess that it is
the latter.)
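
A small sketch of both points, under the assumption that the decomposition
applied on the Mac matches Unicode NFD for this character. The temporary
directory and variable names are only for illustration.

```python
import os
import tempfile
import unicodedata

requested = u"B\xe5r"

# NFD splits the precomposed a-with-ring into 'a' plus U+030A, which is
# the form the name is reported to take on disk under Mac.
decomposed = unicodedata.normalize("NFD", requested)

# f.name is whatever was passed to open(); Python does not ask the
# filesystem which name it actually stored.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, requested)
with open(path, "wb") as f:
    reported = f.name
```

So f.name cannot be used to recover the on-disk spelling.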

fstat() doesn't seem to return any way to get the name, nor does 'lstat'.

os.listdir() seems to be the only thing, and that would mean doing one before
and after creation and taking the set difference (which seems really bad).
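
For reference, the listdir approach would look roughly like this. It is a
sketch only, and create_and_find_name is a hypothetical helper.

```python
import os


def create_and_find_name(directory, name):
    """Create an empty file and return the name the directory reports.

    Lists the directory before and after the creation and takes the set
    difference, so it costs two full listings per file.
    """
    before = set(os.listdir(directory))
    with open(os.path.join(directory, name), "wb"):
        pass
    created = set(os.listdir(directory)) - before
    if len(created) != 1:
        # Either the name already existed (nothing new appeared) or
        # something else changed the directory at the same time.
        raise RuntimeError("cannot identify the new entry for %r" % (name,))
    return created.pop()
```

Besides the cost, the result is ambiguous if anything else writes to the
directory between the two listings.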

Is there an OS X function that we could call (in a pyrex extension, for
example) that would give us better info?

So... in the interest of moving things forward, I'm fine with turning this
into a 'knownFailure()' on Mac, until we can figure out a solution. I'd rather
have a few XFAILs than have a test suite that doesn't pass.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIvpq4JdeBCYSNAAMRAr5JAKCu/mFBwfS53udnBdxptETVK3lI0gCePzrp
e/zva5fftyhcD+K1pC36iMs=
=4AJd
-----END PGP SIGNATURE-----


