bzr, unicode in file names and Mac OS X

John Arbash Meinel john at arbash-meinel.com
Fri Apr 13 18:41:36 BST 2007


Hinnerk Haardt wrote:
> Hi,
> 
> I encountered three possibly related tracebacks while trying to move
> a version history from svn/svk to bazaar using tailor. They have in
> common that the file names contain umlauts and are created on Mac OS X.
> 
> The first two can be reproduced only on Mac OS X, the third on Linux
> using the attached .tgz:
> 
> 
> 1) System: Mac OS X 10.4.9, LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8,
> shell is bash in Terminal.app:

Thanks for the bug report. This is sort of a "known bug" with Mac's
filesystem. The rest of the world considers ü to be a single
character: u'\xfc' (LATIN SMALL LETTER U WITH DIAERESIS).

Mac re-normalizes it to 2 characters: u'u\u0308' ('u' followed by
'\u0308' COMBINING DIAERESIS).

Now, Unicode specifies that those are both valid ways of representing
the concept of ü.
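
To make the two representations concrete, here is a small sketch using
Python's standard unicodedata module. The variable names are only for
illustration. NFC is the composed form and NFD the decomposed form; the
two strings render the same but do not compare equal until they are
normalized to a common form.

```python
import unicodedata

composed = "\u00fc"      # LATIN SMALL LETTER U WITH DIAERESIS (1 code point)
decomposed = "u\u0308"   # 'u' followed by COMBINING DIAERESIS (2 code points)

# The raw strings differ, even though both display as the same glyph.
same_raw = composed == decomposed

# Normalizing either one to the other's form makes them equal.
to_nfd = unicodedata.normalize("NFD", composed)
to_nfc = unicodedata.normalize("NFC", decomposed)

print(same_raw, to_nfd == decomposed, to_nfc == composed)
```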

However, it means that if you create u'\xfc' on Linux and commit, and
then check out on Mac, all of a sudden your existing file is marked as
missing, and the new file is marked as unknown.
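
A rough illustration of why that happens, assuming the status check
boils down to comparing the recorded names against the names found on
disk. The classify() helper and the file name are hypothetical and this
is not bzr's actual code:

```python
def classify(versioned, on_disk):
    """Compare recorded names with on-disk names by exact string match."""
    # Recorded in the tree but not found on disk under that exact name.
    missing = sorted(set(versioned) - set(on_disk))
    # Found on disk but not recorded under that exact name.
    unknown = sorted(set(on_disk) - set(versioned))
    return missing, unknown


# Name as committed on Linux (composed form).
versioned = ["gr\u00fcn.txt"]
# Name as the filesystem reports it after re-normalizing (decomposed form).
on_disk = ["gru\u0308n.txt"]

missing, unknown = classify(versioned, on_disk)
```

Both lists end up with one entry each, because the same file shows up
under two different strings.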

In bzr 0.14 and earlier we tried to account for this fact. So when files
were added, on non-Mac we would check that they were properly
normalized, and on Mac we would re-normalize (to account for Mac's choice).
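
The idea can be sketched roughly as follows. This is not the bzr code:
the function name, the platform_renormalizes flag, and the choice of NFC
as the target form are all assumptions made for the example.

```python
import unicodedata


def check_added_filename(path, platform_renormalizes):
    """Return the name to record for a file being added (hypothetical)."""
    # Assumption: NFC (composed) is the form we want to store.
    normalized = unicodedata.normalize("NFC", path)
    if platform_renormalizes:
        # The filesystem hands back its own form anyway, so store the
        # normalized name and rely on the OS to find the file.
        return normalized
    if normalized != path:
        # Elsewhere the on-disk name is used verbatim, so a
        # non-normalized name cannot be silently rewritten.
        raise ValueError("filename is not normalized: %r" % (path,))
    return path
```

With platform_renormalizes=True, u'u\u0308' is recorded as u'\xfc'; with
it set to False, the same input is rejected.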

However, this causes some other problems, because other platforms don't
always normalize (Win32 seems to create wide-character Japanese
parentheses).

All other version control systems that I tested just ignore this (and
live with the fact that a versioned file has now changed name on Mac,
which forces all other platforms to use a different name).

So in 0.15 we decided to stop fighting as hard for what we considered
"correctness". But obviously some of the old code remains.

If you just want your import to succeed, you can:

   1) Use WorkingTree3 (bzr init --knit) which is the format for 0.14
      and earlier.

   2) Take out all calls to 'osutils.normalized_filename()'.

The internals will then treat paths as whatever exists on disk, and
it is up to the user to deal with the fact that Mac OS X is breaking
their filenames (which is what svn, cvs, git, darcs, and hg do).

I'm sorry this is causing a problem for you. We were trying to be nicer,
but it seems to be causing more problems than it solves.

John
=:->


