[MERGE] OSX test suite passing (well as of 2008/09/08 :)

John Arbash Meinel john at arbash-meinel.com
Mon Sep 8 22:06:13 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:

...
>     john> I suppose we could just force the user encoding,
>     john> terminal encoding, etc on Mac.
> 
> So I did that, force os.environ['LANG'] = 'en_US.UTF-8' if LANG
> is not set.
> 
> It may not be perfect, but it's better than failing like we do
> now.

Is it possible to just do "en.UTF-8" ? I don't know if that is
considered valid, but I'd rather not be *too* US-centric.

Also, I think under Linux it will fail if you don't have the given
locale installed. Obviously this is under Mac, so we can be Mac-specific
here.
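
To make the idea concrete, here is a minimal sketch (Python 3 syntax, not the actual patch) of forcing LANG only on Mac and only when it is unset, plus a way to probe whether a locale name such as "en.UTF-8" is accepted at all. The names ensure_lang and locale_is_available are hypothetical helpers invented for this illustration; the environment and platform are passed in so the logic can be exercised anywhere.

```python
import locale


def ensure_lang(environ, platform, default="en_US.UTF-8"):
    """Set LANG to `default` on Mac only, and only when it is unset.

    Hypothetical helper: returns True when `environ` was modified.
    """
    if platform != "darwin":
        return False
    if environ.get("LANG"):
        return False
    environ["LANG"] = default
    return True


def locale_is_available(name):
    """Report whether the C library accepts `name` as an LC_CTYPE locale."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    finally:
        # Always restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)
    return True
```

In real use one would call ensure_lang(os.environ, sys.platform) early at startup; locale_is_available("en.UTF-8") would answer the question above on a given machine without guessing.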


>     john> It is using a NFC utf-8 string.  Change the test to use
>     john> a string that can't be combining.
> 
>     john> For example instead of using u'\xe5' == å == u'a\u030a' use something
>     john> like "\N{omega}".
> 
> I went with the euro sign:
> - Unicode: U+20AC
> - utf-8: 0xE2 0x82 0xAC
> 
> \N{omega} wasn't recognized, I used \u2375 but it wasn't present
> in the font I used under Linux and hard to read under OSX.

Sure, I've used Omega elsewhere, but Euro is fine, too.
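
For reference, a small sketch (Python 3, standard unicodedata module only) of why å is a poor test character and the euro sign is a safe one. The helper name is_normalization_stable is made up for this illustration. Note that \N{...} escapes need the full Unicode character name, for example \N{GREEK SMALL LETTER OMEGA}.

```python
import unicodedata

A_RING_NFC = "\u00e5"    # å as a single precomposed code point
A_RING_NFD = "a\u030a"   # 'a' followed by COMBINING RING ABOVE
EURO = "\u20ac"          # € has no canonical decomposition


def is_normalization_stable(text):
    """True when both NFC and NFD leave `text` unchanged."""
    return (unicodedata.normalize("NFC", text) == text
            and unicodedata.normalize("NFD", text) == text)
```

A string for which is_normalization_stable() is true reads back the same whether or not the filesystem renormalizes it, which is the property the test needs.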

> 
> I tried the copyright sign but there are several possible
> interpretations:
> - U+00A9 COPYRIGHT SIGN
> - U+2117 SOUND RECORDING COPYRIGHT
> - U+24B8 CIRCLED LATIN CAPITAL LETTER C
> 
> and I thought we had enough confusion.

Ouch.

...

> 
>     john> I would change the test to not use a character that
>     john> plays poorly with normalization. We have that problem
>     john> on Mac, but we don't need to test it at every
>     john> turn. When we *fix* that, we will write tests
>     john> explicitly for it.
> 
> All right. I take that as meaning 'the test used an invalid file
> name which higher layers will catch in production code'. And
> given the other failing tests in this thread, that sounds
> correct. Right ?

Actually, I think the filename is valid on other platforms. What happens
is that we use a NFC normalized name, which gets translated to NFD by
Mac. And that doesn't match up well.

Stuff like "iter_changes()" seeing a file go missing, and another one
show up as unknown.
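
A toy model of that failure, as a sketch only: it does not touch bzr or a real filesystem, and it approximates the Mac behaviour with standard NFD. Both function names are hypothetical.

```python
import unicodedata


def simulate_mac_listing(names):
    """Pretend to be a Mac filesystem: hand every name back in NFD form."""
    return {unicodedata.normalize("NFD", n) for n in names}


def naive_status(versioned, on_disk):
    """Compare recorded names with disk names, code point by code point."""
    missing = sorted(set(versioned) - set(on_disk))
    unknown = sorted(set(on_disk) - set(versioned))
    return missing, unknown
```

Feeding it the versioned names {"\u00e5.txt", "plain.txt"} reports the NFC name as missing and its NFD twin as unknown, while plain.txt matches on both sides.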


...

>     john> Specifically, we want to refuse to add or rename a
>     john> non-normalized filename. Eventually we want to relax
>     john> that restriction.
> 
> Not a problem on OSX where you can't create such a file.

Actually, not entirely true...

Back many months ago (I think maybe 2 years even) I worked out code to
handle NFC/NFD between Mac and other platforms. Basically, the standard
form is NFC. It is the recommended form for XML, it is the form where
iso-8859-1 and utf-8 overlap the most, and is generally what you get on
Windows and Linux. (In fact on Windows, NFD forms generally give you a
letter followed by a square box, rather than å)

So what I did was say that *all* files added on Linux or Windows *must*
be NFC form, and then on Mac, I would transcode. So all files as NFD on
Mac got renormalized when read from disk.
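
Roughly, the scheme looked like the sketch below. This is an illustration in Python 3 and not the code that was in bzr; check_addable, names_from_disk and the choice of ValueError are all assumptions made for the example.

```python
import unicodedata


def check_addable(name):
    """Refuse names that are not already in NFC form (the add/rename filter)."""
    if unicodedata.normalize("NFC", name) != name:
        raise ValueError("filename is not NFC-normalized: %r" % (name,))
    return name


def names_from_disk(raw_names, platform):
    """Renormalize to NFC on Mac; pass names through unchanged elsewhere."""
    if platform == "darwin":
        return [unicodedata.normalize("NFC", n) for n in raw_names]
    return list(raw_names)
```

With this in place a name stored as NFC on Linux and read back as NFD on Mac compares equal again after names_from_disk(), at the cost of normalizing every name on every read.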

The really nice thing is that checking in å.txt on Linux, and then
checking it out on Mac would still treat the file as unchanged. The
downside is that it was doing a lot of renormalization under the covers,
and was generally difficult to do correctly with Dirstate. Further, it
seems that some people liked to create non-NFC filenames on Windows.
(Specifically, Japanese windows would create wide-character parentheses
(), rather than their normal counterparts.)

Which meant that I was pretty much damned-if-I-do, damned-if-I-don't,
and *doing* was harder than not doing.

So at this point, we still have code that complains about adding non-NFC
filenames, and a codebase which doesn't really handle it correctly in
all cases. It took me a lot of effort to get it working the first time,
and it just wasn't worth it (to me) to fight through all that again.

> 
>     john> But until we can get there, we need to filter at add
>     john> and rename time.
> 
> I read that as 'bzr filters at add and rename' *now*. So even when
> checking out on OSX (or on an HFS mounted fs) bzr never
> *produces* such files.
> 
> It follows that these tests can be safely skipped on OSX.
> 
> Or do we want to add a specific one that indeed, you can't create
> such a file ?

It is supposed to check at add and rename, as that is the only way to
inject new filenames into the system. I'm not 100% sure of what to do on
Mac. People want to be able to add the files. I came to the point of
deciding to just let go, and let people version any name they want. And
just live with the fact that it breaks between Mac and any other
platform. (Mostly because svn, hg, git, and all others break as well.
Though I've seen some emails about svn trying to play with normalization.)



> but for workingTree3 he didn't even reach that, so I filed a bug
> and mention it because there was another problem I didn't
> diagnose at the time and I wanted to keep them separate.
> 
> But both are gone now.
> 
> So, sorry for that loong mail, here is the patch,
> 
>     Vincent

I'll try to look at the patch more directly.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjFk8UACgkQJdeBCYSNAAMiEwCgzhFIXQqbjuWPWxhQK8ODZsNU
nJoAnj1YdiUAAm+QzF0ztNvwdlKudHT3
=VpqS
-----END PGP SIGNATURE-----


