[BUG] regression in dirstate Unicode handling on Mac

John Arbash Meinel john at arbash-meinel.com
Fri Mar 16 20:11:47 GMT 2007


I just wanted to point out that our Unicode handling is a bit
broken on Mac with the new dirstate changes.

Basically, this is because of the old "Normalization" problems that we
ran into in the past.

I'm not sure whether I want to fix this. The way we fixed it in
the past was to special-case unknowns on Mac, by trying to match them
again using their normalized form.
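
For illustration, here is a minimal sketch of that kind of matching. It is
not the actual bzr code: the function name match_unknowns and its
arguments are hypothetical, and it assumes plain NFC comparison is good
enough (the Mac filesystem stores names in a decomposed form close to NFD,
so composing both sides should make them comparable).

```python
import unicodedata

def match_unknowns(versioned, on_disk):
    """Pair on-disk names with versioned names that differ only by
    Unicode normalization. Hypothetical helper, not the bzr API."""
    # Index the versioned names by their composed (NFC) form.
    by_nfc = dict((unicodedata.normalize("NFC", n), n) for n in versioned)
    matched = {}
    unknown = []
    for name in on_disk:
        key = unicodedata.normalize("NFC", name)
        if key in by_nfc:
            # Same name after normalization: treat it as the versioned file.
            matched[name] = by_nfc[key]
        else:
            unknown.append(name)
    return matched, unknown

# The versioned name is composed; the name read back from disk is decomposed.
matched, unknown = match_unknowns([u'Dod\xe9'], [u'Dode\u0301', u'new.txt'])
```

Every directory listing pays for the extra normalize() calls, which is the
performance cost of handling this case.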

However, I did some testing, and it seems SVN doesn't try to handle this
case (neither do hg or cvs, and I've had difficulty getting the
others installed on my Mac to test).

I was originally hoping to do better than the other systems, and to be
able to support what seemed a reasonably common case (versioning a
filename with a diacritic mark).

But it seems to be causing more trouble than it is worth (Windows
sometimes creates non-normalized files, it is a performance burden,
etc.).

So one way to "fix" all of this is to just go through the test suite
and change the Unicode strings that we are using.

For example, I've been using "Dodé" or "Bågfors", etc. as my Unicode
characters (u'Dod\xe9' and u'B\xe5gfors'). But I can just as easily
switch to Greek, Russian, Japanese, Arabic, or some other character set
that doesn't get the same treatment on Mac.
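
A quick way to vet candidate test strings might look like the sketch
below. The helper is_normalization_stable is a made-up name, and
comparing NFC against NFD only approximates what the Mac filesystem does
(it uses its own variant of decomposition), so treat the result as a hint
rather than a guarantee.

```python
import unicodedata

def is_normalization_stable(name):
    """Return True if composing and decomposing give the same string,
    i.e. no character in name has a canonical decomposition.
    Hypothetical helper for choosing test strings."""
    nfc = unicodedata.normalize("NFC", name)
    nfd = unicodedata.normalize("NFD", name)
    return nfc == nfd

candidates = [
    u'Dod\xe9',            # e with acute accent
    u'B\xe5gfors',         # a with ring above
    u'\u03b1\u03b2\u03b3', # Greek alpha, beta, gamma (unaccented)
    u'\u65e5\u672c',       # two CJK ideographs
]
for name in candidates:
    print(repr(name), is_normalization_stable(name))
```

Whether a name gets rewritten depends on the individual characters rather
than on the script as a whole, so it is probably worth checking each
replacement string instead of assuming a whole language is safe.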

(It seems weird to me that the Mac is *more* broken for European
languages: because their characters are "too similar", it tries harder
to munge your data.)

Thoughts?

John
=:->