Any plans to fix Unicode normalization issues on Mac OS X before bzr 2?
bahamut at macstorm.org
Wed Jul 22 04:14:54 BST 2009
The bugs for unicode normalization "awareness" have been open for
almost 2 years now. Is anyone even considering fixing them? With bzr 2
coming and the 2a format practically finalized, it would be a shame to
miss out on addressing this very real issue because the (possibly)
required format changes were rejected for coming in too late. I'm
somewhat interested in the issue because I speak French and do have
files with non-ASCII characters, and I have come across this issue with
other version control systems (namely svn); I was kind of hoping bzr
could improve on that.
The bugs https://bugs.launchpad.net/bzr/+bug/172383 and https://bugs.launchpad.net/bzr/+bug/102935
seem to track the issue, and although they have some good
information, they don't have much discussion of possible practical
solutions. I'm not too familiar with Unicode, so I am not sure what
the correct approach is, beyond that it seems bzr should assume
precomposed form, and on Mac OS X have an additional layer to
decompose characters when writing file names out to disk.
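
To make the precomposed/decomposed distinction concrete, here is a minimal
sketch using Python's standard unicodedata module. The function names and
the "decompose" flag are hypothetical and are not bzr's actual API. Plain
NFD is used as a stand-in for what Mac OS X does; as far as I know HFS+
uses its own variant of decomposition, so treat this as an approximation.

```python
import unicodedata


def to_index_name(name: str) -> str:
    """Name as the branch would store it: precomposed form (NFC)."""
    return unicodedata.normalize("NFC", name)


def to_disk_name(name: str, decompose: bool) -> str:
    """Name as written to disk.

    decompose=True is an assumption standing in for "running on Mac OS X".
    """
    form = "NFD" if decompose else "NFC"
    return unicodedata.normalize(form, name)


precomposed = "caf\u00e9.txt"   # e-acute as a single code point
decomposed = "cafe\u0301.txt"   # "e" followed by a combining acute accent

# The two strings render identically but do not compare equal, which is
# the root of the problem described above.
names_differ = precomposed != decomposed

# Normalizing whatever the file system reports recovers the stored name.
round_trip_ok = to_index_name(decomposed) == precomposed
```

With a layer like this, every name read back from disk would go through
to_index_name() before being compared against the branch's own records.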
Perhaps some sort of content filter mechanism could be used to shield
bzr from the idiosyncrasies of Unicode composition. One possible idea
would be to use extended attributes to store the name of the file as
it appears in the branch index and use that instead of the file system
name for all operations. This would completely shield bzr from any
transformation Mac OS X might do to the file name, while ensuring the
information follows the file diligently. This would also (most likely)
work without any modifications to existing branch formats, and may
only require (this is a guess) a checkout format change.
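
A rough sketch of the extended-attribute idea follows. Everything in it is
hypothetical: AttrStore is an in-memory stand-in for real per-file extended
attributes (a real implementation would call the platform's xattr API), the
attribute key is made up, and NFD normalization only simulates the file
system rewriting the name.

```python
import unicodedata

# Hypothetical attribute key; not something bzr defines.
NAME_KEY = "user.bzr.index-name"


class AttrStore:
    """In-memory stand-in for per-file extended attributes (assumption)."""

    def __init__(self):
        self._attrs = {}

    def set(self, disk_name, key, value):
        self._attrs[(disk_name, key)] = value

    def get(self, disk_name, key):
        return self._attrs.get((disk_name, key))


def write_file(store, index_name):
    """Pretend to create a file and record its branch name alongside it."""
    # Simulates the file system decomposing the name behind our back.
    disk_name = unicodedata.normalize("NFD", index_name)
    store.set(disk_name, NAME_KEY, index_name)
    return disk_name


def index_name_for(store, disk_name):
    """Prefer the recorded name; fall back to the on-disk name."""
    stored = store.get(disk_name, NAME_KEY)
    return stored if stored is not None else disk_name


store = AttrStore()
disk = write_file(store, "caf\u00e9.txt")
recovered = index_name_for(store, disk)
```

The point of the design is that the lookup never depends on how the file
system chose to spell the name; files without the attribute simply fall
back to their on-disk name.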