Any plans to fix Unicode normalization issues on Mac OS X before bzr 2?

Jean-Francois Roy bahamut at macstorm.org
Wed Jul 22 04:14:54 BST 2009


The bugs for Unicode normalization "awareness" have been open for
almost 2 years now. Is anyone even considering fixing them? With bzr 2
coming and the 2a format practically finalized, it would be a shame to
miss addressing this very real issue because the (possibly) required
format changes were rejected for coming in too late. I'm interested in
the issue because I speak French, have files with non-ASCII characters
in their names, and have come across this problem with other version
control systems (namely svn); I was hoping bzr could improve on that.

The bugs https://bugs.launchpad.net/bzr/+bug/172383 and
https://bugs.launchpad.net/bzr/+bug/102935 seem to track the issue, and
although they have some good information, they contain little
discussion of possible practical solutions. I'm not too familiar with
Unicode, so I am not sure what the correct approach is, beyond the
impression that bzr should assume precomposed form (NFC) internally
and, on Mac OS X, have an additional layer that decomposes characters
when writing file names out.
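To make the two forms concrete, here is a minimal sketch of that layer
using Python's standard unicodedata module. The function names are
hypothetical, not bzr APIs. Note that HFS+ actually stores names in a
variant of NFD, so plain NFD below is only an approximation of what the
file system does.

```python
import unicodedata

def to_repository_name(fs_name: str) -> str:
    """Hypothetical helper: normalize an on-disk name to precomposed form (NFC)."""
    return unicodedata.normalize("NFC", fs_name)

def to_hfs_name(repo_name: str) -> str:
    """Hypothetical helper: decompose a repository name before writing it on Mac OS X.

    Plain NFD approximates the HFS+ variant of decomposition.
    """
    return unicodedata.normalize("NFD", repo_name)

precomposed = "caf\u00e9.txt"   # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301.txt"   # 'e' followed by COMBINING ACUTE ACCENT, U+0301

print(precomposed == decomposed)                       # False: different code points
print(to_repository_name(decomposed) == precomposed)   # True
print(to_hfs_name(precomposed) == decomposed)          # True
```

The two strings render identically but compare unequal, which is
exactly why a name committed on one system can look "missing" or
"unknown" on another unless bzr normalizes at the boundary.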

Perhaps some sort of content filter mechanism could be used to shield
bzr from the idiosyncrasies of Unicode composition. One possible idea
would be to use extended attributes to store the name of the file as
it appears in the branch index, and to use that instead of the file
system name for all operations. This would completely shield bzr from
any transformation Mac OS X might apply to the file name, while
ensuring the information stays with the file. This would also (most
likely) work without any modifications to existing branch formats, and
might require only (this is a guess) a checkout format change.
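A rough sketch of how that lookup might work, assuming a hypothetical
reader function that returns the name stored in an extended attribute
(or None if the file has none). A dict stands in for the real extended
attributes so the example runs anywhere; Python's own os.getxattr, for
instance, is only available on Linux, so a real Mac OS X version would
need a different xattr binding.

```python
import unicodedata
from typing import Callable, Optional

# Hypothetical: given an on-disk name, return the index name stored in an
# extended attribute, or None if no such attribute exists.
XattrReader = Callable[[str], Optional[str]]

def index_name_for(fs_name: str, read_stored_name: XattrReader) -> str:
    """Return the name bzr should use for a file found on disk.

    Prefer the name recorded in the extended attribute, so whatever
    renormalization the file system applied is ignored. Fall back to the
    NFC form of the on-disk name for files that carry no attribute yet.
    """
    stored = read_stored_name(fs_name)
    if stored is not None:
        return stored
    return unicodedata.normalize("NFC", fs_name)

# Simulated extended attributes, keyed by the decomposed name that the
# file system hands back; the value is the name exactly as in the index.
fake_xattrs = {"cafe\u0301.txt": "caf\u00e9.txt"}

def read_from_fake(fs_name: str) -> Optional[str]:
    return fake_xattrs.get(fs_name)

print(index_name_for("cafe\u0301.txt", read_from_fake))   # tagged: stored name wins
print(index_name_for("nai\u0308ve.txt", read_from_fake))  # untagged: NFC fallback
```

The fallback matters because extended attributes can be lost when
files are copied by tools that don't preserve them, so normalization
would still be needed as a safety net.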

Thoughts?

Jean-Francois Roy
