[BUG] bzr 0.8 add on Unicode filename fails

Mon May 8 22:49:58 BST 2006

John Whitley wrote:
> [FYI, I already entered the following into Malone as bug 43689; I'm 
> copying it here for discussion.. -J]
> 
> Platforms: Cygwin and Mac OS X 10.4.6 (PowerPC)
> bzr version: 0.8 release
> 
> Repro steps:
> 1) bzr init test
> 2) cd test
> 3) touch file.txt
> 4) Use Windows Explorer / Finder to rename file.txt to REQUÊTE.TXT
> 5) bzr add .
> 
> Cygwin results:
> bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode 
> byte 0xca in position 5: ordinal not in range(128)
>   at /tmp/python.572/usr/lib/python2.4/posixpath.py line 65
>   in join
> 
> Mac OS X results:
> added bzr: ERROR: exceptions.UnicodeEncodeError: 'ascii' codec can't 
> encode character u'\u0302' in position 6: ordinal not in range(128)
>   at /usr/local/lib/bzr-0.8/bzrlib/add.py line 61
>   in add_action_print
> 
> Note that the way the unicode name was generated on each system was 
> slightly different -- the Cygwin version was from a file we have in CVS, 
> while I manually retyped the version on Mac OS X using option-i E to get 
> 'Ê' on a US keyboard.
> 
> 
> -- John
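
For anyone following along, both tracebacks come down to the implicit
'ascii' codec. Below is a minimal sketch of the two failures. It is
written in Python 3 syntax and is not taken from bzrlib. The byte string
is my assumption about what Cygwin handed back (guessed from the 0xca in
the traceback), and try_decode / try_encode are hypothetical helpers
made up for this illustration.

```python
# Illustrative only; none of this is bzrlib code.

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes do not fit the codec."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the text does not fit the codec."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Assumption: Cygwin returned the name as single-byte (latin-1 style) bytes.
cygwin_name = b"REQU\xcaTE.TXT"
ascii_decoded = try_decode(cygwin_name, "ascii")     # fails: 0xca >= 128
latin1_decoded = try_decode(cygwin_name, "latin-1")  # works: 0xca is U+00CA

# Assumption: the Mac returned 'E' followed by a combining circumflex.
mac_name = "REQUE\u0302TE.TXT"
ascii_encoded = try_encode(mac_name, "ascii")  # fails: U+0302 is not ASCII
utf8_encoded = try_encode(mac_name, "utf-8")   # works
```

So the Cygwin failure is a decode of filesystem bytes and the Mac
failure is an encode while printing the name, but in both cases the fix
is to stop relying on the default codec.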

I have a branch which has done a lot of the work, available here:
http://bzr.arbash-meinel.com/branches/bzr/encoding/

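However, it is still some way from working properly on all
platforms. Mac is the hardest platform, because it actually normalizes
the paths that you type in, and it does so in a fairly uncommon way: it
stores names in decomposed form (E followed by a combining circumflex),
which is presumably where the u'\u0302' in your Mac traceback comes from.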
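
To show what I mean by normalization, here is a rough sketch using
Python's standard unicodedata module (Python 3 syntax). It is not code
from my branch. same_filename is a hypothetical helper, and treating the
Mac behaviour as plain NFD is a simplification on my part.

```python
import unicodedata

# 'Ê' as a single code point, which is what most platforms hand back.
composed = "REQU\u00caTE.TXT"

# 'E' followed by the combining circumflex U+0302, roughly what the Mac
# filesystem returns (assumption: modelled here as plain NFD).
decomposed = unicodedata.normalize("NFD", composed)


def same_filename(a, b, form="NFC"):
    """Compare two names after bringing both to the same normal form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The two spellings display the same but differ as raw strings.
differ_raw = composed != decomposed
match_normalized = same_filename(composed, decomposed)
```

The point is that a name read back from disk may not compare equal to
the name the user typed unless both are normalized first.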

You might try my branch, but I'm not making any promises yet, since I 
haven't finished refining all commands.
I'm somewhat hopeful that it will get merged into core, so that further 
work on improving the commands can happen there, rather than on a 
separate branch that lives for too long.

John
=:->