[win32] non-ascii/non-english file names: internal usage of file names

Alexander Belchenko bialix at ukr.net
Tue Nov 29 14:12:05 GMT 2005


As far as I can see, the cardinal difference between the Windows version
of Python and Linux/Cygwin Python is this: when you use a plain
(non-unicode) string on Windows for the base part of a file name, all
derived file names are always returned as plain strings. On Linux/Cygwin,
as far as I can see, when a path cannot be represented as a plain string
(or in the ascii encoding?) it is silently converted to unicode. As a
result we get different behaviour with non-ascii (non-English) file names.
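
To illustrate the rule that derived names follow the type of the base
path, here is a small sketch. It is not bzrlib code; name_types and demo
are made-up names. On a recent Python the two types are bytes and str
rather than str and unicode, but the result type still follows the
argument type.

```python
import os
import shutil
import sys
import tempfile


def name_types(base):
    """Return the set of types of the names that os.listdir(base) yields."""
    return set(type(name) for name in os.listdir(base))


def demo():
    """List one scratch directory twice: via a byte path and a unicode path."""
    fs_enc = sys.getfilesystemencoding() or 'ascii'
    tmp = tempfile.mkdtemp()
    try:
        # Build both spellings of the same directory name.
        text_base = tmp if isinstance(tmp, type(u'')) else tmp.decode(fs_enc)
        byte_base = text_base.encode(fs_enc)
        # Create one ascii-named file so the listing is not empty.
        open(os.path.join(text_base, u'plain.txt'), 'w').close()
        return name_types(byte_base), name_types(text_base)
    finally:
        shutil.rmtree(tmp)


byte_types, text_types = demo()
```

With a byte-string base every returned name is a byte string, and with a
unicode base every returned name is unicode, which is why a plain '.'
default matters.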

To work around this incompatibility, bzrlib code should always use
unicode file paths for all operations. The key points here are default
directory values such as '.' used when constructing a Branch object, etc.
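
Besides changing the literals, paths could be coerced where they enter
the library. This is only a sketch of that idea: ensure_unicode_path is a
hypothetical helper, not something that exists in bzrlib, and it assumes
that byte paths are in the filesystem encoding. It is written for a
recent Python, where the bytes name is available.

```python
import sys


def ensure_unicode_path(path, encoding=None):
    """Return path as a unicode string, decoding byte strings if needed."""
    if isinstance(path, bytes):
        # Assumption: byte paths use the filesystem encoding.
        fs_enc = encoding or sys.getfilesystemencoding() or 'ascii'
        return path.decode(fs_enc)
    # Already unicode: pass it through unchanged.
    return path


default_branch = ensure_unicode_path(b'.')
```

Unicode input passes through untouched, so calling the helper twice on
the same path is harmless.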

So far I have found 2 weak points in bzrlib.

1) file bzrlib/builtins.py, function branch_files -- should have definition:

def branch_files(file_list, default_branch=u'.'):
                                            ^^^^

2) file bzrlib/add.py, function _prepare_file_list(file_list) --
  the default file list for the add command should also be a unicode string:

def _prepare_file_list(file_list):
     """Prepare a file list for use by smart_add_*."""
     import sys
     if sys.platform == 'win32':
         file_list = glob_expand_for_win32(file_list)
     if not file_list:
         file_list = [u'.']        # <<<<<<<<<<<<<<<<<<<< !!!!!
     file_list = list(file_list)
     return file_list


There are also other places in the code where a plain string such as '.'
or '' is used. I'll try to find them all and fix them, then I'll create a
patch.

But there is another critical problem with non-ascii file names. In
some situations a StringIO file-like object is used to capture the output
of another command. But StringIO can hold only ascii plain strings or
unicode strings. So when I try to commit a tree with non-ascii file names
and want to use an external editor to enter the commit message, I get
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-5: ordinal not in range(128).

What is the plan for this situation?
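
One possible direction, as a sketch only and not existing bzrlib code:
capture the output into a byte buffer behind an encoding writer, so that
unicode text is encoded explicitly instead of going through the ascii
codec. The name make_capture and the choice of utf-8 are my assumptions.

```python
import codecs
import io


def make_capture(encoding='utf-8'):
    """Return (raw, writer): writer encodes unicode text into raw."""
    raw = io.BytesIO()
    # codecs.getwriter gives a StreamWriter class for the encoding;
    # wrapping raw with it encodes everything written through it.
    writer = codecs.getwriter(encoding)(raw)
    return raw, writer


raw, out = make_capture()
# A file name with six non-ascii characters, written as unicode.
out.write(u'added \u043f\u0440\u0438\u0432\u0435\u0442.txt\n')
captured = raw.getvalue().decode('utf-8')
```

The buffer itself then only ever holds bytes in one known encoding, and
the caller decodes once when reading the captured output back.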

Alexander



