[BUG] Unicode string must be always used with encodings

Alexander Belchenko bialix at ukr.net
Sun Sep 25 15:06:48 BST 2005


There are a bunch of errors with unicode strings in bzr that exist
because the default 'ascii' encoding is used for decoding and encoding.
This is bad, because bzr fails when it works with, for example, Russian
filenames.

I propose always using an encoding specific to the current user.

For decoding a plain byte string into unicode, user_encoding should be
used, defined in bzrlib/__init__.py as:

import locale
user_encoding = locale.getpreferredencoding() or 'latin-1'

(I changed the default from 'ascii' to 'latin-1' because I expect it to
work for roughly 80-90% of all users.)
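
As a rough sketch of the decoding side, written for a current Python 3
interpreter (where a plain byte string is `bytes` and unicode is `str`):
the helper names get_user_encoding and decode_filename are hypothetical
and are not part of bzrlib; they only illustrate passing the encoding
explicitly instead of relying on the implicit 'ascii' default.

```python
import locale


def get_user_encoding():
    """Return the locale's preferred encoding, or 'latin-1' as a fallback.

    Hypothetical helper; mirrors the user_encoding expression above.
    """
    return locale.getpreferredencoding() or 'latin-1'


def decode_filename(raw, encoding=None):
    """Decode a byte string with an explicit encoding.

    Hypothetical helper: when no encoding is given, the user's
    encoding is used rather than an implicit 'ascii' default.
    """
    if encoding is None:
        encoding = get_user_encoding()
    return raw.decode(encoding)


# The bytes of a Russian filename as stored under cp1251.
raw_name = b'\xf4\xe0\xe9\xeb'
name = decode_filename(raw_name, 'cp1251')
```

Decoding the same bytes as 'ascii' raises UnicodeDecodeError, which is
the kind of failure described above.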

For encoding unicode strings to plain byte strings we need to use this
encoding:

import sys
stdout_encoding = sys.stdout.encoding or 'ascii'
if stdout_encoding == 'ascii':
    stdout_encoding = user_encoding

We have to determine the output encoding this way because on my Russian
version of Windows the system encoding is 'cp1251' but the console
encoding is 'cp866'.
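
To illustrate why the two encodings have to be kept apart, here is a
small sketch for a current Python 3 interpreter. The function
choose_output_encoding is a hypothetical name, not something in bzrlib;
it takes the stream and user encodings as arguments so the selection
rule above can be tried without a real console.

```python
def choose_output_encoding(stream_encoding, user_encoding):
    """Pick the encoding for output (hypothetical helper).

    Same rule as above: use the stream's own encoding when it
    reports one, and fall back to the user's encoding when the
    stream reports nothing or plain 'ascii'.
    """
    encoding = stream_encoding or 'ascii'
    if encoding == 'ascii':
        encoding = user_encoding
    return encoding


# A Russian filename as a unicode string.
name = '\u0444\u0430\u0439\u043b'

# Console reports cp866 while the system encoding is cp1251.
console_encoding = choose_output_encoding('cp866', 'cp1251')
console_bytes = name.encode(console_encoding)

# The same name encoded with the system encoding gives different bytes,
# which a cp866 console would display as the wrong characters.
system_bytes = name.encode('cp1251')
```

In real use the first argument would come from sys.stdout.encoding and
the second from user_encoding.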

Alexander.




