bzr-email fails if committer has non-ascii gecos

Glenn Morris rgm at gnu.org
Wed Aug 28 20:57:30 UTC 2013


Vincent Ladeuil wrote:

>> Urgh, you mean python can't decode utf8 if locales is not installed ?
>>
>> Could it be that python chokes on importing your source rather than not
>> being able to decode utf8 from an external file ?
>>
>> Do you still encounter the issue with:
[...]
>   string = 'Bela\xc3\xafche'

That was a good suggestion, but I'm afraid it made no difference.

Yes, it seems Python does need the locales to be installed, and
furthermore needs the LANG environment variable to be set to (e.g.)
en_US.utf8, for string.decode('utf-8') to work.

However, I still can't get bzr to work correctly.
I discovered that the gecos data actually seem to be in latin-1, not
utf-8. Trying to decode them as utf-8 fails with:
  UnicodeDecodeError: 'utf8' codec can't decode byte 0xef in position
  12: invalid continuation byte
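
A minimal sketch that reproduces this kind of failure, assuming the gecos
bytes are the latin-1 form of the test string quoted above. The sample
value is hypothetical; the real field is longer, which is why the error
above is at position 12 and not 4:

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: the latin-1 bytes of the test string quoted above.
raw = b'Bela\xefche'

try:
    raw.decode('utf-8')
    failure = None
except UnicodeDecodeError as exc:
    # 0xef opens a three-byte utf-8 sequence, but the next byte ('c')
    # is not a continuation byte, so the utf-8 codec gives up here.
    failure = exc

# latin-1 maps every byte to a code point, so this decode cannot fail.
text = raw.decode('latin-1')

# Re-encoding the decoded text gives the utf-8 form from the quoted test.
utf8_bytes = text.encode('utf-8')
```

Decoding as latin-1 always succeeds, so it only gives the right text if
the data really are latin-1, as they seem to be here.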

So I set LANG=en_US.ISO-8859-1, but

  from bzrlib import osutils
  print osutils.get_user_encoding()

still prints 'ascii'. Looking at what get_user_encoding does, the
following prints "ANSI_X3.4-1968":

   import locale
   print locale.nl_langinfo(locale.CODESET)

as does this:

   print locale.getpreferredencoding(False)

But locale.getpreferredencoding(True) returns the correct "ISO-8859-1".


So I suppose I have to add a call to

  locale.setlocale(locale.LC_ALL, "")

at the start of get_user_encoding, or something like that?
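
Something along these lines is what I have in mind. This is only a
sketch: guess_user_encoding is a hypothetical stand-in, not the actual
bzrlib code, and the fallback to 'ascii' is my assumption about what a
sensible default would be:

```python
import codecs
import locale


def guess_user_encoding():
    """Hypothetical stand-in for get_user_encoding; not bzrlib's code."""
    try:
        # Adopt the locale from LANG / LC_*. Until this is called, the
        # process may still be in the default "C" locale.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The requested locale is not installed; keep the current one.
        pass
    name = locale.getpreferredencoding(False)
    try:
        # Normalise aliases such as "ANSI_X3.4-1968" to a codec name.
        return codecs.lookup(name).name
    except LookupError:
        # Unknown or empty codeset name: fall back to plain ascii.
        return 'ascii'


print(guess_user_encoding())
```

Note that setlocale(LC_ALL, "") changes the locale for the whole process.
A less invasive variant would be to call locale.getpreferredencoding()
with its default argument (True), which only switches LC_CTYPE
temporarily, but is documented as not thread-safe.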


