[MERGE] setlocale (#128496)

Thu Jul 3 08:05:00 BST 2008

Martin Pool wrote:
> I'm just about the merge the 'mini' version of this patch.  I can see
> the other bits are useful but since that one is already approved it's
> a step forward.

Thanks! I'll splice out my various other modifications into separate 
branches once that is done.

> I think it should be localized when shown to the user.  However there
> might be other cases where it should always be in C locale.  For
> example I believe patch parses the date from a patch file.  I doubt it
> cares about the day of the week but if the rest of the date is not in
> the standard form it might cause problems.

I only found %a to be affected by locale. There seem to be no date 
formats with %c, %X, %x or any month names in bzr. So there should be no 
problems as long as patch accepts any binary input and ignores the day 
of week. As even my full setlocale branch only changes day of week 
behavior for the log and info commands, patch should never be a problem, 
though.

> So we'd want to check
> that, and maybe specify have different functions that give either the
> C or local version.

Within the bzr code that's already implemented in my full patch, with a 
flag to the format_date function. From outside, you could always call 
"LC_TIME=C bzr" on the command line.

>>>> @@ -1348,15 +1354,7 @@
>>>>     if _cached_user_encoding is not None and use_cache:
>>>>         return _cached_user_encoding
>>>> -    if sys.platform == 'darwin':
>>>> -        # work around egregious python 2.4 bug
>>>> -        sys.platform = 'posix'
>>>> -        try:
>>>> -            import locale
>>>> -        finally:
>>>> -            sys.platform = 'darwin'
>>>> -    else:
>>>> -        import locale
>>>> +    import locale
>>>>     try:
>>>>         user_encoding = locale.getpreferredencoding()
>>> So if a bzrlib client wants to use bzrlib.osutils.get_user_locale(),
>>> they must do the darwin workaround themselves?  That doesn't seem nice.
>>>  Is there any harm in continuing to do it here?
>> Good point! I had added this in response to a comment from Martin Pool, but
>> hadn't thought of the embedded case.
> 
> I'm not sure we're doing the calling application any favours by doing
> this the first time they call bzrlib.osutils.get_user_encoding().  The
> import statement of course will only actually execute the code in
> locale.py the first time it's executed.  So if they happen to
> (implicitly) load locale first they won't see this code, and many
> applications might do something else that will cause that.
> 
> If we had fiddle_osx_locale in bzrlib/__init__.py it might be more reusable.
> 
> Possibly this code should just call it then adjust the output but that
> might have other consequences.

What form does this OS X bug in python 2.4 actually take?

>>>> === modified file 'bzrlib/tests/__init__.py'
>>>> --- bzrlib/tests/__init__.py    2008-06-11 07:54:54 +0000
>>>> +++ bzrlib/tests/__init__.py    2008-06-21 16:37:
>>>> @@ -3075,7 +3089,7 @@
>>>>     def _probe(self):
>>>>         try:
>>>> -            os.stat(u'\u03b1')
>>>> +            os.stat(u'\u03b1\u283f')
>>>>         except UnicodeEncodeError:
>>>>             return False
>>>>         except (IOError, OSError):
>>>
>>> ^^^ Why was this change needed?
>> Because with my tests in the ja_JP.EUC-JP locale, tests requiring a unicode
>> filesystem were not skipped, because the alpha character used before can be
>> expressed in EUC-JP as well. Just because one unicode character can be used
>> in a file name, that doesn't mean that all can. So I added a braille pattern
>> to that greek alpha, as I think it unlikely for any non-unicode encoding to
>> contain them both.
>>
>> This fix is still contained in the patch attached here, but not in my
>> setlocale.mini patch. As this fails
>> test_msgeditor.MsgEditorTest.test__create_temp_file_with_commit_template_in_unicode_dir
>> from the current bzr.dev as well in ja_JP.EUC-JP, I'd resubmit this again if
>> you prefer smaller changesets or reject this approach here altogether.
> 
> I think this is reasonable but you should really explain in a comment
> _why_ you're using this pattern rather than any other.  Just copy an
> abridged form of this text.

Will do so in the next merge request for that piece of code. You prefer 
an in-source comment over a verbose commit message in such situations?

> Maybe assertEqualDiff should actually do strict ascii conversion to
> make all callers explicit.

I guess I could check for a.__class__ == b.__class__ first and declare 
it an error if those don't fit. This would probably entail quite a few 
more modifications to test cases, I guess.

> I think that is a reasonable change but probably easier for people to
> review if you put it in a separate diff next time.  Because of the
> somewhat complicated behaviour of unicode strings in Python if we see
> a bunch of loosely-related changes it's harder to be really sure what
> was necessary.

Sure, that would be best introduced in a separate branch.

Greetings,
  Martin von Gagern

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 260 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080703/58dec757/attachment.pgp