[MERGE] setlocale (#128496)

Thu Jul 3 05:44:26 BST 2008

On Fri, Jun 27, 2008 at 3:58 AM, Martin von Gagern
<Martin.vGagern at gmx.net> wrote:
> Hi Aaron!
>
> Thanks for your review!
>
> A question of general interest up front:
> do we want the day of week displayed to local users (e.g. for log or info
> display) in English or in their local language?
>
> I'd think local language, as a first step towards localization and
> internationalization of the application. Right now, however, you might think
> it funny to see local date elements in an otherwise English user interface.
>
> If you don't want localized day of week names, neither now nor in the far
> future, or if you want to keep changesets small, you might wish to instead
> apply setlocale.mini, a bare-bones version of this patch which I will post
> independently in a moment.

I'm just about the merge the 'mini' version of this patch.  I can see
the other bits are useful but since that one is already approved it's
a step forward.

I think it should be localized when shown to the user.  However there
might be other cases where it should always be in C locale.  For
example I believe patch parses the date from a patch file.  I doubt it
cares about the day of the week but if the rest of the date is not in
the standard form it might cause problems.  So we'd want to check
that, and maybe specify have different functions that give either the
C or local version.

> Those effects could be avoided by setting only LC_CTYPE, not LC_ALL.
> However, in my opinion bzr should not rely on things like LC_TIME being at
> its un-localized posix default. As I understand it, the bzr library is
> intended to be embeddable. A localized application that does embed bzrlib is
> likely to have called setlocale for LC_ALL. So bzrlib should be able to cope
> with that situation. And the best way to ensure this is by setting all
> facets in the bzr application itself as well.
>
> If you want to play it safe and avoid any effects that setlocale has on time
> formats, you'd have to avoid strftime as well, at least for the day of week
> formatting. Should be easy enough for those formats used by bzr. Quoting
> from the python docu: http://docs.python.org/lib/node745.html

I agree with that approach.

>>> @@ -1348,15 +1354,7 @@
>>>     if _cached_user_encoding is not None and use_cache:
>>>         return _cached_user_encoding
>>
>>> -    if sys.platform == 'darwin':
>>> -        # work around egregious python 2.4 bug
>>> -        sys.platform = 'posix'
>>> -        try:
>>> -            import locale
>>> -        finally:
>>> -            sys.platform = 'darwin'
>>> -    else:
>>> -        import locale
>>> +    import locale
>>
>>>     try:
>>>         user_encoding = locale.getpreferredencoding()
>>
>> So if a bzrlib client wants to use bzrlib.osutils.get_user_locale(),
>> they must do the darwin workaround themselves?  That doesn't seem nice.
>>  Is there any harm in continuing to do it here?
>
> Good point! I had added this in response to a comment from Martin Pool, but
> hadn't thought of the embedded case.

I'm not sure we're doing the calling application any favours by doing
this the first time they call bzrlib.osutils.get_user_encoding().  The
import statement of course will only actually execute the code in
locale.py the first time it's executed.  So if they happen to
(implicitly) load locale first they won't see this code, and many
applications might do something else that will cause that.

If we had fiddle_osx_locale in bzrlib/__init__.py it might be more reusable.

Possibly this code should just call it then adjust the output but that
might have other consequences.

>>> === modified file 'bzrlib/tests/__init__.py'
>>> --- bzrlib/tests/__init__.py    2008-06-11 07:54:54 +0000
>>> +++ bzrlib/tests/__init__.py    2008-06-21 16:37:
>>> @@ -3075,7 +3089,7 @@
>>
>>>     def _probe(self):
>>>         try:
>>> -            os.stat(u'\u03b1')
>>> +            os.stat(u'\u03b1\u283f')
>>>         except UnicodeEncodeError:
>>>             return False
>>>         except (IOError, OSError):
>>
>>
>> ^^^ Why was this change needed?
>
> Because with my tests in the ja_JP.EUC-JP locale, tests requiring a unicode
> filesystem were not skipped, because the alpha character used before can be
> expressed in EUC-JP as well. Just because one unicode character can be used
> in a file name, that doesn't mean that all can. So I added a braille pattern
> to that greek alpha, as I think it unlikely for any non-unicode encoding to
> contain them both.
>
> This fix is still contained in the patch attached here, but not in my
> setlocale.mini patch. As this fails
> test_msgeditor.MsgEditorTest.test__create_temp_file_with_commit_template_in_unicode_dir
> from the current bzr.dev as well in ja_JP.EUC-JP, I'd resubmit this again if
> you prefer smaller changesets or reject this approach here altogether.

I think this is reasonable but you should really explain in a comment
_why_ you're using this pattern rather than any other.  Just copy an
abridged form of this text.

>>> === modified file 'bzrlib/tests/blackbox/test_info.py'
>>> --- bzrlib/tests/blackbox/test_info.py  2007-12-15 18:33:10 +0000
>>> +++ bzrlib/tests/blackbox/test_info.py  2008-06-16 18:19:47 +0000
>>> @@ -114,7 +114,7 @@
>>>         self.assertEqual('', err)
>>
>>>         out, err = self.run_bzr('info branch --verbose')
>>> -        self.assertEqualDiff(
>>> +        self.assertEqualDiff((
>>>  """Standalone tree (format: weave)
>>>  Location:
>>>   branch root: branch
>>> @@ -154,7 +154,7 @@
>>>        # having the ui test dependent on the exact overhead of a given
>>> store.
>>>        branch2.repository._revision_store.total_size(
>>>         branch2.repository.get_transaction())[1] / 1024,
>>> -       ), out)
>>> +       )).encode(bzrlib.user_encoding), out)
>>>         self.assertEqual('', err)
>>
>>>         # Branch and bind to standalone, needs upgrade to metadir
>>
>> If you're fixing assertEqualDiff to handle encoding issues, is this
>> still necessary?
>
> Yes. The fix to assertEqualDiff only handles the output. What we have here
> is a comparison between a unicode string for the expected value and a byte
> string for the actual value. Those two will compare different unless the
> unicode string gets encoded as well.
>
> I found out I had to do similar adjustments for test_log. I guess I should
> have run the complete test suite one final time after fixing all the issues
> from previous runs, as It seems I had introduced new issues along the way.
> This time I did run it.

Maybe assertEqualDiff should actually do strict ascii conversion to
make all callers explicit.

I think that is a reasonable change but probably easier for people to
review if you put it in a separate diff next time.  Because of the
somewhat complicated behaviour of unicode strings in Python if we see
a bunch of loosely-related changes it's harder to be really sure what
was necessary.

-- 
Martin <http://launchpad.net/~mbp/>