[RFC] I18N specification draft

Alexander Belchenko bialix at ukr.net
Tue May 8 11:04:02 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
>> Please take a look at my draft:
>> http://bazaar-vcs.org/DraftSpecs/I18nSupport
> 
> That looks great.
> 
> I don't know much about the translation side of it but I have some
> thoughts on the code changes.  I've cc'd Carlos, one of the authors of
> Rosetta, and maybe he can check your approach too from that aspect.
> 
>> help for commands
> 
> .. and for other help topics too.

Of course, added to the draft.

> 
>> One BIG issue for bzrlib -- its use of docstrings as help for commands.
> 
> I have no objection to changing this.  We have already changed most if
> not all exception classes.

Great! So we can finally drop the 'python -OO' limitation.

>> Internationalization of bzrlib should be transparent for other clients.
> 
> Meaning that bzrlib will use whatever locale is globally set without
> any special action by the client program?

No. The function _() is not defined by default. So when a client imports
bzrlib, it must install some _() function itself; otherwise, when execution
reaches a string like _('Foo'), bzrlib raises a NameError exception.

I think we can solve this very easily:

1) Drop backward compatibility with older bzrlib versions without i18n,
and nag all clients of bzrlib to install _() themselves before importing bzrlib

Or:

2) Add a simple check at the top of bzrlib/__init__.py:

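import __builtin__  # __builtins__ is a plain dict inside imported modules
if getattr(__builtin__, "_", None) is None:
    __builtin__._ = lambda x: x  # identity fallback: string stays untranslated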

Or something similar with a try-except block.
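
The try-except variant might look like the minimal sketch below. It is
written for current Python, where the module is named `builtins` (in the
Python 2 of this thread it would be `__builtin__`). The helper name
`ensure_gettext_fallback` is made up for illustration and is not part of
bzrlib.

```python
import builtins


def ensure_gettext_fallback():
    """Install an identity _() in builtins unless one is already present.

    Hypothetical helper, not part of bzrlib. Returns True when the
    fallback was installed, False when a _() already existed.
    """
    try:
        builtins._
    except AttributeError:
        # No client installed a translator: fall back to returning the
        # original (English) string unchanged.
        builtins._ = lambda message: message
        return True
    return False


installed = ensure_gettext_fallback()
# The bare name _ now resolves through builtins, so library code can
# call it whether or not the client installed a real translator.
greeting = _("Foo")
```

A client that wants real translations only has to install its own _()
before importing the library; the check then leaves it untouched.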

> 
>> 4. Set up wrappers for stdout and stderr before emitting localized messages
>> to avoid UnicodeEncodeError when the output of bzr is redirected
> 
> I imagine this refers to overriding the encoding if it's set to ascii
> but the user's language can't be represented in ascii?  How will we
> know what encoding to use?

No. Without explicitly set up wrappers around stdout and stderr, bzr
will raise UnicodeEncodeError when its output is redirected,
because when output is redirected the sys.stdout object has no
'encoding' attribute. And the sys.stderr object doesn't have one in
any case (even without redirection). Without an 'encoding' attribute,
Python will try to encode our translated messages (which will always be
unicode!) with the default 'ascii' encoding and will therefore fail.
That's why we explicitly set up the outf stream in the Command instance.
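
A rough sketch of such a wrapper, assuming a byte-oriented output stream
and using the standard `codecs.getwriter`. The helper name `wrap_output`
and the fallback order for choosing the encoding are my assumptions, not
what bzrlib does. An `io.BytesIO` stands in for a redirected stdout with
no usable 'encoding'.

```python
import codecs
import io
import locale


def wrap_output(stream, encoding=None, errors="replace"):
    """Return a writer that encodes text before writing to a byte stream.

    Hypothetical helper: uses the explicit encoding if given, then the
    stream's own 'encoding' if it has one, then the locale's preferred
    encoding, and finally plain ASCII.
    """
    encoding = (
        encoding
        or getattr(stream, "encoding", None)
        or locale.getpreferredencoding(False)
        or "ascii"
    )
    # errors="replace" turns unencodable characters into '?' instead of
    # raising UnicodeEncodeError.
    return codecs.getwriter(encoding)(stream, errors=errors)


raw = io.BytesIO()  # stand-in for a redirected stdout
outf = wrap_output(raw, encoding="cp866")
outf.write("Привет\n")  # a translated (unicode) message
encoded = raw.getvalue()
```

With errors="replace" a message that cannot be represented in the target
encoding is degraded rather than crashing the command.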

>> 5. Explicitly construct error messages for errors based on Python's
>> standard exceptions IOError and OSError. This task will require
>> reimplementing all error messages that bzrlib handles in our codebase,
>> because we need to have them in the POT for translation, and because we
>> need fine-grained control over constructing the resulting localized error
>> message
> 
> Can you please explain more about this point?  I thought that the
> messages for python exceptions and os errors were already localized in
> non-english locales?

Maybe on Linux, I haven't checked. But on Windows -- NO.
In Python 2.5 some WindowsErrors try to emit localized messages,
but these messages are not unicode; they are in user_encoding instead,
so they are almost unreadable in the default console. The user has to
explicitly switch the console encoding with the 'chcp' command.
For example, on my Russian Windows the default console
encoding is 'cp866' (the OEM encoding), and I have to run
'chcp 1251' every time to switch to the ANSI encoding.
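
To make the idea of explicitly constructed messages concrete, here is a
small sketch that ignores the text supplied by the operating system and
builds the message from the error code and file name. The template
wording, the table `ERRNO_TEMPLATES` and the function `format_os_error`
are all hypothetical, and _() is a stand-in for the real gettext
function.

```python
import errno


def _(message):
    # Stand-in for the real gettext function; returns the string unchanged.
    return message


# Hypothetical English templates, keyed by errno. In a real catalog these
# would also have to be marked for extraction into the POT file.
ERRNO_TEMPLATES = {
    errno.ENOENT: "No such file or directory: %(path)s",
    errno.EACCES: "Permission denied: %(path)s",
}


def format_os_error(exc):
    """Build a unicode message from errno and filename, not from strerror."""
    template = ERRNO_TEMPLATES.get(exc.errno)
    if template is None:
        template = "Operating system error %(code)s: %(path)s"
    return _(template) % {"code": exc.errno, "path": exc.filename}


# The second argument plays the role of the unreadable system message.
error = OSError(errno.ENOENT, "text in some OEM encoding", "missing.txt")
message = format_os_error(error)
```

Because the system-supplied text is never used, the result does not
depend on the console code page.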


> To me the largest code issue not addressed by this is what to do with
> the test suite.  Some ui-oriented tests want to assert about the
> strings and need to be run in a known locale.  On the other hand we
> probably want to test that the behaviour of most of the
> non-user-facing parts are consistent regardless of the locale, or to
> test them in any given locale.  The best course may be to first run
> everything in an English utf-8 locale, then add some ui tests that
> specifically work in other languages, then switch some non-ui-based
> tests back to running in the environment's default locale.

I'm sure we should run our test suite without importing gettext.
I think we should import gettext and install the _() function from gettext
in the Command class, and skip this step for the selftest command.
So we always deal with the C locale in the test suite. Otherwise we will
have a *real* nightmare.
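
As a sketch of that split (current Python, standard gettext module
only): the function name `install_translator` and the default domain
"bzr" are assumptions for illustration, and the real hook would live
somewhere in the Command machinery.

```python
import builtins
import gettext


def install_translator(command_name, domain="bzr", localedir=None):
    """Install _() for a command; selftest always gets the identity.

    Hypothetical helper. Returns True when gettext was consulted.
    """
    if command_name == "selftest":
        # Tests always see the original English strings.
        builtins._ = lambda message: message
        return False
    # fallback=True returns NullTranslations when no catalog is found,
    # so a missing .mo file leaves messages in English instead of failing.
    translation = gettext.translation(domain, localedir, fallback=True)
    translation.install()  # binds builtins._ to translation.gettext
    return True
```

For example, install_translator("selftest") leaves every message
untouched, while any other command name goes through the catalog lookup.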

> In fact there is a recent bug about test isolation from locales.
> 
>  https://bugs.launchpad.net/bzr/+bug/111377
> 
> so we need to do this even now.

> 
> Another code area to consider is the smart server.  We want errors
> that originate on the server to be displayed in the right language.
> That seems to imply either the client tells the server what language
> it wants, or we always pass language-neutral messages (eg exception
> names plus parameters, or template strings plus parameters.)

The smart server should send English messages, and the client will convert
them to localized form simply by calling the _() function:

locale_msg = _(english_message_from_server)
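
One caveat: a catalog lookup only matches if the English text arrives
exactly as it appears in the POT, i.e. before any parameters are
substituted. A sketch following the "template strings plus parameters"
option from the quoted text, with a made-up catalog and a made-up
function name `localize_server_error` (the wire format is not specified
here):

```python
# Stand-in catalog; a real client would load this from a .mo file.
FAKE_CATALOG = {
    "Branch %(name)s is locked": "Ветка %(name)s заблокирована",
}


def _(message):
    # Stand-in for gettext: unknown strings come back unchanged.
    return FAKE_CATALOG.get(message, message)


def localize_server_error(template, params):
    """Translate the English template first, then fill in parameters."""
    return _(template) % params


locale_msg = localize_server_error("Branch %(name)s is locked", {"name": "trunk"})
```

A template the client has no translation for simply stays in English.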

There are some tricks described in the Python documentation that redefine
the _() function locally to keep access to the original English string
instead of the translated one. It's easily doable. Actually I use such a trick
in my bzr-config utility, because I use some strings for 2 purposes:

* the English variant is used as dict keys,
* the localized variant is used in the GUI
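
A minimal sketch of that kind of deferred translation. This is not the
bzr-config code; the names `N_`, `translate`, `OPTION_LABELS` and the
sample catalog are invented for illustration. Strings are only marked
where they are defined and translated at the moment they are shown.

```python
def N_(message):
    """Mark a string for extraction without translating it."""
    return message


# Stand-in catalog; a real program would use gettext here.
FAKE_CATALOG = {
    "User name": "Имя пользователя",
    "Email": "Электронная почта",
}


def translate(message):
    return FAKE_CATALOG.get(message, message)


# The English strings stay available as stable dict keys...
OPTION_LABELS = [N_("User name"), N_("Email")]
settings = {"User name": "alice", "Email": "alice@example.com"}


def render(values):
    # ...and are translated only when building text for display.
    return ["%s: %s" % (translate(label), values[label]) for label in OPTION_LABELS]


lines = render(settings)
```

The dict keys never change with the locale, so stored settings remain
valid when the user switches language.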

> 
> I've heard in the past some systems have trouble with programs
> changing locales while they run, will this be a problem for us in
> testing?

I think we should run our tests without importing gettext at all,
so we avoid this problem entirely.

> Aside from that, it looks good and I'd be happy to proceed along these
> lines.

OK. My first plan is to create a minimal infrastructure to produce
a PO template, and then set up the template in Rosetta. Because the initial
import takes a very long time, we could even import bzr.pot with 2-3 messages
and then update it regularly (an update is a much faster operation).

[µ]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGQEsSzYr338mxwCURArBnAJwOV1eCpb/8RSfKcJv4e1/mhdI/NwCfTGjy
UjddnvMFVK4r1NlA0qn7qMU=
=GiCt
-----END PGP SIGNATURE-----


