LTR and text wrapping (Re: Internationalisation of bzr cli)

Eli Zaretskii eliz at gnu.org
Tue May 3 04:44:43 UTC 2011


> From: INADA Naoki <songofacandy at gmail.com>
> Date: Tue, 3 May 2011 12:00:32 +0900
> Cc: Andrew Bennetts <andrew.bennetts at canonical.com>, bazaar at lists.canonical.com
> 
> I'll implement a text wrapper that takes care of CJK width (one character
> having double width).

Thanks.
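
For reference, a minimal sketch of what such a width-aware wrapper could
look like in Python.  This is only an illustration, not what bzr does:
the names char_width, text_width, split_atoms and wrap_columns are
made up here, and the assumption is that characters whose East Asian
Width property is "W" or "F" take two terminal columns, combining marks
take none, and everything else takes one.

```python
import unicodedata


def char_width(ch):
    """Columns one character is assumed to occupy on a terminal."""
    if unicodedata.combining(ch):
        return 0
    # "W" (wide) and "F" (fullwidth) characters take two columns.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def text_width(text):
    """Total display width of a string, in columns."""
    return sum(char_width(ch) for ch in text)


def split_atoms(text):
    """Split text into unbreakable pieces: each whitespace character and
    each wide character stands alone; runs of other characters form words."""
    atoms, word = [], ""
    for ch in text:
        if ch.isspace() or char_width(ch) == 2:
            if word:
                atoms.append(word)
                word = ""
            atoms.append(ch)
        else:
            word += ch
    if word:
        atoms.append(word)
    return atoms


def wrap_columns(text, width):
    """Wrap text so each line fits in `width` columns where possible.
    A single word wider than `width` is left on a line of its own."""
    lines, current, used = [], "", 0
    for atom in split_atoms(text):
        w = text_width(atom)
        if used + w > width and current.strip():
            lines.append(current.rstrip())
            current, used = "", 0
        if atom.isspace() and not current:
            continue  # do not start a line with whitespace
        current += atom
        used += w
    if current.strip():
        lines.append(current.rstrip())
    return lines
```

With this sketch, wrap_columns("日本語のテキスト", 6) gives three lines of
at most three characters each, because every character counts as two
columns.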

> I want to support bidi, but I don't know much about it.

You can learn the basics here:

  http://unicode.org/reports/tr9/

Unless you intend to implement the full bidirectional reordering
algorithm described there (which I don't think you need, see below),
you can disregard most of what the above says, and just read the
definitions and the general guidelines, to catch the spirit.  There's
also a pointer there to a "reference implementation", which you can
compile and play with.

> When I have string "foo bar foobar FOO BAR FOOO",
> It is shown as "OOOF RAB OOF raboof rab oof", right?

No.  First, the visual appearance depends on whether the text will be
predominantly right-to-left or left-to-right.  In the former case, the
visual appearance will be

                                           OOOF RAB OOF foo bar foobar

(flushed all the way to the right margin of the display area).  In the
latter case, it will look like this:

 foo bar foobar OOOF RAB OOF

IOW, the parts that use left-to-right characters are displayed as they
normally would, while the right-to-left characters are reversed.

And this is just the simplest case, with no explicit embeddings and no
special control characters that affect reordering.  The Unicode Annex
at the above URL has all the details.

However, you don't need to implement all this complexity in bzr or in
your wrapper; that'd be madness, IMO.  There are already terminals and
text widgets that support bidi reordering, and you should rely on them
to do the job.  Alternatively, we could decide that the translated
strings are stored already in visual order, in which case the
translator has to take care of the reordering.  The text must then also
be wrapped correctly in advance, which rules out dynamic wrapping.

> Then, when I wrap it with 8 chars, it should be shown as
> >rab oof
> >F raboof
> >F RAB OO
> >OOO
> right?

No.  Each line is reordered separately.  For example:

 foo bar
 F foobar
 F RAB OO
 OOO

(And it is better not to break the line in the middle of a word, in
any language; with bidi it is even worse than in English.)
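
To make the per-line reordering concrete, here is a toy sketch in
Python.  It is emphatically not the algorithm of UAX#9: it uses the
convention of this thread (uppercase ASCII stands for right-to-left
characters, lowercase for left-to-right, everything else is neutral),
handles no numbers, embeddings or control characters, and the names
classify, reorder_line and chunk are made up for the illustration.  It
breaks lines at a fixed width only to reproduce the example above.

```python
def classify(ch):
    """Toy character classes: uppercase = RTL, lowercase = LTR, else neutral."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def reorder_line(line, base="R"):
    """Return the visual order of one line given in logical order.
    `base` is the paragraph direction, "R" or "L"."""
    types = [classify(ch) for ch in line]
    n = len(types)

    # Neutrals take the direction of their neighbours when both sides
    # agree, otherwise the paragraph direction.
    resolved = list(types)
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else base
        after = types[j] if j < n else base
        direction = before if before == after else base
        for k in range(i, j):
            resolved[k] = direction
        i = j

    # Embedding levels: even levels are LTR, odd levels are RTL.
    if base == "R":
        levels = [2 if t == "L" else 1 for t in resolved]
    else:
        levels = [0 if t == "L" else 1 for t in resolved]

    # From the highest level down to 1, reverse every maximal run of
    # characters whose level is at least the current one.
    chars = list(line)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < n and levels[j] >= level:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            i = j
    return "".join(chars)


def chunk(text, width):
    """Cut logical text into fixed-width pieces, stripping edge spaces."""
    return [text[i:i + width].strip() for i in range(0, len(text), width)]


logical = "foo bar foobar FOO BAR FOOO"
visual_lines = [reorder_line(piece, base="R") for piece in chunk(logical, 8)]
# visual_lines == ["foo bar", "F foobar", "F RAB OO", "OOO"]
```

The point of the sketch is the order of operations: the text is broken
into lines while still in logical order, and only then is each line
reordered on its own.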

Note that I didn't mean to bother everyone here with this complexity.
All I said is that we should decide early on between two options:
either the translations to right-to-left scripts are in logical order,
and we rely on the terminal or the text widget that displays the
strings to reorder them at display time; or the translations are in
visual order and are displayed without any reordering.  Also, if the
strings to be translated include format conversion specifiers, they
need to support changing the order in which the arguments are used
(because the literal text will be reordered), like the %n$ thing in C
printf formats.
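
In Python, one way to get the same effect as %n$ is to use named
conversion specifiers with a mapping, so that a translation can use the
arguments in any order.  A small sketch follows; the message strings
and the render helper are invented for the illustration and are not
actual bzr messages.

```python
def render(template, **values):
    """Format a template whose specifiers name their arguments."""
    return template % values


# Hypothetical original message and a hypothetical translation that
# needs the two arguments in the opposite order.
original = "%(count)d revisions pushed to %(location)s"
reordered = "to %(location)s: %(count)d revisions pushed"

first = render(original, count=3, location="trunk")
second = render(reordered, count=3, location="trunk")
# first  == "3 revisions pushed to trunk"
# second == "to trunk: 3 revisions pushed"
```

A plain "%d revisions pushed to %s" would force every translation to
keep the arguments in the original order.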


