gettext handling of LRO and RLO
scaife.chris at gmail.com
Sun Dec 26 12:41:15 GMT 2010
Entering data in a computer serves more purpose than simply having it
regurgitated to LOOK the same on the screen to a human reader: We would like
the computer to actually process it correctly and consistently.
Say a database holds the equivalent of the name "Sprocket123" but using an R2L
Then thanks to the Unicode bidi algorithm, on the screen a sequence of
characters appears to be "Sprocket321". I could also enter
"<RLO>Sprocket123<PDF>" or "<LRO>321tekcorpS<PDF>" to get that very same
appearance... and quite a few other combinations as well.
To HUMAN readers all of these different combinations are indistinguishable
and exactly the same thing, yet to the COMPUTER program algorithms they are
all completely different.
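
To make that concrete, here is a minimal Python sketch (my illustration, not
anything from the project discussed here). It builds the two override-wrapped
strings mentioned above and shows that a program sees them as unequal, even
though a bidi-aware display should draw them the same way. The variable names
and the describe() helper are assumptions made up for this example.

```python
import unicodedata

# Unicode explicit directional formatting characters
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# The two sequences from the text above
rtl_forced = RLO + "Sprocket123" + PDF
ltr_forced = LRO + "321tekcorpS" + PDF

def describe(s):
    """List each character, spelling out invisible format characters by name."""
    return [
        unicodedata.name(c) if unicodedata.category(c) == "Cf" else c
        for c in s
    ]

# A plain comparison only looks at the stored code points
print(rtl_forced == ltr_forced)  # False
print(describe(rtl_forced))
print(describe(ltr_forced))
```

The comparison never consults the bidi algorithm, so a database lookup keyed
on one of these strings would not find the other.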
Software source code, be it in C or any other programming language, is
mostly written by people in a Left to right tradition without ANY need to
embed Right to Left characters or any incentive to consider them in our
algorithms: a string is just a sequence of characters. Thus consideration of
how programming languages should handle directionality overrides is not very
OTOH it is of paramount importance in translation files such as the ones
submitted to gettext. IMO you really must not consider these two very
different file types as the same issue, and placing the directionality
overrides inside the quotes is IMO the worst possible solution.
Anyway as far as I'm concerned I've solved that issue for my own project: It
now has the capability to handle R2L correctly under full control of the
translator but transparently to the person doing the programming.
My OWN technique for creating translation files consists of placing LRO or
RLO at the beginning of each line so that I know exactly what sequences of
characters I will be generating and then I remove them before submitting to
msgfmt. Other people can obviously use other tactics... Either way I'm
finally back onto my bidirectional terminal emulator project :)
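
For anyone who wants to try the same tactic, the removal step could be
sketched roughly as below. This is only a guess at how one might automate it,
not the actual tooling used here: the function name strip_leading_overrides
is hypothetical, and it assumes the override is the very first character of a
line. Overrides that sit anywhere else (for instance inside the quotes) are
left alone.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

def strip_leading_overrides(text: str) -> str:
    """Drop a single LRO or RLO at the start of each line (hypothetical helper)."""
    cleaned = []
    for line in text.splitlines(keepends=True):
        # Only the first character is checked; the rest of the line is kept as-is
        if line[:1] in (LRO, RLO):
            line = line[1:]
        cleaned.append(line)
    return "".join(cleaned)

# Example: a tiny translation fragment edited with an override on each line
edited = LRO + 'msgid "Hello"\n' + RLO + 'msgstr "Hello"\n'
print(strip_leading_overrides(edited), end="")
```

The cleaned text would then be written to the file that is handed to msgfmt.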
> While there are multiple ways to achieve the very same appearance on the
> screen, most programs not written with this in mind will consider text with
> different embedded overrides in different places as completely different
> text... thus resulting in malfunction on things like a database lookup or
> even a simple string comparison.
I might need to ask you to explain that again, it could be the late hour