i18n: infrastructure [long]
Sean Wheller
sean at inwords.co.za
Fri Apr 15 17:14:28 UTC 2005
Hello,
For all authors and translators. You may know that in the last release we did
not have much support for the i18n process. To address this I have
restructured the repos. While the restructure seems to have solved the
management and packaging issues introduced by i18n, it does not solve a number
of semantic and processing issues.
One of the main problems was that in order to manage files better and simplify
the packaging system, we needed directories for each translation (prose and
images). Organizing files this way changed the file paths and therefore the
location of each XML-instance relative to the images it references. As a
result, the values specified for the fileref attributes of all imagedata
elements were broken (NULL), since the files no longer reside in the same
location as when we wrote the documents. We needed to update all these values,
but the introduction of i18n support meant it was not enough to just change
each value to reflect the new path. Instead of specifying the fileref values
in the English document and propagating them to the i18n versions, we now have
to consider that translated documents have screen captures taken using the
specific i18n locale.
While, at present, it is entirely possible for us to simply update the fileref
values for each document and its translations to reflect the new paths, this
would not be a good long-term solution. The first problem is that the
translated versions of an English document are generated through a process
comprising POT and PO files. Each time we update a POT file the changes are
merged into the respective PO files, which are finally reconstructed into
XML-instances based on the original XML-instance used to create the POT. Since
the POT and PO files do not contain all element data, the values for fileref
attributes would be propagated from the original XML-instance to all
translated documents during this process, and we would have to continually
maintain the fileref values.
Being a lazy person, I thought this was just too much overhead and could
easily lead to errors as things get forgotten or time runs out near release.
What we needed was a way to abstract, as best possible, so that the fileref
values propagated would work with little or no modification throughout the
workflow. My first thought was to script it in a makefile or shell script,
but then I realized that doing so would not be of great benefit, as it would
not result in a solution that is easily supported by Document and Content
Management Systems. We needed a solution that was inline and maintainable
within a pure XML environment. After some investigation the following
solution was reached.
First, we have modularized our entity structure to accommodate a number of
layers. I will not go into all of them now. To the internal subset of our
document prolog we have added an internal entity called 'language'. The value
of language is an entity reference to an entity defined in the external entity
called 'globalent', which is itself declared and referenced in the internal
subset. The two entities are shown below.
<!ENTITY % globalent SYSTEM "../../../libs/global.ent">
%globalent;
<!ENTITY % language "&EnglishAmerican;">
Within 'globalent' all the two-letter ISO language codes used in the i18n
process and in the structure of our file system are defined as entities. For
example:
<!ENTITY German 'de' >
<!ENTITY Bhutani 'dz' >
<!ENTITY Greek 'el' >
<!-- <!ENTITY EnglishAmerican 'en'> -->
<!ENTITY EnglishAmerican 'C'>
<!ENTITY Esperanto 'eo' >
<!ENTITY Spanish 'es' >
<!ENTITY Estonian 'et' >
Hope you are still with me.
Looking at the language entity you will see that the value is
"&EnglishAmerican;", which refers to the entity declared as
<!ENTITY EnglishAmerican 'C'> and therefore expands to "C". The current value
of language is therefore "C", meaning it is an English document. In the sample
from 'globalent' you will notice that this entity occurs twice, except that
one instance is commented out. Look closer and you will see that the value of
the commented instance is not 'C' but 'en'. This is the actual two-letter ISO
code, but we do not use it since within the directory structure English
documents and images are maintained in a directory named 'C'. This is done to
maintain compatibility with a GNOME convention that places all English
resources in C.
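As a rough illustration of how the entity names map to directory names, here
is a minimal Python sketch. It is not part of our tool chain: the names
GLOBAL_ENT, load_codes and image_dir are hypothetical, and the sample text is
just the excerpt shown above, not the full contents of global.ent.

```python
import re

# Excerpt from the post, standing in for libs/global.ent (assumption: the
# real file contains many more declarations in the same form).
GLOBAL_ENT = """\
<!ENTITY German 'de' >
<!ENTITY Bhutani 'dz' >
<!ENTITY Greek 'el' >
<!-- <!ENTITY EnglishAmerican 'en'> -->
<!ENTITY EnglishAmerican 'C'>
<!ENTITY Esperanto 'eo' >
<!ENTITY Spanish 'es' >
<!ENTITY Estonian 'et' >
"""


def load_codes(text):
    """Return {entity name: value}, ignoring commented-out declarations."""
    # Drop XML comments first so the disabled 'en' declaration is skipped.
    without_comments = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    return dict(re.findall(r"<!ENTITY\s+(\w+)\s+'([^']*)'\s*>", without_comments))


def image_dir(codes, name):
    """Build the image directory used in fileref values for one language."""
    return f"../../images/{codes[name]}/"


codes = load_codes(GLOBAL_ENT)
print(image_dir(codes, "EnglishAmerican"))  # ../../images/C/
print(image_dir(codes, "German"))           # ../../images/de/
```

Only the active declaration of EnglishAmerican is picked up, so English
resources resolve to the 'C' directory rather than 'en'.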
Back to our entities. By setting the value of language to an entity reference
we have a single parameter by which to control the value of any parameterized
entity references to 'language' throughout an XML-instance. Since the value of
'language' is not hard-coded but is itself an entity reference, we can use the
parameter entity at any point in the body of a document to substitute the
value of the entity that the reference points to.
For example:
<article id="art-about-ubuntu" status="complete" lang="%language;">
<title>....</title>
<para> ...... The language is %language;</para>
<mediaobject>
<imageobject>
<imagedata fileref="../../images/%language;/IconUbuntu.png" format="PNG"/>
</imageobject>
</mediaobject>
....
</article>
In the article node, %language; is used in the lang attribute to denote the
language of the document. When the DocBook XSL stylesheets match the lang
attribute, its value is used to select the files containing translated texts,
called generated texts (gentexts). They are part of the DocBook XSL package
installed on your system and include texts for labels, captions, etc.
Note: If a lang attribute is not defined the stylesheets default to en. This
is a small problem, since if the value of 'language' is &EnglishAmerican; the
value of lang will be C. The stylesheets do not match lang="C" and a warning
is therefore generated:
"No localization exists for "c" or "". Using default "en". null "
This is a warning, but for our purposes it has the desired effect of selecting
the en gentexts. It is a desired error.
In the para node the result is "The language is C" if the value of 'language'
is &EnglishAmerican;.
In the imagedata node the result is
<imagedata fileref="../../images/C/IconUbuntu.png" format="PNG"/>
Substitute the value &EnglishAmerican; with &German; and the results will be:
<article id="art-about-ubuntu" status="complete" lang="de">
<para> ...... The language is de</para>
<imagedata fileref="../../images/de/IconUbuntu.png" format="PNG"/>
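The substitution can be tried out with a small Python sketch using the
standard library's ElementTree parser. Two things here are assumptions and
differ from the snippets above: the entity declarations are inlined rather
than loaded from libs/global.ent (ElementTree does not fetch external
entities), and 'language' is declared as a general entity and referenced as
&language;, because XML parsers recognize parameter-entity references such as
%language; only inside the DTD. TEMPLATE and expand are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical template. {name} is replaced with the entity that 'language'
# should point at, for example EnglishAmerican or German.
TEMPLATE = """<!DOCTYPE article [
<!ENTITY German 'de'>
<!ENTITY EnglishAmerican 'C'>
<!ENTITY language "&{name};">
]>
<article id="art-about-ubuntu" status="complete" lang="&language;">
<para>The language is &language;</para>
<mediaobject>
<imageobject>
<imagedata fileref="../../images/&language;/IconUbuntu.png" format="PNG"/>
</imageobject>
</mediaobject>
</article>
"""


def expand(name):
    """Parse the template and return (lang, para text, fileref) after expansion."""
    root = ET.fromstring(TEMPLATE.format(name=name))
    return (
        root.get("lang"),
        root.find("para").text,
        root.find(".//imagedata").get("fileref"),
    )


print(expand("EnglishAmerican"))
# ('C', 'The language is C', '../../images/C/IconUbuntu.png')
print(expand("German"))
# ('de', 'The language is de', '../../images/de/IconUbuntu.png')
```

Changing the single declaration of 'language' is enough to change the lang
attribute, the text and the image path together.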
Hence we have:
1. a method to ensure that we can use the appropriate gentexts when
transforming to HTML and PDF etc.
2. a method to ensure compatibility with yelp and GNOME folder conventions.
3. a method to ensure compatibility under Document and Content Management
Systems.
All the above has one problem. We must somehow change the value of 'language'
for each XML-instance to the entity reference relevant to the language of the
document. I have to test if this can be done under a Document or Content
Management system, but am reasonably confident that it can be done using
pipes and XSLT.
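Until the pipes and XSLT approach has been tested, one possible stopgap is a
plain text rewrite of the declaration in each XML-instance. The Python sketch
below is only that, an untested idea and not the XSLT solution: set_language
and LANGUAGE_DECL are hypothetical names, and the pattern assumes the
declaration is written exactly in the form shown earlier.

```python
import re

# Matches the declaration as written in this post, for example:
#   <!ENTITY % language "&EnglishAmerican;">
LANGUAGE_DECL = re.compile(r'(<!ENTITY\s+%\s+language\s+")&[A-Za-z]+;(">)')


def set_language(xml_text, entity_name):
    """Point the 'language' entity at another entity defined in globalent."""
    new_text, count = LANGUAGE_DECL.subn(
        lambda m: f"{m.group(1)}&{entity_name};{m.group(2)}", xml_text
    )
    if count != 1:
        # Refuse to guess if the declaration is missing or duplicated.
        raise ValueError(f"expected exactly one language declaration, found {count}")
    return new_text


prolog = '<!ENTITY % language "&EnglishAmerican;">'
print(set_language(prolog, "German"))
# <!ENTITY % language "&German;">
```

Such a function could be run over the XML-instances rebuilt from each PO
file, passing the entity name that matches the translation's directory.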
Hope this helps.
--
Sean Wheller
Technical Author
sean at inwords.co.za
http://www.inwords.co.za
Registered Linux User #375355