Mallard and Ubuntu-Translators and Translation Regressions

Thu Feb 25 18:11:52 UTC 2010

On Thu, 2010-02-25 at 12:56 -0500, Kyle Nitzsche wrote:
> Shaun McCance wrote:
> > So it occurs to me that somebody could write a tool that reads in
> > a DocBook-based PO file and converts the msgid and msgstr of 
> I could write that easily (and would be happy to do so) if the data and 
> rules (db tag > mallard tag) are known and clear. Then there's the 
> question of how and where it gets integrated into the transition process.

This should be a good start:

http://projectmallard.org/1.0/docbook.html

> > only
> > those messages which have tags from a certain well-known set.
> >
> > I just did a quick grep on the Gnome User Guide and found only 46
> > elements that appear in the PO files.  Of these, I'd say roughly
> > half could be reliably automatically converted.
> >   
> That list sounds like the start of the data & rules needed for 
> auto-conversion.

Here's the command I used:

grep -o '<[^/][^>/ ]*' es.po | grep -v '@' | sort -u

And this is what I got:

<anchor
<application
<citetitle
<command
<computeroutput
<email
<EMAIL
<emphasis
<filename
<firstterm
<glossterm
<guibutton
<guilabel
<guimenu
<guimenuitem
<guisubmenu
<imagedata
<imageobject
<indexterm
<interface
<itemizedlist
<keycap
<keycombo
<keysym
<link
<listitem
<literal
<mediaobject
<menuchoice
<option
<para
<phrase
<placeholder-1
<primary
<remark
<replaceable
<screenshot
<secondary
<see
<shortcut
<tertiary
<textobject
<ulink
<uri
<userinput
<xref

Note that <placeholder-$i/> elements are an artifact of xml2po.
Also note that PO files can wrap a string across several quoted
lines at any point, so the above command could give you partial
tag names.  It just happened that it didn't with the PO file I used.
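
One way around the partial-tag problem is to join the quoted
continuation lines of each msgid/msgstr before looking for tags.
The following is only a minimal sketch, not a full PO parser: the
function names po_strings and tag_names are made up for this
example, escapes such as \" and \n are left as-is, and anything
that looks like an email address in angle brackets is skipped, much
as the grep -v '@' step does.

```python
import re

# Opening tag name followed by whitespace, "/" or ">". The lookahead keeps
# things like <someone@example.org> from being reported as a tag.
TAG_RE = re.compile(r"<([A-Za-z][\w.-]*)(?=[\s/>])")

# First line of a msgid, msgid_plural, msgstr or msgstr[N] entry.
START_RE = re.compile(r'(?:msgid_plural|msgid|msgstr(?:\[\d+\])?)\s+"(.*)"$')


def po_strings(lines):
    """Yield each msgid/msgstr as one string, undoing PO line wrapping."""
    parts = None
    for raw in lines:
        line = raw.strip()
        start = START_RE.match(line)
        if start:
            if parts is not None:
                yield "".join(parts)
            parts = [start.group(1)]
        elif parts is not None and len(line) >= 2 and line[0] == '"' and line[-1] == '"':
            # Continuation line: a bare quoted chunk of the same string.
            parts.append(line[1:-1])
        else:
            # Comment, blank line, msgctxt, obsolete entry: close the string.
            if parts is not None:
                yield "".join(parts)
            parts = None
    if parts is not None:
        yield "".join(parts)


def tag_names(lines):
    """Return the sorted set of opening tag names found in a PO file."""
    names = set()
    for text in po_strings(lines):
        names.update(TAG_RE.findall(text))
    return sorted(names)
```

It could be run on the same file as the grep command with
something like tag_names(open("es.po", encoding="utf-8")).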

(I have no idea what's up with the all-caps EMAIL element.)

> That still leaves half that would represent manual work for docs folks 
> to convert.
> > The utility of this depends, of course, on writers doing the most
> > obvious conversion of their content.  But even if the converted
> > messages don't match, merge tools will mark them as either fuzzy
> > or unused, so there's no harm in having them there.
> >   
> but then they are translation regressions, I think?

Sure, but no worse than if nothing had been done.  Short of a
fully automated conversion of the source documentation, I don't
think it will be possible to avoid translator work.  I'm just
trying to help minimize how much they have to do.

On the subject of a fully automated conversion, by the way, I
know it would be really nice to have a DocBook->Mallard tool,
and that is on my TODO list.  But a general tool is at best a
start for a conversion.

On the other hand, since Ubuntu has their own best practices for
writing topic-oriented help in DocBook already, it may be possible
to write a special-purpose tool that does exactly the conversion
the Ubuntu folks want.  In other words, a general tool has to
make assumptions that won't hold for everyone.  But with a
specialized tool, you know your assumptions.

And if it's possible to build a perfect XML converter, it should
be possible to write a perfect PO converter.  (Personally, I tend
to opt for 90% solutions.  Otherwise you sink more time into the
development than you end up saving.)
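
To make the "well-known set" idea concrete, here is a rough sketch
of a 90% converter for a single PO entry.  It is an illustration
under stated assumptions, not a finished tool: the names TAG_MAP,
convert_message and convert_entry are hypothetical, the mapping
table is a small hand-picked subset that should be checked against
the docbook.html page linked above, and any message containing an
unknown tag, a self-closing tag, or a tag with attributes is left
alone rather than guessed at.

```python
import re

# Hypothetical, partial DocBook -> Mallard inline mapping. This is an
# assumption for illustration; verify each pair against the Mallard docs.
TAG_MAP = {
    "application": "app",
    "command": "cmd",
    "computeroutput": "output",
    "emphasis": "em",
    "filename": "file",
    "guibutton": "gui",
    "guilabel": "gui",
    "guimenu": "gui",
    "guimenuitem": "gui",
    "keycap": "key",
    "replaceable": "var",
    "userinput": "input",
}

# Groups: optional closing slash, tag name, everything else up to ">".
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^<>]*)>")


def convert_message(text):
    """Rename known tags in text; return None if any tag is not safe to convert."""
    for _slash, name, rest in TAG_RE.findall(text):
        # Unknown element, attributes, or a self-closing tag: skip the message.
        if name not in TAG_MAP or rest.strip():
            return None
    return TAG_RE.sub(
        lambda m: "<%s%s>" % (m.group(1), TAG_MAP[m.group(2)]), text
    )


def convert_entry(msgid, msgstr):
    """Convert a msgid/msgstr pair together, or return None to skip the entry."""
    new_id = convert_message(msgid)
    new_str = convert_message(msgstr)
    if new_id is None or new_str is None:
        return None
    return new_id, new_str
```

Entries for which convert_entry returns None would simply be left
out of the converted PO file; the rest only match if the writer
made the same obvious tag substitutions in the source document.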

--
Shaun