Semi-mechanizing the DTTP translations

Hannie Dumoleyn lafeber-dumoleyn2 at zonnet.nl
Thu Dec 27 08:28:34 UTC 2012


Hello Hendrik, Redmar, Pierre,
Redmar, thanks for writing the script.
The way I have done the splitting so far is: open the sorted DDTP file in
gedit, select lines 1 - 30,000 (which is about 940 kB), copy them into a
new document and save it. It only takes a few minutes. Then you can
select the next 30,000 lines, and the next. Done!
Of course, using a script to split the whole file in one go is also very
useful.
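
As a rough illustration of what such a script could look like, here is a
minimal sketch that uses only the Python standard library rather than
python-polib. It assumes that entries in the po file are separated by blank
lines and that the first block is the header. The name split_po and the
800,000-byte default are made up for this example (the default mirrors the
"800 for safety" figure Pierre mentions further down).

```python
import re


def split_po(text, max_bytes=800_000):
    """Split PO text into chunks of at most max_bytes, cutting only between entries.

    Assumptions (not taken from any existing script): entries are separated
    by blank lines, and the first block is the PO header, which is repeated
    at the top of every chunk so each part is a valid po file on its own.
    """
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    if not blocks:
        return []
    header, entries = blocks[0], blocks[1:]
    header_size = len(header.encode("utf-8"))

    chunks = []
    current, size = [header], header_size
    for entry in entries:
        # +2 accounts for the blank-line separator written before the entry.
        entry_size = len(entry.encode("utf-8")) + 2
        # +1 accounts for the newline that ends the chunk.
        if len(current) > 1 and size + entry_size + 1 > max_bytes:
            chunks.append("\n\n".join(current) + "\n")
            current, size = [header], header_size
        # A single entry larger than the limit still gets a chunk of its own.
        current.append(entry)
        size += entry_size
    chunks.append("\n\n".join(current) + "\n")
    return chunks
```

Each returned chunk could then be written to its own file (for example
ddtp_part1.po, ddtp_part2.po, where the names are only placeholders).
Because the cut always falls on a blank line, no string ends up split in
the middle.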
Hannie
Ubuntu Dutch Translators

On 23-12-12 11:39, Hendrik Knackstedt wrote:
> On 23.12.2012 10:33, Redmar wrote:
>> Hendrik Knackstedt wrote on Thu 20-12-2012 at 17:39 [+0100]:
>>> On 20.12.2012 13:43, Pierre Slamich wrote:
>>>
>>>> I don't have a clean way to split them right now. I split them by
>>>> size to keep below 900 kB (I took 800 for safety), but I then had to
>>>> adjust manually because the strings were split right in the middle.
>>> Ok, I'll take a look at it and see if I can come up with something
>>> useful.
>> I've been working with python-polib for a bit, so I think I'd be able to
>> create a script to split up a po file into multiple parts pretty
>> quickly. I haven't started yet, since I don't want to do duplicate work,
>> but please let me know if you want me to make a script or if you need
>> help with python-polib.
>
> If you can do this, that's great. Thanks!
>
> Hendrik
>> Regards,
>>
>> Redmar
>> --
>> Ubuntu Dutch Translators
>>>> If you don't mind, it would be great to take advantage of the German
>>>> process to automate the process as much as possible.
>>>> Would you be willing to expand the pad
>>>> (http://lite.framapad.org/p/ddtpUbuntu) with us (yet another proof
>>>> of French-German partnership ;-P)?
>>> Sure. What do you mean by "the German process"? I'm a bit short on
>>> time right now but just let me know what has to be done and I'll try
>>> to get it done asap.
>>>
>>> Regards,
>>> Hendrik
>>>> Pierre
>>>>
>>>> On Thu, Dec 20, 2012 at 1:35 PM, Hendrik Knackstedt
>>>> <hendrik.knackstedt at t-online.de>  wrote:
>>>>          Hey Pierre!
>>>>          
>>>>          
>>>>          I'd like to test your approach for the German language also.
>>>>          How exactly did you split the files? Did you use an existing
>>>>          program/script or can you provide a script for doing this?
>>>>          Thanks!
>>>>          
>>>>          Hendrik
>>>>          
>>>>          On 19.12.2012 15:58, Pierre Slamich wrote:
>>>>          
>>>>          > Yes, although we might be finished by then ;-)
>>>>          > Thanks to the method we're reviewing and correcting around
>>>>          > 1000 strings per day at the moment.
>>>>          >
>>>>          >
>>>>          > sincerely,
>>>>          > Pierre
>>>>          >
>>>>          >
>>>>          > On Tue, Dec 18, 2012 at 4:06 PM, Hannie Dumoleyn
>>>>          > <lafeber-dumoleyn2 at zonnet.nl> wrote:
>>>>          >         Hi Pierre, Redmar, and all who are interested,
>>>>          >         Would it be an idea to brainstorm on this in
>>>>          >         #ubuntu-translators? Perhaps in January 2013?
>>>>          >         I agree with Redmar that msgmerge is a good
>>>>          >         method, especially for huge documents. The only
>>>>          >         snag is that you still have to approve the fuzzies
>>>>          >         offline before uploading the file back to
>>>>          >         Launchpad. We use this method for the Ubuntu
>>>>          >         Manual "Getting started with Ubuntu" (Lucid >
>>>>          >         Maverick > ... > Raring), with success.
>>>>          >         Redmar, sorry for not yet having tested your
>>>>          >         popsort :(
>>>>          >         Regards,
>>>>          >         Hannie
>>>>          >
>>>>          >         On 18-12-12 00:51, Pierre Slamich wrote:
>>>>          >
>>>>          >         > Hi Hannie, Hi Redmar,
>>>>          >         > Thanks a lot for the tips: we're interested in
>>>>          >         > using your approach, and more generally it might
>>>>          >         > be interesting to extend the msgmerge approach to
>>>>          >         > all teams that are already underway with the
>>>>          >         > DDTP, and the Google one to the teams that need
>>>>          >         > to get started.
>>>>          >         >
>>>>          >         >
>>>>          >         > - For the Google Translator Toolkit approach, I
>>>>          >         > guess we could extend the mock project we did
>>>>          >         > for fr_FR to other languages (and streamline
>>>>          >         > our process by using Bazaar) by creating a
>>>>          >         > global team responsible for the DDTP Mock
>>>>          >         > project and including in this team one member
>>>>          >         > from each language team responsible for
>>>>          >         > uploading the machine-translated po for his or
>>>>          >         > her language.
>>>>          >         >
>>>>          >         >
>>>>          >         > - For the msgmerge approach, do you already have
>>>>          >         > a project to handle this? Is there any
>>>>          >         > advantage in msgmerging raring against releases
>>>>          >         > older than quantal to get more modified
>>>>          >         > strings? How many strings have you been able to
>>>>          >         > recover using that approach? It might be neat
>>>>          >         > to generate the msgmerged po for all languages.
>>>>          >         > Importing them as actual translations (not
>>>>          >         > fuzzy) into a mock project like the Google
>>>>          >         > Translate one would show them as suggestions for
>>>>          >         > the actual DDTP as well.
>>>>          >         > The translator would thus be able to pick the
>>>>          >         > human translated one when available or to build
>>>>          >         > on the machine translated one otherwise.
>>>>          >         >
>>>>          >         >
>>>>          >         > Can we try to schedule some time to coordinate
>>>>          >         > on this so that we can use both approaches and
>>>>          >         > try to onboard all the other language teams
>>>>          >         > once we have a rock-solid process?
>>>>          >         >
>>>>          >         >
>>>>          >         > Pierre
>>>>          >         >
>>>>          >         > Pierre Slamich
>>>>          >         > pierre.slamich at gmail.com
>>>>          >         >
>>>>          >         >
>>>>          >         >
>>>>          >         > On Mon, Dec 17, 2012 at 10:30 PM, Redmar
>>>>          >         > <redmar at ubuntu-nl.org> wrote:
>>>>          >         >         Hi Pierre,
>>>>          >         >
>>>>          >         >         I've actually tried a similar approach
>>>>          >         >         for Dutch using msgmerge, which
>>>>          >         >         might also be worth checking out. When
>>>>          >         >         you merge the translations of an
>>>>          >         >         older version of ubuntu into the current
>>>>          >         >         version (msgmerge
>>>>          >         >         quantal_ddtp.po raring_ddtp.po -o
>>>>          >         >         merged_ddtp.po, for example), there
>>>>          >         >         will be a lot of 'fuzzy' translations
>>>>          >         >         for strings that are similar (for
>>>>          >         >         example, meta packages for different
>>>>          >         >         programs, debugging symbols etc).
>>>>          >         >         These fuzzies often only need a few small
>>>>          >         >         changes (eg program name) to be
>>>>          >         >         accepted, which can really speed up
>>>>          >         >         translations. And you don't have to
>>>>          >         >         worry about google putting in a weird
>>>>          >         >         translation, since it is all based
>>>>          >         >         on earlier translations done by a human.
>>>>          >         >
>>>>          >         >         On a related note, if any of you work on
>>>>          >         >         ddtp-translations offline, I
>>>>          >         >         have written a python program that can
>>>>          >         >         sort entries in ddtp po-files
>>>>          >         >         based on the popularity of the package.
>>>>          >         >         This way, the most popular
>>>>          >         >         packages will be at the top of the po
>>>>          >         >         file, and you are always sure you
>>>>          >         >         are working on the most important
>>>>          >         >         packages first.
>>>>          >         >
>>>>          >         >         You can get the code here:
>>>>          >         >         bzr branch lp:~redmar/+junk/ddtp_popsort
>>>>          >         >
>>>>          >         >         It has a small readme file, please let
>>>>          >         >         me know if something is unclear
>>>>          >         >         or not working for you.
>>>>          >         >
>>>>          >         >         Regards,
>>>>          >         >         Redmar
>>>>          >         >         --
>>>>          >         >         Ubuntu Dutch Translators
>>>>          >         >
>>>>          >         >
>>>>          >         >         Hannie Dumoleyn wrote on Mon 17-12-2012 at 17:58 [+0100]:
>>>>          >         >         > Hello Pierre,
>>>>          >         >         > This is a very good idea! I have just
>>>>          >         >         > uploaded the first part of the
>>>>          >         >         > incomplete Dutch translation (900 kB)
>>>>          >         >         > to GTT.
>>>>          >         >         > Thanks,
>>>>          >         >         > Hannie
>>>>          >         >         >
>>>>          >         >         > On 17-12-12 12:55, Pierre Slamich wrote:
>>>>          >         >         >
>>>>          >         >         > > The DDTP represents around 50 000 strings to
>>>>          >         >         > > translate * 140 languages. On very good weeks, a
>>>>          >         >         > > typical translation team translates 500 strings
>>>>          >         >         > > (see UWN for example weekly figures).
>>>>          >         >         > >
>>>>          >         >         > >
>>>>          >         >         > > It would take a lot of weeks (years?) for highly
>>>>          >         >         > > motivated volunteers of a large translation team,
>>>>          >         >         > > working non-stop at their best, to get done with it.
>>>>          >         >         > > Thus we had the idea to delegate initial
>>>>          >         >         > > translation suggestions to Google Translator
>>>>          >         >         > > Toolkit and have humans review the translations
>>>>          >         >         > > to speed up the process.
>>>>          >         >         > >
>>>>          >         >         > > We successfully did an import for circa 40 000
>>>>          >         >         > > French strings (yup, you read that right) this
>>>>          >         >         > > weekend in a mock project called DDTP Automation
>>>>          >         >         > > (https://translations.launchpad.net/ddtpautomation).
>>>>          >         >         > > To keep it short, the translations from this
>>>>          >         >         > > project appear as suggestions in the French DDTP,
>>>>          >         >         > > and can be reviewed by actual translators.
>>>>          >         >         > > We've started using them, and it turns out that a
>>>>          >         >         > > lot of them are actually useful and are speeding
>>>>          >         >         > > up the translation process a lot.
>>>>          >         >         > >
>>>>          >         >         > > We detailed the (somewhat) tedious process in
>>>>          >         >         > > English at http://lite.framapad.org/p/ddtpUbuntu
>>>>          >         >         > > Questions and inquiries welcome.
>>>>          >         >         > >
>>>>          >         >         > > Pierre
>>>>          >         >         > >
>>>>          >         >         > >
>>>>          >         >         > > ---
>>>>          >         >         > > pierre.slamich at gmail.com
>>>>          >         >         > >
>>>>          >         >         > >
>>>>          >         >         >
>>>>          >         >
>>>>          >         >
>>>>          >         >
>>>>          >         >         --
>>>>          >         >         ubuntu-translators mailing list
>>>>          >         >         ubuntu-translators at lists.ubuntu.com
>>>>          >         >         https://lists.ubuntu.com/mailman/listinfo/ubuntu-translators
>>>>          >         >
>>>>          >         >
>>>>          >         >
>>>>          >
>>>>          >
>>>>          >
>>>>          >
>>>>          >
>>>>          >
>>>>
>>>>
>>
>>
>
>
>
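
On the msgmerge route quoted above: since the fuzzies still have to be
approved offline before uploading to Launchpad, it can help to know how
many there are in a merged file. The following is only a sketch using the
Python standard library. It assumes blank-line-separated entries with the
header as the first block, and the function name count_fuzzy is invented
for this example.

```python
import re


def count_fuzzy(po_text):
    """Return (fuzzy, total) entry counts for PO text, ignoring the header.

    Assumptions: entries are separated by blank lines and the first block
    is the header. An entry counts as fuzzy when its flag line (the line
    starting with "#,") lists the "fuzzy" flag.
    """
    blocks = [b for b in re.split(r"\n\s*\n", po_text.strip()) if b.strip()]
    entries = blocks[1:]  # skip the header block
    fuzzy = 0
    for entry in entries:
        for line in entry.splitlines():
            if line.startswith("#,"):
                flags = [flag.strip() for flag in line[2:].split(",")]
                if "fuzzy" in flags:
                    fuzzy += 1
                    break
    return fuzzy, len(entries)
```

Run on the text of a merged file such as merged_ddtp.po, this gives a
quick idea of how much review work is left before the upload.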
