sorting translation entries

Mon Jan 31 08:56:46 CST 2005

On 22/01/2005 at 02:01, Valient Gough wrote:
> Dafydd Harries wrote:
> >On 02/01/2005 at 12:52, Valient Gough wrote:
> > 
> >>I've been sorting my translation strings (POT file) in rough order by 
> >>most desired translations first, so translators will see them first in 
> >>Rosetta.  I don't know how Rosetta plans on dealing with sorting (if at 
> >>all), so I'll describe how I've handled it for my project.
> >>   
> >
> >We've thought about adding support for guessing about how difficult a
> >string is to translate, based on things such as how long it is, whether
> >it has plural forms, whether it has variable substitutions etc.
> >
> >The main disadvantage to this is that related messages are often (though
> >not always) near each other in the .pot file, and seeing related
> >messages together often helps in having a consistent translation.
> > 
> I see -- it is probably a good assumption that translations are more 
> helpful when dealing with more complex sentences.
> 
> But I want to sort based on most-frequently-displayed -- so the strings 
> which are displayed most often get priority for translation.  My typical 
> application has lots of common strings, along with strings which are 
> only seen during setup or in usage messages, and then strings which are only 
> seen if something very strange has happened (warning messages, debug 
> messages).

Right, this is the sort of information that you can only add manually.

> Those warning messages, when the program detects an unexpected state, 
> may be very verbose to try and provide lots of information for 
> debugging, but that doesn't mean they are necessarily the best to 
> translate because I expect that if my program is working well that 
> nobody will ever see the strings at all.

There are a number of approaches you can take with this sort of
message.

 - Don't make debugging information translatable. This makes sense if
   the information is likely to be useful only to a developer of the
   software and not to a user.

 - Split out less important messages into a separate translation domain.
   For example, GTK+ does this with descriptions for widget properties
   for use by Glade. These generally don't appear in applications which
   use GTK+, and so they are less important for translation.

> I have a rough ordering of tags right now based on such frequency 
> groupings.  I don't mind if the tags are re-ordered within a group, but 
> I don't want to drop my ordering for an automated grouping from an 
> algorithm that knows nothing about my program.

This is another potential approach -- to group messages together using
some form of metadata in the PO template, probably in the comments.
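To make that concrete, here is a small sketch of what it might look
like. The "#. group: NAME" comment convention and the group names are
invented for the example; nothing like this exists in the PO format or
in Rosetta today. It also assumes the input contains only message
entries (no header entry), separated by blank lines.

```python
import re

# Hypothetical group names, most important first.
GROUP_ORDER = ["common", "setup", "rare"]

# Hypothetical convention: an extracted comment of the form "#. group: NAME".
GROUP_RE = re.compile(r"^#\.\s*group:\s*(\S+)\s*$", re.MULTILINE)


def split_entries(pot_text):
    """Split PO template text into entries separated by blank lines."""
    return [block for block in re.split(r"\n\s*\n", pot_text.strip()) if block]


def group_of(entry):
    """Return the group named in the entry's comments, or None."""
    match = GROUP_RE.search(entry)
    return match.group(1) if match else None


def sort_by_group(pot_text):
    """Order entries by group; untagged or unknown groups go last."""
    def rank(entry):
        group = group_of(entry)
        if group in GROUP_ORDER:
            return GROUP_ORDER.index(group)
        return len(GROUP_ORDER)

    # sorted() is stable, so entries keep their file order within a
    # group and related messages stay next to each other.
    return sorted(split_entries(pot_text), key=rank)
```

Because the sort is stable, the author's ordering within each group is
preserved rather than replaced.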

> If you use an algorithm to sort entries, then you optimize for the 
> average or mean case.  An individual can do a better job on any 
> particular case, so I think the goal should be to either enable directed 
> grouping (within user-specified subgroups) or as a bootstrap for 
> otherwise unsorted applications (but don't override sorting provided 
> later by the user).

Yes, I think this is a case where human judgement will be better than
heuristics. But we should not give up on the idea of using a heuristic
when grouping data is missing.
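A rough sketch of that fallback, under several assumptions: the weights
are arbitrary, the printf-style pattern is only approximate, the
dictionary layout is invented for the example, and putting easier
strings first is just one possible choice. Messages with a
user-supplied group keep that ordering; the heuristic only touches the
rest.

```python
import re

# Approximate pattern for printf-style substitutions such as %s or %(name)d.
FORMAT_RE = re.compile(r"%(?:\([^)]+\))?[-#0 +]*\d*(?:\.\d+)?[sdifgxXuc]")


def difficulty(msgid, has_plural=False):
    """Guess how hard a string is to translate (higher means harder)."""
    score = len(msgid)                              # longer strings take more work
    score += 20 * len(FORMAT_RE.findall(msgid))     # each substitution adds risk
    if has_plural:
        score += 50                                 # plural forms need extra care
    return score


def order_messages(messages):
    """Order messages: user-grouped first, then the rest by the heuristic.

    Each message is a dict with "msgid", an optional integer "group"
    rank (lower comes first) and an optional "has_plural" flag.
    """
    grouped = [m for m in messages if m.get("group") is not None]
    ungrouped = [m for m in messages if m.get("group") is None]
    # Stable sort: the user's order within a group is never overridden.
    grouped.sort(key=lambda m: m["group"])
    ungrouped.sort(key=lambda m: difficulty(m["msgid"], m.get("has_plural", False)))
    return grouped + ungrouped
```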

-- 
Dafydd