pdftotext does not handle "fi"

NoOp glgxg at sbcglobal.net
Mon Jul 13 19:10:02 UTC 2009


On 07/13/2009 12:30 AM, Siggy Brentrup wrote:
> ATTN: Mail-Followup-To: set; you should have asked this question on
>      ubuntu-users.
> 
> On Sun, Jul 12, 2009 at 19:23 -0400, Justin Hale wrote:
> 
>> Hello, I'm using the latest version of Ubuntu Linux on my old Sony
>> Vaio and new Sony Vaio.  When I try to convert a large PDF file to
>> text, the "fi" characters are converted to one character instead of
>> two, and the displayed results are inconsistent.
> 
> 'fi' as single character is a ligature; cf
> 
>   http://en.wikipedia.org/wiki/Typographic_ligature
> 
>> Sometimes the character looks like "fi" and sometimes it looks like
>> a diamond, depending on whether I'm using the GUI or tty1,
>> respectively.  ~~ Justin
> 
> Note the box at the top right corner.
> 
> HTH
>   Siggy

Inkscape had some issues with ligatures when exporting to PDF; maybe they are related?
https://bugs.launchpad.net/inkscape/+bug/385303
[Ligatures: "ff" and "fl" become "f" during PDF-Export]
https://bugs.launchpad.net/inkscape/+bug/218045
[Cairo PDF export cuts last characters from strings containing ligatures]
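
If the single-character ligatures are the problem downstream, one possible
workaround is to post-process the extracted text and expand them back into
plain letters. The sketch below is only an illustration, not something
pdftotext does itself: the function name expand_ligatures and the mapping
table are my own, and it covers only the Latin f-ligatures at U+FB00 to
U+FB04.

```python
# Hypothetical post-processing helper (not part of pdftotext):
# expand the Unicode Latin f-ligatures into their component letters.
LIGATURES = {
    0xFB00: "ff",   # LATIN SMALL LIGATURE FF
    0xFB01: "fi",   # LATIN SMALL LIGATURE FI
    0xFB02: "fl",   # LATIN SMALL LIGATURE FL
    0xFB03: "ffi",  # LATIN SMALL LIGATURE FFI
    0xFB04: "ffl",  # LATIN SMALL LIGATURE FFL
}


def expand_ligatures(text: str) -> str:
    """Return text with each ligature code point replaced by plain letters."""
    return text.translate(LIGATURES)


if __name__ == "__main__":
    # "of\ufb01ce" is "office" spelled with the single-character fi ligature.
    print(expand_ligatures("of\ufb01ce \ufb02oor"))
```

unicodedata.normalize("NFKC", text) from the standard library would also
expand these ligatures, but it rewrites many other compatibility characters
as well, so the explicit table is the more conservative choice.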






