pdftotext does not handle "fi"
NoOp
glgxg at sbcglobal.net
Mon Jul 13 19:10:02 UTC 2009
On 07/13/2009 12:30 AM, Siggy Brentrup wrote:
> ATN: Mail-Followup-To: set, you should have asked this question on
> ubuntu-users.
>
> On Sun, Jul 12, 2009 at 19:23 -0400, Justin Hale wrote:
>
>> Hello, I'm using the latest version of Ubuntu Linux on my old Sony
>> Vaio and my new Sony Vaio. When I try to convert a large PDF file to
>> text, the "fi" characters are converted to one character instead of
>> two, and the displayed results are inconsistent.
>
> 'fi' as single character is a ligature; cf
>
> http://en.wikipedia.org/wiki/Typographic_ligature
>
>> Sometimes the character looks like "fi" and sometimes it looks like
>> a diamond, depending on whether I'm using the GUI or tty1,
>> respectively. ~~ Justin
>
> Note the box at the top right corner.
>
> HTH
> Siggy
Inkscape had some issues with ligatures when exporting to PDF; maybe
this is related?
https://bugs.launchpad.net/inkscape/+bug/385303
[Ligatures: "ff" and "fl" become "f" during PDF-Export]
https://bugs.launchpad.net/inkscape/+bug/218045
[Cairo PDF export cuts last characters from strings containing ligatures]
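
If the goal is just plain "fi" in the extracted text, one workaround is to
post-process the text file that pdftotext produced. The following is only a
minimal sketch, not anything pdftotext itself does: the function name
expand_ligatures and the mapping table are my own, and I am assuming the
output is UTF-8 text containing the Unicode "Latin small ligature" characters
(U+FB00 to U+FB04). A console font that lacks a glyph for U+FB01 would also
plausibly explain the diamond on tty1, though I have not verified that here.

```python
# Hypothetical helper: expand the common Latin ligature code points
# into their plain-letter equivalents.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
}

# str.translate wants a table keyed by code point.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Return text with each ligature character replaced by separate letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "of\ufb01ce work\ufb02ow"
    print(expand_ligatures(sample))  # prints: office workflow
```

unicodedata.normalize("NFKC", text) from the standard library would expand
these ligatures too, but it also rewrites other compatibility characters
(superscripts, full-width forms and so on), so the explicit table is the more
conservative choice.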