searching accented characters in LaTeX-generated PDF files may not work

Martin Konopka martin.konopka at stuba.sk
Fri Sep 18 13:53:11 UTC 2009


Greetings,

(Skip most of this email and just read the === marked lines below to
quickly get the essence.)

I use LaTeX to create PDF documents (using pdflatex, dvipdfmx, dvipdfm
or dvipdf). Under a typical texlive installation, words containing some
accented characters cannot be found when searching the resulting PDF
files, for example

č Č š Š

i.e. those typical for Central Europe. (I live in the Slovak Republic,
so I often write documents in the Slovak language.)

I stress that I looked very carefully at all packages listed under latex
and texlive in the Synaptic Package Manager and installed everything
that looked useful to me. However, PDF documents generated on my
Ubuntu Jaunty Jackalope (64-bit) system were unusable as regards
searching (in Evince or Okular) for words containing the above accented
characters.

Then I decided to install the metapackage texlive-full, and my system
started to generate searchable files! (I tested dvipdf and dvipdfmx.)

Nice, but installing texlive-full means that I now have support for
many languages which I obviously do not need (and several hundred
megabytes less disk space).

Please note that many users may think that the support for searching the
special characters is not present in Ubuntu at all.

=================================================================
In order to save users (and mainly readers!) from trouble, I suggest at
least making it clear from the short descriptions of the packages which
package is required to be able to generate searchable PDF documents.
=================================================================

(I even think that nowadays searchable PDF text files are so essential
that support for creating them should be included automatically
whenever possible.)

Below is my short test file as an example.

\documentclass[12pt,a4paper,oneside]{report}

\usepackage[slovak]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

\begin{document}
Some of the accented characters used in the Slovak language:

č Č ď Ď ň Ň ĺ Ĺ ľ Ľ ô Ô ŕ Ŕ š Š ť Ť ž Ž
\end{document}

I have not yet tested the latest development versions of Ubuntu, but I
guess that this problem is quite likely the same there as on 9.04.
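
A possible variant of the test file for pdflatex is sketched below. It
rests on assumptions and is not verified in this report: the cmap
package (which embeds character maps and should be loaded before
fontenc) and the lmodern package (Type 1 Latin Modern fonts) are
commonly suggested for making T1-encoded text searchable. Whether they
are installed, or sufficient, on a default Ubuntu texlive setup is not
confirmed here, and cmap is intended for pdflatex rather than for the
dvipdf, dvipdfm or dvipdfmx routes.

```latex
\documentclass[12pt,a4paper,oneside]{report}

% Assumption: cmap is installed; it should come before fontenc.
\usepackage{cmap}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[slovak]{babel}
% Assumption: lmodern is installed; it selects Type 1 Latin Modern fonts.
\usepackage{lmodern}

\begin{document}
Some of the accented characters used in the Slovak language:

č Č ď Ď ň Ň ĺ Ĺ ľ Ľ ô Ô ŕ Ŕ š Š ť Ť ž Ž
\end{document}
```

If a file like this turns out to be searchable without texlive-full, it
would also hint at which smaller package the short descriptions should
mention.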

Thanks for your work.
Martin Konôpka.






More information about the Ubuntu-devel-discuss mailing list