[ubuntu-us-nc] PDF conversion

Dewey Hylton plug at hyltown.com
Tue Jun 22 16:33:16 BST 2010


----- Original Message -----
From: "Jeff Lane" <jeffrey.lane at canonical.com>
To: "Ubuntu North Carolina Local Community Team" <ubuntu-us-nc at lists.ubuntu.com>
Sent: Tuesday, June 22, 2010 11:06:36 AM GMT -05:00 US/Canada Eastern
Subject: Re: [ubuntu-us-nc] PDF conversion

On Tue, 2010-06-22 at 00:05 -0400, J Mark Cox wrote:
> On Mon, 2010-06-21 at 19:42 -0400, Jeff Lane wrote:
> > Instead of all these fancy pointy-clickey ways, try one of the tools
> > from poppler-utils (should be installed by default in 10.04, or at least
> > I don't remember ever installing them).
> > 
> > For example:
> > 
> > pdftotext - converts pdf files to text files
> > pdftohtml - converts pdf files to HTML files
> > pdftops - converts pdf to PostScript
> > pdftoabw - converts pdf to AbiWord format
> > 
> > http://poppler.freedesktop.org/
> > 
> > And it's all shell, so you can script it to run against all the PDFs you
> > have...
> > 
> > 
> Awesome! Now I have to go edit those tax form PDF files I have been
> wanting to "slightly" modify. Oh my, coffee...
> 
> Make checks payable to:
> Send checks to: 
> 
> Just kidding, but it looks like there are some intriguing possibilities
> nonetheless.

Yeah, they are neat little tools.  I've had them for a while and use
them on occasion, but I don't know if they're part of the default Lucid
install or not.  I always thought they were part of Xpdf, until the
other day, to be honest :)  I never installed poppler myself, so I am
left to guess that it was either a default package, or was a dependency
of something else I installed.

In any case, they work pretty well, though not always.  Apparently some
PDFs have mangled data or weird fonts or characters that don't always
get converted properly, so it's still a good idea to actually look over
the converted files before pushing them anywhere important... just like
when using OCR...
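
For what it's worth, the "script it to run against all the PDFs" idea
from earlier in the thread could look something like the sketch below.
It assumes pdftotext from poppler-utils is on your PATH; the function
name convert_pdfs is just a made-up helper, not anything poppler ships.
It writes foo.txt next to each foo.pdf and reports any file pdftotext
refuses, so you know which ones to look over by hand.

```shell
#!/bin/sh
# Batch-convert PDFs to plain text with pdftotext (poppler-utils).
# convert_pdfs is a hypothetical helper name, not part of poppler.
convert_pdfs() {
    dir="${1:-.}"                      # default to the current directory
    for pdf in "$dir"/*.pdf; do
        # If no PDF matched, the glob stays literal; skip it.
        [ -e "$pdf" ] || continue
        txt="${pdf%.pdf}.txt"          # foo.pdf -> foo.txt
        if pdftotext "$pdf" "$txt"; then
            echo "converted: $pdf -> $txt"
        else
            echo "FAILED: $pdf" >&2    # review these by hand
        fi
    done
}
```

Source the file and call it with a directory, e.g. convert_pdfs ~/Documents.
A clean exit from pdftotext only means it produced output, not that the
output is right, so the converted files still deserve a read.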

Some PDFs are just glorified bitmap images; those, of course, won't convert into usable text without running OCR on them first.
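
A rough way to spot those image-only files is to check whether pdftotext
gets any non-whitespace characters out of them at all. This is only a
heuristic sketch, assuming pdftotext is installed; has_text and
report_textless are made-up helper names.

```shell
#!/bin/sh
# has_text is a hypothetical helper: it succeeds if pdftotext extracts
# at least one non-whitespace character from the given PDF.
has_text() {
    # "-" as the output file makes pdftotext write to stdout.
    pdftotext "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}

# report_textless (also hypothetical) lists the PDFs that yield no text.
report_textless() {
    for pdf in "$@"; do
        has_text "$pdf" || echo "no extractable text (may need OCR): $pdf"
    done
}
```

Something like report_textless *.pdf would then list the likely scans.
It won't catch a PDF that is mostly images with a stray line of real
text, so treat the result as a hint rather than a verdict.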


