Converting or OCRing PDF to text (solved)

Mon Jan 29 08:21:51 UTC 2007

On Monday 29 January 2007, john d. herron wrote:
> Thank you, Michael McIntyre and Donn.
> pdftotext works beautifully!
> jdh
>
> D. Michael McIntyre wrote:
> > On Sunday 28 January 2007 4:29 pm, Donn wrote:
> >>> I have a few articles in .pdf format I'd need to convert (or OCR) to
> >>> plain text, .odf or .doc format.
> >>> Any advice for this Linux newbie?
> >>
> >> Hi, check what these commands give you:
> >> pdftotext
> >> or
> >> pdftohtml
> >
> > Same thing I was going to suggest, more or less.  I use pdftohtml, then
> > load the HTML into OpenOffice and export it or save as or whatever to
> > convert it to OO.o-native format.  (I think you have to export it or send
> > it, so you don't wind up with an OO.o document that still behaves like a
> > one-page HTML file, but I can't remember the details, and I just closed
> > OO.o, and am too lazy to sit here while it warms back up.)

You can also try KWord, which can import PDF files, including the graphics.

It is adequate for a basic import, though it cannot handle subtler
things such as tables.
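
For anyone finding this thread later, here is a rough sketch of how the
pdftotext suggestion quoted above might be scripted for a whole directory
of articles. It assumes pdftotext (from xpdf or poppler-utils) is on your
PATH; the txt_name helper is a made-up name for illustration, and -layout
is optional (it asks pdftotext to keep the original page layout as far as
it can).

```shell
#!/bin/sh
# Hypothetical helper: derive the output name, e.g. article.pdf -> article.txt
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Convert every PDF in the current directory, but only if pdftotext is installed.
if command -v pdftotext >/dev/null 2>&1; then
    for f in ./*.pdf; do
        # If there are no PDFs, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        pdftotext -layout "$f" "$(txt_name "$f")"
    done
fi
```

Note that this only helps with PDFs that already contain a text layer;
scanned pages would still need a real OCR program.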

-- 
Donatas Glodenis
http://dg.lapas.info