[ubuntu-uk] OCR ....

Barry Drake bdrake at crosswire.org
Mon Dec 6 16:55:32 GMT 2010

On Mon, 2010-12-06 at 16:07 +0000, Simon Greenwood wrote:

> I had a need to do some OCR recently and came across a project called
> tesseract-ocr: http://code.google.com/p/tesseract-ocr/. It's based on
> HP code that dates from the mid-90s. I've only used it to extract text
> from existing graphics but it seems to be very accurate.

You're right - it is accurate - and it works with the neat GUI frontend
that Danté mentioned, gscan2pdf. The two make a fantastic combination
that's amazingly easy to use.  Tesseract and gscan2pdf really ought to
get into the normal Ubuntu release ... or at least be well promoted in
the 'Software Centre' and Synaptic so they are easy to find.  The only
OCR package that's really easy to find is gocr, and so far I'm not that
impressed with it.

Thank you both.  This will save me a lot of time in the future.  It will
also save me having to say to my daughter or my sister 'Well, I've got
this Windows program ...'

Barry Drake.

Sent from my desktop using Ubuntu - the window-free environment
that gives me real fresh air.
