trying to OCR a simple tif file with tesseract-ocr

Sat Mar 26 13:24:41 UTC 2011

On Sat, 26 Mar 2011, Nicolae Ghimbovschi wrote:

> In your case unpaper will not do much.
>
> I'm using Fedora, and tesseract 3.0 works just fine:
>
> Sample input:
> http://img402.imageshack.us/f/eurotext.png/
>
> tesseract's output:
> http://pastebin.com/zREhWGQq

  in that case, i'm baffled, but i note that you seem to be using a
.png file as input to tesseract, whereas the man page *strongly*
suggests that tesseract works well only with TIFF files.  so that
confuses me as well.

  in any event, i'm still interested in someone trying this with a
simple example under ubuntu and letting me know (privately if they
wish) whether they got it to work properly.  this really shouldn't be
that difficult; i just don't see what i'm doing wrong.
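
  for anyone willing to try, here is a rough sketch of the kind of
minimal test i have in mind.  it is only an illustration, not a known
fix: the file names and the helper names (build_commands, run) are
made up, and using ImageMagick's "convert" to produce a TIFF first is
an assumption on my part.  the tesseract call itself is just the usual
"tesseract imagename outputbase -l lang" form.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(image, output_base, lang="eng"):
    """Return the command lines to run, without executing anything."""
    image = Path(image)
    commands = []
    if image.suffix.lower() not in (".tif", ".tiff"):
        # Assumption: ImageMagick's `convert` is installed; any other
        # image converter that can write TIFF would do just as well.
        tiff = image.with_suffix(".tif")
        commands.append(["convert", str(image), str(tiff)])
        image = tiff
    # tesseract writes the recognized text to <output_base>.txt
    commands.append(["tesseract", str(image), str(output_base), "-l", lang])
    return commands


def run(image, output_base, lang="eng"):
    """Run each command in turn and return the expected text file path."""
    for cmd in build_commands(image, output_base, lang):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
    return Path(f"{output_base}.txt")
```

  calling run("eurotext.png", "out") would convert the image first and
then OCR the TIFF; passing a .tif file skips the conversion step, which
makes it easy to compare the two cases on the same machine.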

rday

-- 

========================================================================
Robert P. J. Day                               Waterloo, Ontario, CANADA
                        http://crashcourse.ca

Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
========================================================================