trying to OCR a simple tif file with tesseract-ocr

Sat Mar 26 13:17:19 UTC 2011

In your case unpaper will not do much.

I'm using Fedora, and tesseract 3.0 works just fine:

Sample input:
http://img402.imageshack.us/f/eurotext.png/

tesseract's output:
http://pastebin.com/zREhWGQq
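
If anyone wants to repeat the check on their own machine, here is a rough Python sketch of it (not something I ran for the results above). It assumes the tesseract binary is on your PATH and that it is called as "tesseract <image> <outbase>", writing its text to <outbase>.txt. The helper names (build_command, output_is_empty, run_check) are made up for illustration. It treats a missing file, a file of one byte or less, or a whitespace-only file as the "empty output" symptom.

```python
import subprocess
import sys
from pathlib import Path


def build_command(image, outbase):
    # Assumed CLI form: tesseract <image> <outbase>; text lands in <outbase>.txt
    return ["tesseract", str(image), str(outbase)]


def output_is_empty(txt_path):
    """Return True if the OCR output is missing, at most one byte, or only whitespace."""
    path = Path(txt_path)
    if not path.exists():
        return True
    data = path.read_bytes()
    return len(data) <= 1 or data.strip() == b""


def run_check(image, outbase="out"):
    """Run tesseract on the image and report whether it produced real text."""
    subprocess.run(build_command(image, outbase), check=True)
    return not output_is_empty(f"{outbase}.txt")


if __name__ == "__main__":
    # Only run the real OCR when an image path is given on the command line.
    if len(sys.argv) == 2:
        ok = run_check(sys.argv[1])
        print("got text" if ok else "output looks empty")
```

Save it under any name you like and pass the image path as the only argument; it prints "output looks empty" if you hit the single-byte problem.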

On Sat, Mar 26, 2011 at 15:03, Robert P. J. Day <rpjday at crashcourse.ca> wrote:
>
>  a little more googling suggests that i'm not the only person who's
> run into this issue:
>
> http://ubuntuforums.org/showthread.php?t=1599686
>
> the symptoms described there are *exactly* what i'm seeing -- the
> output file consisting of a single byte.  so can anyone else try a
> simple tesseract invocation on a trivial .tif file and verify whether
> or not they get actual output?
>
> rday
>
> --
>
> ========================================================================
> Robert P. J. Day                               Waterloo, Ontario, CANADA
>                        http://crashcourse.ca
>
> Twitter:                                       http://twitter.com/rpjday
> LinkedIn:                               http://ca.linkedin.com/in/rpjday
> ========================================================================
>