trying to OCR a simple tif file with tesseract-ocr

Sat Mar 26 13:28:10 UTC 2011

It was a tif file; imageshack converted it to png.

I can test your image on Fedora if you wish, though it might give the same result.
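
For anyone who wants to repeat the test on Ubuntu, here is a rough sketch in Python of the invocation, assuming the usual `tesseract <image> <output base>` command form, where tesseract itself appends ".txt" to the output base. The helper names tesseract_command and run_ocr are made up for illustration, and eurotext.tif is only a stand-in for a local copy of the sample image.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base=None):
    """Build the argument list for: tesseract <image> <output_base>.

    Hypothetical helper, not part of tesseract itself.
    """
    image = Path(image_path)
    # Default the output base to the image name without its extension;
    # tesseract adds ".txt" to whatever base it is given.
    base = Path(output_base) if output_base else image.with_suffix("")
    return ["tesseract", str(image), str(base)]


def run_ocr(image_path, output_base=None):
    """Run tesseract if it is on PATH and return the recognized text.

    Returns None when the tesseract binary cannot be found.
    """
    if shutil.which("tesseract") is None:
        return None
    cmd = tesseract_command(image_path, output_base)
    subprocess.run(cmd, check=True)
    return Path(cmd[2] + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Assumed file name; replace with the path to your own scan.
    print(" ".join(tesseract_command("eurotext.tif")))
```

Running the same helper on both the .tif and the .png copy of one image would show whether the input format matters for a given build.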

On Sat, Mar 26, 2011 at 15:24, Robert P. J. Day <rpjday at crashcourse.ca> wrote:
> On Sat, 26 Mar 2011, Nicolae Ghimbovschi wrote:
>
>> In your case unpaper will not do much.
>>
>> I'm using Fedora, and tesseract 3.0 works just fine:
>>
>> Sample input:
>> http://img402.imageshack.us/f/eurotext.png/
>>
>> tesseract's output:
>> http://pastebin.com/zREhWGQq
>
>  in that case, i'm baffled, but i note that you seem to be using a
> .png file as input to tesseract, whereas the man page *strongly*
> suggests that tesseract works well only with TIFF files.  so that
> confuses me as well.
>
>  in any event, i'm still interested in someone trying this with a
> simple example under ubuntu and letting me know (privately if they
> wish) if they got it to work properly.  this really shouldn't be that
> difficult, i just don't see what i'm doing wrong.
>
> rday