trying to OCR a simple tif file with tesseract-ocr
NoOp
glgxg at sbcglobal.net
Sat Mar 26 18:17:57 UTC 2011
On 03/26/2011 06:28 AM, Nicolae Ghimbovschi wrote:
> It was a tif file, imageshack has converted it to png.
I copied the png and converted it to .tif with GIMP. On the first try I'd
forgotten to flatten & decompress:
$ tesseract ocrtest.tif testocr
Tesseract Open Source OCR Engine
check_legal_image_size:Error:Only 1,2,4,5,6,8 bpp are supported:32
Segmentation fault
Flattened & decompressed:
$ tesseract ocrtest2.tif testocr
Tesseract Open Source OCR Engine
$ ls *.txt
testocr.txt
And the result:
The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from aspammer at website.com is spam.
Der ,,schnelle" braune Fuchs springt
uber den faulen Hund. Le renard brun
<<rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra il cane pigro. El zorro
marron répido salta sobre el perro
perezoso. A raposa marrom répida
salta sobre o cio preguicoso.
$ apt-cache policy tesseract-ocr
tesseract-ocr:
Installed: 2.04-2
Candidate: 2.04-2
Version table:
*** 2.04-2 0
500 http://us.archive.ubuntu.com/ubuntu/ maverick/universe i386 Packages
100 /var/lib/dpkg/status
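
For anyone hitting the same check_legal_image_size error: the "32" in that message is the bit depth of the image, which is what you get from an unflattened RGBA export (4 channels x 8 bits). Below is a rough sketch, standard-library Python only, that reads the first directory of a TIFF and reports bits per pixel and the compression code so you can check a file before handing it to tesseract. The names tiff_info and ok_for_tesseract_204 are made up for illustration. The supported depths are copied from the error message above; the "must be uncompressed" check is my assumption based on the fact that decompressing was part of what made the second run work, so treat it as a guess rather than a documented rule.

```python
import struct

# Bit depths listed in the tesseract 2.04 error message quoted above.
SUPPORTED_BPP = {1, 2, 4, 5, 6, 8}

# TIFF tag numbers from the TIFF 6.0 specification.
TAG_BITS_PER_SAMPLE = 258
TAG_COMPRESSION = 259
TAG_SAMPLES_PER_PIXEL = 277
TYPE_SHORT = 3


def tiff_info(data: bytes) -> dict:
    """Return bits per pixel and compression code of the first image in a TIFF."""
    if data[:2] == b"II":
        end = "<"  # little-endian
    elif data[:2] == b"MM":
        end = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(end + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    (n_entries,) = struct.unpack(end + "H", data[ifd_offset:ifd_offset + 2])
    bits = [1]        # TIFF default: 1 bit per sample
    samples = 1       # TIFF default: 1 sample per pixel
    compression = 1   # TIFF default: 1 = no compression

    for i in range(n_entries):
        start = ifd_offset + 2 + 12 * i
        tag, typ, count = struct.unpack(end + "HHI", data[start:start + 8])
        value_field = data[start + 8:start + 12]
        if typ != TYPE_SHORT or tag not in (
            TAG_BITS_PER_SAMPLE, TAG_COMPRESSION, TAG_SAMPLES_PER_PIXEL
        ):
            continue
        if count * 2 > 4:
            # Values do not fit in the entry; the field holds an offset instead.
            (offset,) = struct.unpack(end + "I", value_field)
            raw = data[offset:offset + 2 * count]
        else:
            raw = value_field[:2 * count]
        values = list(struct.unpack(end + "%dH" % count, raw))
        if tag == TAG_BITS_PER_SAMPLE:
            bits = values
        elif tag == TAG_COMPRESSION:
            compression = values[0]
        else:
            samples = values[0]

    # Some writers store a single BitsPerSample value for all channels.
    bpp = sum(bits) if len(bits) == samples else bits[0] * samples
    return {"bpp": bpp, "compression": compression}


def ok_for_tesseract_204(info: dict) -> bool:
    """Guess whether tesseract 2.04 would accept the file (see assumptions above)."""
    return info["bpp"] in SUPPORTED_BPP and info["compression"] == 1
```

To try it on a file, something like print(tiff_info(open("ocrtest.tif", "rb").read())) should do. It only looks at the first image in the file and does not handle BigTIFF.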