OCR software for Ubuntu

NoOp glgxg at sbcglobal.net
Sat Jan 29 00:56:38 UTC 2011


On 01/28/2011 02:05 PM, Default User wrote:
> On Tue, Jan 11, 2011 at 18:39, Lucio M Nicolosi <lmnicolosi at gmail.com> wrote:
...
>> "Cuneiform is an OCR system. In addition to text recognition it also does
>> layout analysis and text format recognition. Cuneiform supports several
>> languages."
...
> 
> Thanks for the suggestion.  Unfortunately, on Ubuntu 10.10 64-bit, after
> installation, I get an error message apparently related to another program
> added as a dependency.  So, it would not work, and I do not have time right
> now to roll up my sleeves and investigate possible "fixes".
> 

Installs w/o issue on my 10.10 64bit. What exactly is the error message
that you get?

$ sudo apt-get install cuneiform
[sudo] password for gg:
Reading package lists... Done
Building dependency tree
Reading state information... Done
...
The following extra packages will be installed:
  cuneiform-common
The following NEW packages will be installed:
  cuneiform cuneiform-common
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 28.3MB of archives.
After this operation, 56.9MB of additional disk space will be used.
Do you want to continue [Y/n]? y
Get:1 http://archive.ubuntu.com/ubuntu/ maverick/multiverse cuneiform-common all 0.7.0+dfsg.1-1 [26.4MB]
Get:2 http://archive.ubuntu.com/ubuntu/ maverick/multiverse cuneiform amd64 0.7.0+dfsg.1-1 [1,916kB]
Fetched 28.3MB in 2min 12s (213kB/s)

Selecting previously deselected package cuneiform-common.
(Reading database ... 315168 files and directories currently installed.)
Unpacking cuneiform-common (from .../cuneiform-common_0.7.0+dfsg.1-1_all.deb) ...
Selecting previously deselected package cuneiform.
Unpacking cuneiform (from .../cuneiform_0.7.0+dfsg.1-1_amd64.deb) ...
Processing triggers for man-db ...
Setting up cuneiform-common (0.7.0+dfsg.1-1) ...
Setting up cuneiform (0.7.0+dfsg.1-1) ...

$ cuneiform
Cuneiform for Linux 0.7.0
Usage: cuneiform[-l languagename -f format --dotmatrix --fax -o
result_file] imagefile

That's as far as I go as I haven't taken the time to learn how to use it :-)
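
Going only by that usage line, an invocation would presumably look
something like the sketch below. It is untested here: the file names
scan.png and result.txt are hypothetical, and "eng" and "text" are assumed
values for the -l and -f options, so check what your build actually accepts.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own scan and output path.
IMAGE="scan.png"
OUTPUT="result.txt"

# Flags come from the usage line above: -l language, -f format, -o result file.
# "eng" and "text" are assumed values, not confirmed in this thread.
CUNEIFORM_CMD="cuneiform -l eng -f text -o $OUTPUT $IMAGE"

if command -v cuneiform >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    # Run the OCR and print the recognized text.
    $CUNEIFORM_CMD && cat "$OUTPUT"
else
    # cuneiform or the image is missing: just show what would be run.
    echo "Would run: $CUNEIFORM_CMD"
fi
```

The guard only keeps the script from failing when cuneiform or the image
is not there; the single cuneiform line is the part that matters.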

You might find this useful:
https://help.ubuntu.com/community/OCR

You might also try gocr, which is generally the default OCR engine used
by xsane etc.:
http://manpages.ubuntu.com/manpages/maverick/man1/gocr.1.html
<quote>
DESCRIPTION
       gocr is an optical character recognition program that can be used
       from the command line. It takes input in PNM, PGM, PBM, PPM, or PCX
       format, and writes recognized text to stdout. If the pnm file is a
       single dash, PNM data is read from stdin. If gzip, bzip2 and
       netpbm-progs are installed and your system supports popen(3) also
       pnm.gz, pnm.bz2, png, jpg, jpeg, tiff, gif, bmp, ps (only single
       pages) and eps are supported as input files (not as input stream),
       where pnm can be replaced by one of ppm, pgm and pbm.
</quote>
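
Based on that description, a rough sketch of feeding a scan to gocr
through stdin might look like this. It assumes the netpbm tools are
installed (pngtopnm is part of netpbm), and scan.png and gocr-result.txt
are hypothetical file names.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own.
IMAGE="scan.png"
RESULT="gocr-result.txt"

# Convert the PNG to PNM and pipe it to gocr; the single dash tells gocr
# to read PNM data from stdin, as the man page excerpt describes.
GOCR_PIPELINE="pngtopnm $IMAGE | gocr - > $RESULT"

if command -v gocr >/dev/null 2>&1 && command -v pngtopnm >/dev/null 2>&1 \
        && [ -f "$IMAGE" ]; then
    # Run the pipeline and print the recognized text.
    sh -c "$GOCR_PIPELINE" && cat "$RESULT"
else
    # A tool or the image is missing: just show what would be run.
    echo "Would run: $GOCR_PIPELINE"
fi
```

Per the same excerpt, gocr may also accept the png file directly as an
input file when netpbm is present, so the explicit conversion is mainly
there to make the stdin case visible.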

More information about the ubuntu-users mailing list