OCR in Linux

Kenny Hitt kenny at hittsjunk.net
Mon Jan 11 05:49:07 GMT 2010

Hi.  I've been using ocropus for at least two years.  I built the first release from source, but I have used the Debian packages ever since.
I'm not sure why it isn't part of Lenny, but it is definitely part of Sid.

kenny@blackbox:~$ apt-cache search ocropus
ocropus - document analysis and OCR system
ocropus-data - document analysis and OCR system --- data files
kenny@blackbox:~$ apt-cache show ocropus
Package: ocropus
Status: install ok installed
Priority: optional
Section: graphics
Installed-Size: 3732
Maintainer: Jeffrey Ratcliffe <jeffrey.ratcliffe at gmail.com>
Architecture: i386
Version: 0.3.1-2
Depends: libc6 (>= 2.4), libgcc1 (>= 1:4.1.1), libiulib0, libjpeg62, liblua5.1-0, libpng12-0 (>= 1.2.13-4), libstdc++6 (>= 4.1.1), libtiff4, zlib1g (>= 1:1.1.4), ocropus-data (= 0.3.1-2)
Recommends: tesseract-ocr (>= 2.03-2)
Breaks: ocrodjvu (<< 0.3)
Description: document analysis and OCR system
 OCRopus(tm) is a state-of-the-art document analysis and Optical
 Character Recognition (OCR) system, featuring
 pluggable layout analysis, pluggable character recognition, statistical
 natural language modeling, and multi-lingual capabilities.
 The OCRopus engine is based on two research projects: a high-performance
 handwriting recognizer developed in the mid-90's and deployed by the US Census
 bureau, and novel high-performance layout analysis methods.
 OCRopus development is sponsored by Google and is initially intended for
 high-throughput, high-volume document conversion efforts. It
 will also be an excellent OCR system for many other applications.
Homepage: http://code.google.com/p/ocropus/
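
To pull a single field such as the version out of that output, a small shell function is enough. This is only a minimal sketch: pkg_field is a hypothetical helper name, not an apt command, and it assumes awk is installed and that the wanted field sits on one line (for the multi-line Description it would print only the first line).

```shell
# Print the value of one field from Debian control-style text read on stdin.
# pkg_field is a hypothetical helper for illustration, not part of apt.
pkg_field() {
    awk -v field="$1" '
        BEGIN { prefix = field ": " }
        # Match only lines that start with "Field: " and print the rest.
        index($0, prefix) == 1 {
            print substr($0, length(prefix) + 1)
            exit
        }
    '
}

# Usage, assuming apt-cache is available on the system:
#   apt-cache show ocropus | pkg_field Version
#   apt-cache show ocropus | pkg_field Recommends
```

Only the first matching line is printed, so if apt-cache show lists several versions of a package, this reports the first one shown.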

I knew you were talking about ocropus and not tesseract.


On Sun, Jan 10, 2010 at 10:56:50PM -0500, pmikeal at comcast.net wrote:
> > Hi.  It's part of Debian.  I am running Sid, but it's been a Debian package for a few years.
> Huh?  I thought ocropus was only created through Google Summer of Code not
> long ago.  You and others must have thought I meant tesseract, which is
> not what the email I replied to was about: you are the second person to
> email me about it instead of ocropus in answer to my ocropus query here.
> The other person who emailed me almost certainly meant tesseract, because
> he actually included a link to it.  I appreciate everyone's kindness in
> answering questions, but please read the message you are replying to
> before replying.  Thanks.
> -- 
> Ubuntu-accessibility mailing list
> Ubuntu-accessibility at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/ubuntu-accessibility
