OCR in Linux
kenny at hittsjunk.net
Mon Jan 11 05:49:07 GMT 2010
Hi. I've been using ocropus for at least 2 years. I built the first release from source, but have used Debian packages ever since.
Not sure why it isn't part of Lenny, but it is definitely part of Sid.
kenny at blackbox:~$ apt-cache search ocropus
ocropus - document analysis and OCR system
ocropus-data - document analysis and OCR system --- data files
kenny at blackbox:~$ apt-cache show ocropus
Status: install ok installed
Maintainer: Jeffrey Ratcliffe <jeffrey.ratcliffe at gmail.com>
Depends: libc6 (>= 2.4), libgcc1 (>= 1:4.1.1), libiulib0, libjpeg62, liblua5.1-0, libpng12-0 (>= 1.2.13-4), libstdc++6 (>= 4.1.1), libtiff4, zlib1g (>= 1:1.1.4), ocropus-data (= 0.3.1-2)
Recommends: tesseract-ocr (>= 2.03-2)
Breaks: ocrodjvu (<< 0.3)
Description: document analysis and OCR system
OCRopus(tm) is a state-of-the-art document analysis and Optical
Character Recognition (OCR) system, featuring
pluggable layout analysis, pluggable character recognition, statistical
natural language modeling, and multi-lingual capabilities.
The OCRopus engine is based on two research projects: a high-performance
handwriting recognizer developed in the mid-90's and deployed by the US Census
bureau, and novel high-performance layout analysis methods.
OCRopus development is sponsored by Google and is initially intended for
high-throughput, high-volume document conversion efforts. It
will also be an excellent OCR system for many other applications.
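For anyone who wants to check the same fields on their own machine, here is a minimal sketch. It assumes a Debian or Ubuntu system with apt available; the helper name pkg_field is made up for illustration and is not part of apt. It only pulls one field out of "apt-cache show"-style text.

```shell
#!/bin/sh
# pkg_field is a hypothetical helper, not an apt command.
# It reads "apt-cache show"-style text on stdin and prints the value
# of the first field whose name matches $1 (for example, Recommends).
pkg_field() {
    sed -n "s/^$1: //p" | head -n 1
}

# A fragment of the package record quoted above, used as sample input.
sample='Maintainer: Jeffrey Ratcliffe
Recommends: tesseract-ocr (>= 2.03-2)
Breaks: ocrodjvu (<< 0.3)'

# Print the Recommends field from the sample text.
printf '%s\n' "$sample" | pkg_field Recommends

# On a real system (assumption: apt-cache and apt-get are installed):
#   apt-cache show ocropus | pkg_field Recommends
#   sudo apt-get install ocropus
```

Reading the Recommends field this way is a quick reminder that the ocropus package only recommends tesseract-ocr; the two are separate packages.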
I knew you were talking about ocropus and not tesseract.
On Sun, Jan 10, 2010 at 10:56:50PM -0500, pmikeal at comcast.net wrote:
> > Hi. It's part of Debian. I am running Sid, but it's been a Debian package for a few years.
> Huh? I thought ocropus was only created recently, through Google Summer of
> Code. You and others must have thought I meant tesseract, which is not what
> the email I replied to was about. You are the second person to email me
> about tesseract instead of ocropus in response to my ocropus query here.
> I am even more sure the other person meant tesseract, because he actually
> included a link to it. I appreciate everyone's kindness in answering
> questions, but please read the message being replied to before replying.
> Thanks.