Need advice: Ubuntu OCR techniques

Mon Oct 10 01:34:12 UTC 2011

On 10/09/2011 02:39 PM, Kevin O'Gorman wrote:
> On Sun, Oct 9, 2011 at 1:09 PM, Kevin O'Gorman <kogorman at gmail.com> wrote:
> 
>> On Sun, Oct 9, 2011 at 11:10 AM, Icarus Alive <icarus.alive at gmail.com> wrote:
>>
>>> On Sun, Oct 9, 2011 at 11:04 PM, Kevin O'Gorman <kogorman at gmail.com> wrote:
>>> > I'm new to OCR (optical character recognition), have never done it before.
>>> > Suddenly I have a need.
>>> >
>>> > I've been diving through old papers and have found hard-copy (appears to be
>>> > real Courier font, laser printed on white background) of a program I wrote
>>> > decades ago on a Macintosh 512K in Lightspeed C.  I thought I had lost it
>>> > completely.  I would like to recover it from the hard-copy without typing
>>> > ~100 pages of code.  I have a scanner, and full Acrobat CS5 on a Windows
>>> > machine, plus all the FOSS of Ubuntu (tesseract, gocr, plus anything useful
>>> > in multiverse).  Does anybody know the fastest way to usable code from this
>>> > situation?
>>>
>>> Use the power-of-the-cloud... Google Docs can do OCR. For English-language
>>> printed text, scanned well, it works pretty well.
>>> http://docs.google.com/support/bin/answer.py?answer=176692
>>>
>>> Icarus (may your wings stay on),
>>
>> Great idea.  I'll check it out.
>
> I was unable to make it work.  I scanned one of the files as a 3-page TIFF
> file with IrfanView, and uploaded it to Google Docs.  I marked all the
> checkboxes for conversion, but did not get a text document.  I've marked it
> shared to all, and the link (for me) is
> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B6pbHEZND52eZWNlZGQ4MmUtMTgwZi00MTQ3LWJkMTUtNzIzOTIwMWRlOWJk&hl=en_US
> (modulo any folding)
...

Does:
$ tesseract crystal.h1.tif crystal
Tesseract Open Source OCR Engine
Page 1
Page 2
$ gedit crystal.txt
not work for you?
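
If it does, and the rest of the ~100 pages end up as a pile of TIFF files in
one directory, a loop along these lines might save some typing. This is only a
rough sketch: the function name ocr_dir and the TESSERACT override variable are
made up for illustration, and it assumes every scan has a .tif extension and
that tesseract writes its text to <outputbase>.txt, as it did in the run above.

```shell
#!/bin/sh
# Sketch: run tesseract over every .tif file in a directory.
# ocr_dir and TESSERACT are hypothetical names, not part of tesseract itself.
ocr_dir() {
    dir=${1:-.}                      # directory to scan, default: current one
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue    # nothing matched the glob; skip
        base=${img%.tif}             # crystal.h1.tif -> crystal.h1
        # tesseract appends .txt to the output base name on its own
        "${TESSERACT:-tesseract}" "$img" "$base" \
            || echo "OCR failed for $img" >&2
    done
}
```

After sourcing the file, something like `ocr_dir ~/scans` should leave one .txt
next to each .tif (the path is just an example). Failures are reported on
stderr and the loop carries on, so one bad scan does not stop the batch.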