Linux Archive


"Kevin O'Gorman" 10-10-2011 04:20 AM

SOLVED: Need advice: Ubuntu OCR techniques
On Sun, Oct 9, 2011 at 9:19 PM, Kevin O'Gorman <> wrote:

On Sun, Oct 9, 2011 at 6:34 PM, NoOp <> wrote:

On 10/09/2011 02:39 PM, Kevin O'Gorman wrote:

> On Sun, Oct 9, 2011 at 1:09 PM, Kevin O'Gorman <> wrote:


>> On Sun, Oct 9, 2011 at 11:10 AM, Icarus Alive <>wrote:


>>> On Sun, Oct 9, 2011 at 11:04 PM, Kevin O'Gorman <> wrote:

>>> > I'm new to OCR (optical character recognition) and have never done it
>>> > before. Suddenly I have a need.
>>> >
>>> > I've been diving through old papers and have found a hard copy (it
>>> > appears to be real Courier font, laser printed on a white background) of
>>> > a program I wrote decades ago on a Macintosh 512K in Lightspeed C. I
>>> > thought I had lost it completely. I would like to recover it from the
>>> > hard copy without typing ~100 pages of code. I have a scanner and full
>>> > Acrobat CS5 on a Windows machine, plus all the FOSS of Ubuntu (tesseract,
>>> > gocr, and anything useful in multiverse). Does anybody know the fastest
>>> > way to get from this situation to usable code?


>>> Use the power of the cloud... Google Docs can do OCR. For English-language
>>> printed text, scanned well, it works pretty well.



>>> Icarus (may your wings stay on),


>> Great idea. I'll check it out.


> I was unable to make it work. I scanned one of the files as a 3-page TIFF
> file with Irfanview and uploaded it to Google Docs. I marked all the
> checkboxes for conversion, but did not get a text document. I've marked it
> shared to all, and the link (for me) is


> (modulo any folding)



$ tesseract crystal.h1.tif crystal
Tesseract Open Source OCR Engine
Page 1
Page 2
$ gedit crystal.txt

Does that not work for you?
Funny you should mention that. I just installed tesseract after finding that gocr(1) could not deal with multipage TIF files. It is about 99% accurate apart from whitespace, which still leaves a lot of proofreading and re-indenting.
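Because tesseract writes its result to OUTBASE.txt (as in the `tesseract crystal.h1.tif crystal` run quoted above), a short shell loop can OCR a stack of single-page scans and join the text in page order. This is a minimal sketch, not what was used in the thread: the helper name `ocr_pages` and all file names are hypothetical, and it assumes the pages are passed in reading order.

```shell
# ocr_pages OUTBASE PAGE...
# Hypothetical helper: run tesseract on each scanned page, in the order given,
# and concatenate the recognized text into OUTBASE.txt.
# Uses only the basic form `tesseract IMAGE OUTBASE`, which writes OUTBASE.txt.
# OUTBASE must differ from every page's base name, or it would be overwritten.
ocr_pages() {
    outbase=$1
    shift
    : > "$outbase.txt"                   # start with an empty combined file
    for page in "$@"; do
        pagebase=${page%.*}              # page01.tif -> page01
        tesseract "$page" "$pagebase" || return 1
        cat "$pagebase.txt" >> "$outbase.txt"
    done
}
```

For example, `ocr_pages crystal page01.tif page02.tif page03.tif` would leave the combined text in crystal.txt, which still needs the whitespace proofreading described above.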

On the subject of multipage TIF files, I created one this morning using Irfanview for the scanning, but have not managed to do it again. Instead, I've started using the -append flag of convert(1) to join a document's worth of page images into a single image.
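The -append operator stacks the input images top to bottom into one tall image, so tesseract treats the whole document as a single page. ImageMagick can instead keep each scan as its own page: given several inputs and a TIFF output without -append, convert(1) writes them all into one multipage TIFF, the same kind of file Irfanview produced earlier in the thread. A minimal sketch; the helper names `stack_pages` and `tiff_pages` and all file names are hypothetical.

```shell
# Two hypothetical helpers around ImageMagick's convert(1).

# stack_pages OUT PAGE...: join the scans vertically (-append) into one tall
# image, so tesseract sees the whole document as a single page.
stack_pages() {
    out=$1
    shift
    convert "$@" -append "$out"
}

# tiff_pages OUT.tif PAGE...: keep each scan as a separate page of one
# multipage TIFF (ImageMagick adjoins multiple images into one TIFF by default).
tiff_pages() {
    out=$1
    shift
    convert "$@" "$out"
}
```

Either result can then be read the same way, e.g. `stack_pages crystal.tif page01.tif page02.tif page03.tif` followed by `tesseract crystal.tif crystal`. The tall-image form is simple for a few pages, but for a long listing the single image can get very large, which the multipage TIFF avoids.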

Still, I wonder what I forgot to do with Irfanview.

Anyway, it appears I have a way to proceed, so this question is solved. Thanks to all.
Kevin O'Gorman, PhD

ubuntu-users mailing list