10-10-2011, 04:20 AM
"Kevin O'Gorman"

SOLVED: Need advice: Ubuntu OCR techniques

On Sun, Oct 9, 2011 at 9:19 PM, Kevin O'Gorman <kogorman@gmail.com> wrote:

On Sun, Oct 9, 2011 at 6:34 PM, NoOp <glgxg@sbcglobal.net> wrote:

On 10/09/2011 02:39 PM, Kevin O'Gorman wrote:
> On Sun, Oct 9, 2011 at 1:09 PM, Kevin O'Gorman <kogorman@gmail.com> wrote:
>
>> On Sun, Oct 9, 2011 at 11:10 AM, Icarus Alive <icarus.alive@gmail.com> wrote:
>>
>>> On Sun, Oct 9, 2011 at 11:04 PM, Kevin O'Gorman <kogorman@gmail.com> wrote:
>>> > I'm new to OCR (optical character reading), have never done it before.
>>> > Suddenly I have a need.
>>> >
>>> > I've been diving through old papers and have found hard-copy (appears to be
>>> > real Courier font, laser printed on white background) of a program I wrote
>>> > decades ago on a Macintosh 512K in Lightspeed C. I thought I had lost it
>>> > completely. I would like to recover it from the hard-copy without typing
>>> > ~100 pages of code. I have a scanner, and full Acrobat CS5 on a Windows
>>> > machine, plus all the FOSS of Ubuntu (tesseract, gocr, plus anything useful
>>> > in multiverse). Does anybody know the fastest way to usable code from this
>>> > situation?
>>>
>>> Use the power-of-the-cloud... Google Docs can do OCR. For English
>>> language printed text, scanned well, it works pretty well.
>>> http://docs.google.com/support/bin/answer.py?answer=176692
>>>
>>> Icarus (may your wings stay on),
>>
>> Great idea. I'll check it out.
>>
> I was unable to make it work. I scanned one of the files as a 3-page TIFF
> file with Irfanview, and uploaded it to Google Docs. I marked all the
> checkboxes for conversion, but did not get a text document. I've marked it
> shared to all, and the link (for me) is
> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B6pbHEZ ND52eZWNlZGQ4MmUtMTgwZi00MTQ3LWJkMTUtNzIzOTIwMWRlO WJk&hl=en_US
>
> (modulo any folding)
...

Does:
$ tesseract crystal.h1.tif crystal
Tesseract Open Source OCR Engine
Page 1
Page 2
$ gedit crystal.txt
not work for you?
Funny you should mention that. I just installed tesseract after finding that gocr(1) could not deal with multipage TIF files. It works about 99% other than whitespace, which still leaves a lot of proofreading and indenting.
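
For the whitespace part of that cleanup, a small filter may take care of the mechanical pieces before proofreading starts. This is only a sketch: the function name clean_ocr_whitespace is made up for illustration, and it assumes the OCR text arrives on standard input. It strips trailing blanks and squeezes runs of empty lines; it does not restore indentation.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): tidy OCR output on stdin.
clean_ocr_whitespace() {
    # 1. sed removes spaces and tabs at the end of every line.
    # 2. cat -s squeezes consecutive empty lines down to one.
    sed 's/[[:space:]]*$//' | cat -s
}
```

Something like `clean_ocr_whitespace < crystal.txt > crystal.clean.txt` would apply it to the file from the tesseract run above; re-indenting the C source would still need a separate formatter or a manual pass.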


On the subject of multipage TIF files, I created one this morning using Irfanview for the scanning, but have been unable to do that since then. I've since started using the -append flag of convert(1) to build a document's worth of images.
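
If building a multipage file stays troublesome, one alternative is to skip it: OCR each scanned page on its own and join the text afterwards. The sketch below assumes one image file per page and the same `tesseract IMAGE OUTPUTBASE` form used earlier in this thread; the function name ocr_pages and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: OCR single-page images one at a time and
# concatenate the resulting text in the order the pages are given.
# Usage: ocr_pages OUTPUT.txt page1.tif page2.tif ...
ocr_pages() {
    out=$1
    shift
    : > "$out"                  # start with an empty output file
    for page in "$@"; do
        base=${page%.*}         # tesseract appends .txt to this base name
        tesseract "$page" "$base" || return 1
        cat "$base.txt" >> "$out"
    done
}
```

A call might look like `ocr_pages crystal.txt page01.tif page02.tif page03.tif`. Each page's intermediate .txt file is left next to its image, which can help when proofreading page by page.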



Still, I wonder what I forgot to do with Irfanview.

Anyway, it appears I have a way to proceed, so this question is solved. Thanks to all.
--
Kevin O'Gorman, PhD

--
ubuntu-users mailing list
ubuntu-users@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
All times are GMT.