SOLVED: Need advice: Ubuntu OCR techniques
On Sun, Oct 9, 2011 at 9:19 PM, Kevin O'Gorman <firstname.lastname@example.org> wrote:
On Sun, Oct 9, 2011 at 6:34 PM, NoOp <email@example.com> wrote:
On 10/09/2011 02:39 PM, Kevin O'Gorman wrote:
> On Sun, Oct 9, 2011 at 1:09 PM, Kevin O'Gorman <firstname.lastname@example.org> wrote:
>> On Sun, Oct 9, 2011 at 11:10 AM, Icarus Alive <email@example.com> wrote:
>>> On Sun, Oct 9, 2011 at 11:04 PM, Kevin O'Gorman <firstname.lastname@example.org> wrote:
>>> > I'm new to OCR (optical character recognition) and have never done it before.
>>> > Suddenly I have a need.
>>> > I've been diving through old papers and have found a hard copy (it appears to
>>> > be real Courier font, laser printed on a white background) of a program I
>>> > wrote decades ago on a Macintosh 512K in Lightspeed C. I thought I had lost it
>>> > completely. I would like to recover it from the hard copy without retyping
>>> > ~100 pages of code. I have a scanner, and full Acrobat CS5 on a Windows
>>> > machine, plus all the FOSS of Ubuntu (tesseract, gocr, plus anything else
>>> > in multiverse). Does anybody know the fastest way to get usable code in this
>>> > situation?
>>> Use the power of the cloud... Google Docs can do OCR. For English-language
>>> printed text, scanned well, it works pretty well.
>>> Icarus (may your wings stay on),
>> Great idea. I'll check it out.
> I was unable to make it work. I scanned one of the files as a 3-page TIFF
> file with Irfanview, and uploaded it to Google Docs. I marked all the
> checkboxes for conversion, but did not get a text document. I've marked it
> shared to all, and the link (for me) is
> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B6pbHEZND52eZWNlZGQ4MmUtMTgwZi00MTQ3LWJkMTUtNzIzOTIwMWRlOWJk&hl=en_US
> (modulo any folding)
$ tesseract crystal.h1.tif crystal
Tesseract Open Source OCR Engine
$ gedit crystal.txt
Does that not work for you?
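The single-file tesseract command above can be extended to a whole stack of scanned pages with a small loop. Here `tesseract IMAGE OUTBASE` writes `OUTBASE.txt`, just as `crystal.h1.tif` produced `crystal.txt`.

The sketch below makes two assumptions. First, each page was scanned to its own file with zero-padded names such as `page-001.tif`. This naming scheme is hypothetical; it matters because the shell's sorted glob then keeps the pages in order. Second, `ocr_pages` is a hypothetical helper, not part of tesseract:

```shell
# Hypothetical helper: OCR each page image with tesseract and append the
# text to one combined file, in the order the images are given.
# Assumes names like page-001.tif, page-002.tif so a glob sorts correctly.
ocr_pages() {
    out="$1"                  # combined text file to (re)create
    shift
    : > "$out"                # create or truncate the output file
    for img in "$@"; do
        base="${img%.*}"      # page-001.tif -> page-001
        # "tesseract IMAGE OUTBASE" writes its text to OUTBASE.txt
        tesseract "$img" "$base" || return 1
        cat "$base.txt" >> "$out"
    done
}
```

Usage: `ocr_pages crystal.txt page-*.tif`, then open `crystal.txt` in gedit as before. Choose an output name that differs from every page's base name, so a page's own `.txt` file is never appended onto itself.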
Funny you should mention that. I just installed tesseract after finding that gocr(1) could not handle multipage TIF files. It gets about 99% right apart from whitespace, which still leaves a lot of proofreading and re-indenting.
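Whitespace is the part the OCR gets wrong here. One rough way to restore indentation in recovered C source is to recompute it from brace nesting depth. This technique is a swapped-in suggestion, not something from the thread.

The awk sketch below is naive: it also counts braces inside strings, character constants and comments. The helper name `reindent_braces` and the four-space indent are assumptions:

```shell
# Hypothetical helper: re-indent C-like source by brace depth (4 spaces per level).
# Naive: braces inside strings, char constants, or comments are still counted.
reindent_braces() {
    awk '
    {
        line = $0
        sub(/^[ \t]+/, "", line)            # discard whatever indentation OCR produced
        if (line == "") { print ""; next }  # keep blank lines, without padding
        opens = gsub(/[{]/, "{", line)      # gsub returns the number of matches
        closes = gsub(/[}]/, "}", line)
        d = depth
        if (line ~ /^[}]/) d--              # a leading close brace sits at the outer level
        if (d < 0) d = 0
        pad = ""
        for (i = 0; i < d; i++) pad = pad "    "
        print pad line
        depth += opens - closes
        if (depth < 0) depth = 0
    }' "$@"
}
```

Usage: `reindent_braces crystal.txt > crystal.c`. The result still needs proofreading, but the nesting structure no longer has to be restored by hand.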
On the subject of multipage TIF files, I created one this morning using Irfanview for the scanning, but have not been able to repeat that since. Instead, I've started using the -append flag of convert(1) to combine a document's worth of images.
Still, I wonder what I forgot to do with Irfanview.
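ImageMagick's convert(1) can combine page scans in two different ways:

- `-append` stacks the inputs top to bottom into one tall image, which is the approach described above.
- Listing the inputs before a TIFF output name, without `-append`, writes a multipage TIFF with one frame per page.

The sketch below shows both forms; the helper names and file names are hypothetical:

```shell
# Hypothetical helpers around ImageMagick convert(1).

# One tall image: the pages are stacked top to bottom (-append).
stack_pages() {
    out="$1"; shift
    convert "$@" -append "$out"
}

# One multipage TIFF: each input image becomes its own frame (page).
multipage_tiff() {
    out="$1"; shift
    convert "$@" "$out"
}
```

Usage: `stack_pages crystal-all.tif page-*.tif` or `multipage_tiff crystal-all.tif page-*.tif`. Which form suits a given OCR tool depends on whether it reads multipage TIFFs, which is the limitation gocr ran into above.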
Anyway, it appears I have a way to proceed, so this question is solved. Thanks to all.
Kevin O'Gorman, PhD
ubuntu-users mailing list
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users