07-14-2008, 06:00 AM
Bret Busby

Query about OCR package(s)

Hello.

Is an OCR package available for Debian 4.0, in .deb form, that can read
from PDF files, to allow text to be extracted from PDF files?


In looking at what is available in Synaptic, I could not find such a
package.


Thank you in anticipation.

--
Bret Busby
Armadale
West Australia
..............

"So once you do know what the question actually is,
you'll know what the answer means."
- Deep Thought,
Chapter 28 of Book 1 of
"The Hitchhiker's Guide to the Galaxy:
A Trilogy In Four Parts",
written by Douglas Adams,
published by Pan Books, 1992

....................................................


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
07-14-2008, 03:08 PM
Osamu Aoki

Hi,

On Mon, Jul 14, 2008 at 02:00:06PM +0800, Bret Busby wrote:
>
> hello.
>
> Is an OCR package available for Debian 4.0, in .deb form, that can read
> from PDF files, to allow text to be extracted from PDF files?

Yes ... From PDF? I do not know, but implementing it is simple with some
filtering.

The question is more about the quality or correctness of the OCR result.
So far, my results have been mixed, ranging from very bad to fairly usable.
None was very good.

I think its database needs to be trained.

> In looking at what is available in Synaptic, I could not find such a
> package.

Use aptitude, which has better search capability. Here is my list of
graphics tools, including OCR:

http://people.debian.org/~osamu/pub/getwiki/html/ch12.en.html#listofgraphicdatatools

> Thank you in anticipation.

Osamu


 
07-14-2008, 05:51 PM
Bret Busby

On Mon, 14 Jul 2008, Bret Busby wrote:

> hello.
>
> Is an OCR package available for Debian 4.0, in .deb form, that can read from
> PDF files, to allow text to be extracted from PDF files?
>
> In looking at what is available in Synaptic, I could not find such a package.
>
> Thank you in anticipation.

Since sending the above message, I had to cite material from a PDF
document that was published on the Internet. I clicked on the link for
the PDF document in Iceape (which I use for accessing web pages when I
want to write an email using material from them, as Iceape includes an
email facility, like the Netscape and Mozilla suites). The document
viewer (Adobe Reader 8.0) opened the document within the tab, and I was
able to simply mark, copy and paste the text, as if it were text in an
HTML web page or a word processor document.


So, Adobe Reader 8.0 provides the text extraction, or copying, that I
sought, and an OCR application that imports text from PDF files is now
probably redundant (other than that it could function as a smaller,
standalone application, but this seems to be adequate).


Now, if only I could print from Adobe Reader 8.0 (I can print PDF files
from Evince 0.4.0, but not from Adobe 8.0)...


--
Bret Busby
Armadale
West Australia
..............
 
07-14-2008, 05:59 PM
Brad Rogers

On Tue, 15 Jul 2008 01:51:23 +0800 (WST)
Bret Busby <bret@busby.net> wrote:

Hello Bret,

> So, Adobe Reader 8.0 provides the text extraction, or copying, that I
> sought, so an OCR application that imports text from PDF files, is
> now probably redundant (other than that it could function as a

Only for PDFs that have got text in them. Some PDFs have scanned
pages, which we can read, but there's no text to extract; it's a
graphic file when all's said and done. You'll need OCR for the images.
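
One rough way to tell the two kinds of PDF apart is to try plain text
extraction first and fall back to OCR only when almost nothing comes out.
The sketch below assumes the pdftotext tool (from poppler-utils or xpdf,
neither of which is mentioned in this thread) is installed. The function
names and the 20-character threshold are hypothetical and chosen only for
illustration.

```python
import subprocess


def extract_text_layer(pdf_path):
    """Return whatever text pdftotext finds in the PDF (may be empty)."""
    # Passing "-" as the output file asks pdftotext to write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def needs_ocr(extracted_text, min_chars=20):
    """Guess that a PDF is scanned if almost no visible text was extracted.

    min_chars is an arbitrary, hypothetical threshold; tune it as needed.
    """
    # split() with no arguments drops all whitespace, including the
    # form feeds that pdftotext emits between pages.
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```

Calling needs_ocr(extract_text_layer("document.pdf")) would then give a
first guess: True suggests scanned pages that need OCR, False suggests the
text can simply be copied or extracted.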

--
Regards _
/ ) "The blindingly obvious is
/ _)rad never immediately apparent"

Is she really going out with him?
New Rose - The Damned
 
07-17-2008, 03:03 PM
Michelle Konzack

On 2008-07-15 00:08:45, Osamu Aoki wrote:
> Yes ... From PDF? I do not know but implimenting it is simple with some
> filtering.

The OP can use "netpbm" to pipe the PDF through the module and then into
the OCR software...
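
A minimal sketch of that kind of pipeline, under stated assumptions: it
swaps in pdftoppm (from poppler-utils or xpdf) for the rasterizing step
rather than netpbm, and uses gocr as the OCR program. Neither choice comes
from this thread, and the function names are hypothetical. Each page is
rendered to a greyscale PGM image, then each image is fed to the OCR tool.

```python
import glob
import subprocess


def rasterize_command(pdf_path, prefix, dpi=300):
    """Command that renders each PDF page to a greyscale PGM image."""
    return ["pdftoppm", "-r", str(dpi), "-gray", pdf_path, prefix]


def ocr_command(image_path):
    """Command that runs gocr on one image and prints the text to stdout."""
    return ["gocr", "-i", image_path]


def ocr_pdf(pdf_path, prefix="page"):
    """Rasterize the PDF, OCR every page image, and return the joined text."""
    subprocess.run(rasterize_command(pdf_path, prefix), check=True)
    pages = []
    # Assumption: pdftoppm zero-pads page numbers within one run, so a
    # plain sort keeps page order. Check this against your version.
    for image in sorted(glob.glob(glob.escape(prefix) + "-*.pgm")):
        out = subprocess.run(
            ocr_command(image), capture_output=True, text=True, check=True
        )
        pages.append(out.stdout)
    return "\n".join(pages)
```

As noted earlier in the thread, the quality of the result depends heavily
on the OCR program and the scan, so the output will usually need
proofreading.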

Have a nice Day/Evening
Michelle Konzack


--
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack Apt. 917 ICQ #328449886
+49/177/9351947 50, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France IRC #Debian (irc.icq.com)
 
