Is an OCR package available for Debian 4.0, in .deb form, that can read
from PDF files, to allow text to be extracted from PDF files?
In looking at what is available in Synaptic, I could not find such a
package.
Thank you in anticipation.
--
Bret Busby
Armadale
West Australia
..............
"So once you do know what the question actually is,
you'll know what the answer means."
- Deep Thought,
Chapter 28 of Book 1 of
"The Hitchhiker's Guide to the Galaxy:
A Trilogy In Four Parts",
written by Douglas Adams,
published by Pan Books, 1992
--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
07-14-2008, 03:08 PM
Osamu Aoki
Query about OCR package(s)
Hi,
On Mon, Jul 14, 2008 at 02:00:06PM +0800, Bret Busby wrote:
>
> hello.
>
> Is an OCR package available for Debian 4.0, in .deb form, that can read
> from PDF files, to allow text to be extracted from PDF files?
Yes ... From PDF? I do not know, but implementing it is simple with some
filtering.
The question is more about the quality or correctness of the OCR result. So
far, I have had mixed results, from very bad to fairly usable. None was very
good. I think its database needs to be trained.
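One way to put a number on "very bad" versus "fairly usable" is the character error rate: the edit distance between a hand-typed reference text and the OCR output, divided by the length of the reference. The following is only a minimal sketch of that idea; the function names are made up for illustration and are not part of any OCR package.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalised by the reference length (0.0 is perfect)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)
```

Running this over a page or two typed in by hand gives a rough basis for comparing OCR engines or scan resolutions; it says nothing about layout or word order beyond what the character stream captures.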
> In looking at what is available in Synaptic, I could not find such a
> package.
Use aptitude, which has better searching capability. Here is my search
result for graphics tools including OCR.
07-14-2008, 05:51 PM
Bret Busby
Query about OCR package(s)
On Mon, 14 Jul 2008, Bret Busby wrote:
hello.
Is an OCR package available for Debian 4.0, in .deb form, that can read from
PDF files, to allow text to be extracted from PDF files?
In looking at what is available in Synaptic, I could not find such a package.
Thank you in anticipation.
--
Bret Busby
Armadale
West Australia
..............
Since sending the above message, I had to cite material from a PDF
document that was published on the Internet. I clicked on the link for
the PDF document in Iceape (which I use for accessing web pages when I
want to write an email using material in the web pages, as Iceape
includes the email facility, like the Netscape and Mozilla suites). The
document viewer (Adobe Reader 8.0) opened the document within the tab,
and I was able to simply mark, copy and paste the text, as if it were
text in an HTML web page or in a word processor document.
So, Adobe Reader 8.0 provides the text extraction, or copying, that I
sought, and an OCR application that imports text from PDF files is now
probably redundant (other than that it could function as a smaller,
standalone application, but this seems to be adequate).
Now, if only I could print from Adobe Reader 8.0 (I can print PDF files
from Evince 0.4.0, but not from Adobe 8.0)...
--
Bret Busby
Armadale
West Australia
> So, Adobe Reader 8.0 provides the text extraction, or copying, that I
> sought, so an OCR application that imports text from PDF files, is
> now probably redundant (other than that it could function as a
Only for PDFs that have got text in them. Some PDFs have scanned
pages, which we can read, but there's no text to extract; it's a
graphic file when all's said and done. You'll need OCR for the images.
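A quick way to tell the two kinds of PDF apart is to try extracting the text layer and see whether anything comes back. The sketch below assumes the `pdftotext` tool from poppler-utils (or xpdf-utils) is installed; the helper names and the 20-character threshold are arbitrary choices for illustration, not an established rule.

```python
import subprocess


def extract_text_layer(pdf_path: str) -> str:
    """Return whatever text pdftotext finds in the PDF.
    The trailing "-" asks pdftotext to write to standard output."""
    result = subprocess.run(["pdftotext", pdf_path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Heuristic: if almost no non-whitespace characters were extracted,
    the pages are probably scanned images and need OCR."""
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```

Usage would be along the lines of `needs_ocr(extract_text_layer("document.pdf"))`. A PDF that mixes text pages with scanned pages can fool a whole-document check like this, so checking page by page may be the safer variant.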
--
Regards _
/ ) "The blindingly obvious is
/ _)rad never immediately apparent"
Is she really going out with him?
New Rose - The Damned
07-17-2008, 03:03 PM
Michelle Konzack
Query about OCR package(s)
On 2008-07-15 00:08:45, Osamu Aoki wrote:
> Yes ... From PDF? I do not know but implementing it is simple with some
> filtering.
The OP can use "netpbm" to pipe the PDF through the module and then into
the OCR software...
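A rough sketch of such a pipeline follows. As far as I know netpbm does not read PDF itself, so this assumes `pdftoppm` (from poppler-utils) to rasterize each page into a netpbm-format PGM image, and `gocr` as the OCR engine; both choices, the 300 dpi default and the function names are assumptions for illustration, and any netpbm filter could be slotted in between the two steps.

```python
import glob
import subprocess


def rasterize_command(pdf_path, prefix, dpi=300):
    """Command that renders each PDF page to a greyscale PGM file
    named <prefix>-<page number>.pgm."""
    return ["pdftoppm", "-r", str(dpi), "-gray", pdf_path, prefix]


def ocr_command(image_path, text_path):
    """Command that runs gocr on one image and writes the text to a file."""
    return ["gocr", "-i", image_path, "-o", text_path]


def ocr_pdf(pdf_path, prefix="page", dpi=300):
    """Rasterize the PDF, then OCR every page image that was produced.
    Returns the list of text files written."""
    subprocess.run(rasterize_command(pdf_path, prefix, dpi), check=True)
    text_files = []
    # Lexical sort; relies on pdftoppm padding page numbers to equal width.
    for image_path in sorted(glob.glob(glob.escape(prefix) + "-*.pgm")):
        text_path = image_path[:-len(".pgm")] + ".txt"
        subprocess.run(ocr_command(image_path, text_path), check=True)
        text_files.append(text_path)
    return text_files
```

Calling `ocr_pdf("scanned.pdf")` would leave one `.txt` file per page next to the page images. Higher resolution tends to help the OCR step at the cost of larger intermediate files.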
Have a nice Day/Evening
Michelle Konzack
--
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack Apt. 917 ICQ #328449886
+49/177/9351947 50, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France IRC #Debian (irc.icq.com)