02-09-2010, 05:43 AM
Ian Zimmerman

Search engine for documentation indexing?

I'd like to build an index of the documentation in /usr/share/doc,
but I am quite unhappy with the options I have tried so far:

1. dwww has a built-in CGI for searching an index built by swish++.
Unfortunately swish++ indexing seems to take forever (it's described as
"lightning fast" on the upstream website, but I can't find the pictures
of flying pigs). Also, using the built-in dwww integration has the
disadvantage that only documents registered in doc-base are indexed,
which misses a lot of them. On top of this, swish++ shares the main
problem of swish-e, described next.

2. swish-e. This looked very promising for a while, and I even wrote
a Python module to wrap the API:

http://pypi.python.org/pypi?%3Aaction=search&term=pyswish&submit=search

... but it can't handle documents in encodings other than ASCII and
Latin-1 (in particular, it breaks on UTF-8 XHTML documents). This is a
show-stopper.

3. xapian-omega. This seems to be the one modern apps are migrating to;
I heard that the Gnus mail/newsreader is acquiring a xapian-based search
function. But out of the box it cannot index gzipped files (and most
documents in /usr/share/doc other than HTML pages are gzipped), and
there doesn't seem to be a way to add a user-defined filter to
compensate for this (swish-e has user filters).
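
To make the gzip and encoding problems above concrete, here is a minimal
Python sketch of the kind of preprocessing step an indexer would need. It
is not part of swish++, swish-e or xapian-omega; the function names
read_doc_text and iter_docs are hypothetical, and falling back from UTF-8
to Latin-1 is an assumption about what the files contain, not a general
encoding detector.

```python
import gzip
import os
from pathlib import Path
from typing import Iterator, Tuple


def read_doc_text(path: Path) -> str:
    """Return the text of one documentation file, gunzipping it if needed."""
    # Assumption: a ".gz" suffix reliably marks a gzip-compressed file.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as handle:
        raw = handle.read()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this fallback cannot fail.
        return raw.decode("latin-1")


def iter_docs(root: Path) -> Iterator[Tuple[Path, str]]:
    """Yield (path, text) for every regular file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            # is_file() is False for broken symlinks, so those are skipped.
            if path.is_file():
                yield path, read_doc_text(path)
```

Iterating over iter_docs(Path("/usr/share/doc")) would yield each file's
path with its decompressed, decoded text, which could then be written out
as plain UTF-8 files for whichever indexer is in use.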

I can't be the only one looking for this, so what do other debianists do?

--
Ian Zimmerman <itz@buug.org>
gpg public key: 1024D/C6FF61AD
fingerprint: 66DC D68F 5C1B 4D71 2EE5 BD03 8A00 786C C6FF 61AD
Ham is for reading, not for eating.


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
02-09-2010, 12:23 PM
Wayne

Ian Zimmerman wrote:

> I'd like to build an index of the documentation in /usr/share/doc, [...]
> I can't be the only one looking for this, so what do other debianists do?

I use recoll and dwww but rely on recoll more and more.

Wayne


 
02-09-2010, 07:05 PM
Michael Iatrou

When the date was Tuesday 09 of February 2010, Wayne <linuxtwo@gmail.com>
wrote:

> Ian Zimmerman wrote:
> > I'd like to build an index of the documentation in /usr/share/doc, [...]
> > I can't be the only one looking for this, so what do other debianists
> > do?
>
> I use recoll and dwww but rely on recoll more and more.

One more vote for recoll. It supports custom filters, and it can index
most common documentation formats.

--
Michael Iatrou

