Search engine for documentation indexing?
I'd like to build an index of the documentation in /usr/share/doc, but I am quite unhappy with the options I have tried so far:

1. dwww has a built-in CGI for searching an index built by swish++. Unfortunately, swish++ indexing seems to take forever (it's described as "lightning fast" on the upstream website, but I can't find the pictures of flying pigs). Also, the built-in dwww integration has the disadvantage that only documents registered in doc-base are indexed, which misses a lot of them. On top of this, swish++ shares the main problem of option 2.

2. swish-e. This looked very promising for a while, and I even wrote a Python module to wrap the API:
http://pypi.python.org/pypi?%3Aaction=search&term=pyswish&submit=search
... but it can't handle documents in encodings other than ASCII and Latin-1 (in particular, it breaks on UTF-8 XHTML documents). This is a show-stopper.

3. xapian-omega. This seems to be the one modern apps are migrating to; I heard of the Gnus mail/newsreader acquiring a Xapian-based search function. But out of the box it cannot index gzipped files (and most documents in /usr/share/doc other than HTML pages are gzipped), and there doesn't seem to be a way to add a user-defined filter to compensate for this either (swish-e has user filters).

I can't be the only one looking for this, so what do other debianists do?

--
Ian Zimmerman <itz@buug.org>
gpg public key: 1024D/C6FF61AD
fingerprint: 66DC D68F 5C1B 4D71 2EE5 BD03 8A00 786C C6FF 61AD
Ham is for reading, not for eating.

--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
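The two show-stoppers above (gzipped files and non-Latin-1 text) both come down to a small pre-processing step of the kind a user-defined filter would perform. What follows is a minimal sketch, assuming Python 3 and that each document is either UTF-8 or Latin-1: it gunzips a file when it starts with the gzip magic bytes and decodes the result before handing plain UTF-8 text to an indexer. The script name `docfilter.py` and the function names are hypothetical. The sketch does not show how swish-e, omega or recoll would be configured to call such a filter.

```python
import gzip
import sys
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_bytes(raw):
    """Decode document bytes, trying UTF-8 first and falling back to Latin-1.

    Latin-1 maps every byte to a code point, so the fallback never raises,
    but text in some other 8-bit encoding will come out garbled.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def read_document(path):
    """Return a file's text, gunzipping it first if it starts with the gzip magic bytes."""
    raw = Path(path).read_bytes()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return decode_bytes(raw)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 docfilter.py FILE
    # Writes the decoded text to stdout as UTF-8 for an indexer to consume.
    sys.stdout.buffer.write(read_document(sys.argv[1]).encode("utf-8"))
```

The sketch checks the magic bytes rather than the `.gz` suffix, so it still handles compressed files whose names don't say so.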
Ian Zimmerman wrote:
> I'd like to build an index of the documentation in /usr/share/doc, [...]
> I can't be the only one looking for this, so what do other debianists do?

I use recoll and dwww but rely on recoll more and more.

Wayne
When the date was Tuesday 09 of February 2010, Wayne <linuxtwo@gmail.com>
wrote:
> Ian Zimmerman wrote:
> > I'd like to build an index of the documentation in /usr/share/doc,
[...]
> > I can't be the only one looking for this, so what do other debianists
> > do?
>
> I use recoll and dwww but rely on recoll more and more.

One more vote for recoll. It supports custom filters and can index most common documentation formats.

--
Michael Iatrou
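For anyone wondering what recoll, swish-e, swish++ and omega build under the hood, the central structure they share is an inverted index that maps each word to the documents containing it. The following is a toy sketch in Python, not a description of how any of those tools is implemented. It walks a directory tree such as /usr/share/doc, reads plain and gzipped files, splits text into runs of word characters, and answers AND queries. The names `build_index`, `read_text` and `search` are hypothetical.

```python
import gzip
import os
import re
import zlib
from collections import defaultdict

WORD_RE = re.compile(r"\w+")


def read_text(path):
    """Read a file as text, gunzipping it first if it has the gzip magic bytes."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    # Undecodable bytes become U+FFFD instead of aborting the whole run.
    return raw.decode("utf-8", errors="replace")


def build_index(root):
    """Map each lower-cased word to the set of file paths under root containing it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                text = read_text(path)
            except (OSError, EOFError, zlib.error):
                continue  # unreadable, broken or truncated file: skip it
            for word in WORD_RE.findall(text.lower()):
                index[word].add(path)
    return index


def search(index, query):
    """Return the sorted paths containing every word of the query (AND semantics)."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

For example, `search(build_index("/usr/share/doc"), "gzip filter")` would list the files mentioning both words. A real indexer would also store the index on disk, rank results and parse HTML or man-page markup. This sketch does none of that.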