Old 07-24-2008, 06:55 PM
"Franklin PIAT"
 
sysinf0 - website indexation

Hello,

I am working on a website[1] whose purpose is to let the visitor browse
a _virtual_ filesystem made of all the files shipped in Debian
packages, and then view or compare those files.

The problem is that Google will never finish indexing the 10 million
pages (not over my home DSL line, at least)...

My first plan is to track unstable, then provide a kind of news feed for
search engines. [my DebCamp8 plan]

The second improvement is to actually prevent Google from indexing useless
pages. The question is: which pages are useful, and which are useless?
My current (quick) list is:
^/etc/.*$
^/var/lib/dpkg/.*$
^/usr/share/doc/[^/]*/[^/]*$
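To make the list easier to experiment with, here is a minimal sketch that checks a virtual path against the three patterns above. The names LISTED_PATTERNS and is_listed are hypothetical, not part of sysinf0, and the sketch only reports whether a path is on the list; it makes no assumption about whether the list is meant as the set of pages to index or the set to exclude.

```python
import re

# Patterns copied verbatim from the list above.
LISTED_PATTERNS = [
    r"^/etc/.*$",
    r"^/var/lib/dpkg/.*$",
    r"^/usr/share/doc/[^/]*/[^/]*$",
]

# Compile once so repeated lookups stay cheap.
_COMPILED = [re.compile(p) for p in LISTED_PATTERNS]


def is_listed(path: str) -> bool:
    """Return True if the virtual path matches any pattern on the list.

    Hypothetical helper: what to do with a match (index it or skip it)
    is left to the caller.
    """
    return any(rx.match(path) for rx in _COMPILED)


if __name__ == "__main__":
    samples = [
        "/etc/fstab",
        "/var/lib/dpkg/info/bash.list",
        "/usr/share/doc/bash/copyright",
        "/usr/share/doc/bash/examples/startup",
        "/usr/bin/ls",
    ]
    for sample in samples:
        print(sample, is_listed(sample))
```

Note that the third pattern only covers files directly inside a package's doc directory: anything one level deeper, such as an examples/ subdirectory, does not match.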

Any suggestions?

Franklin

[1] http://sysinf0.klabs.be/


 
