On Mon, Dec 5, 2011 at 3:48 AM, Andreas K. Huettel <email@example.com> wrote:
> Seriously, what do we gain from crawlers accessing sources.gentoo.org? I can't
> really remember ever seeing it in a Google query result...
We want the site searchable.
> Possibly it would not even be required to deny all requests, but just deny
> everything related to ancient history...
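Along those lines, one option is a robots.txt that keeps current file contents crawlable but blocks the history views. This is only a sketch, not a tested config: the path prefix and query patterns below are assumptions about our ViewVC URL layout and would need to be checked against the actual site (and not every crawler honors wildcards in Disallow rules).

```
# Hypothetical robots.txt for sources.gentoo.org.
# Path patterns are guesses at ViewVC-style URLs; verify before deploying.
User-agent: *
# Block the per-revision views that trigger the expensive queries
Disallow: /viewvc/*?view=log
Disallow: /viewvc/*?view=annotate
Disallow: /viewvc/*?r1=
# Everything else (current file contents) stays crawlable
```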
>> For a while sources.gentoo.org has been puttering along and its health
>> has slowly declined. We migrated it to some newer, shinier hardware in an
>> attempt to mitigate the problem, but that did not pan out. 90% (or
>> more) of sources.gentoo.org traffic is crawler bots, not actual
>> humans. That said, if we cannot serve requests to the bots
>> within our timeouts, we serve 500s instead, which is never what
>> we want (particularly when we spend 20s of CPU calculating 80% of the
>> response only to see the client time out :/.)
>> The majority of the expensive requests are crawler queries against
>> package.mask and use.local.desc, e.g. crawling the entire
>> 13000-revision history of package.mask (or similar.)
>> While it is likely we will monkey-patch viewvc to be less wasteful, in
>> the meantime I have removed use.local.desc from sources.gentoo.org
>> (and also from anoncvs, because they share the same repo.) I hope this
>> is a short-term (order of weeks) hack.
> Andreas K. Huettel
> Gentoo Linux developer
> kde, sci, arm, tex, printing