We can download any site with the "wget -r" option.
If I want to stop my site from being downloaded from the web server, how can I do
this?
--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
06-27-2008, 04:47 PM
Andrew Bacchi
wget
Put a file named 'robots.txt' in the root of your Web directory.
Here's a link.
http://www.searchtools.com/robots/robots-txt.html
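As a rough illustration of what such a file does, here is a minimal Python sketch. The robots.txt contents and the example.com URLs are assumptions made up for this example (they are not taken from the linked page). The sketch uses the standard-library urllib.robotparser to show how a well-behaved crawler would interpret the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask wget to stay out entirely,
# and ask every other crawler to skip /private/.
ROBOTS_TXT = """\
User-agent: wget
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this name may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and the agent names below are placeholders.
    for agent, url in [
        ("Wget/1.11.4", "http://example.com/index.html"),
        ("SomeBot", "http://example.com/index.html"),
        ("SomeBot", "http://example.com/private/a.html"),
    ]:
        print(agent, url, allowed(agent, url))
```

The file only states a request. wget -r reads robots.txt by default, but the person running it can turn that check off, so this is not access control.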
--
veritatis simplex oratio est
-Seneca
Andrew Bacchi
Systems Programmer
Rensselaer Polytechnic Institute
phone: 518.276.6415 fax: 518.276.2809
http://www.rpi.edu/~bacchi/
06-27-2008, 06:31 PM
"Daniel Carrillo"
wget
You can configure Apache to refuse connections from the User-Agent "wget",
but note that wget can send any User-Agent (--user-agent option).
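# Flag any request whose User-Agent starts with "wget" (case-insensitive)
SetEnvIfNoCase User-Agent "^wget" blacklist
<Location />
    # ... your options ...
    Order allow,deny
    Allow from all
    # Reject the requests flagged above
    Deny from env=blacklist
</Location>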
BTW: robots.txt can only stop "good" crawlers, such as Google, Yahoo,
Alexa, etc.
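To see why the User-Agent check is easy to sidestep, here is a small Python sketch that approximates the "^wget" test. It is not Apache itself: Python's re module with re.IGNORECASE stands in for SetEnvIfNoCase, and the header values are examples chosen for illustration.

```python
import re

# Same pattern as in the SetEnvIfNoCase line; re.IGNORECASE stands in
# for the "NoCase" part of the directive.
BLACKLIST = re.compile(r"^wget", re.IGNORECASE)


def is_blacklisted(user_agent: str) -> bool:
    """Return True if this User-Agent value would be flagged by the pattern."""
    return BLACKLIST.search(user_agent) is not None


if __name__ == "__main__":
    # wget's default header has the form "Wget/<version>".
    print(is_blacklisted("Wget/1.11.4"))  # True
    # Example of a spoofed value, e.g. wget --user-agent="Mozilla/5.0"
    print(is_blacklisted("Mozilla/5.0"))  # False
```

The second call shows the limitation: once the client changes the header, the pattern no longer matches and the request is served normally.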
06-29-2008, 06:15 PM
"Joy Methew"
wget
Bacchi,
How do I use "robots.txt"? Please explain with an example.
Daniel,
It's working for "wget", but we can still download with other utilities
such as "DownloadStudio".
06-30-2008, 08:16 AM
"Joy Methew"
wget
Bacchi,
How do I use "robots.txt"? Please explain with an example.
Daniel,
It's working for "wget", but we can still download with other utilities
such as "DownloadStudio".
06-30-2008, 12:45 PM
Andrew Bacchi
wget
I've already sent you a link that provides an explanation and examples. I
don't mind pointing someone in the right direction, but I won't sit here
and solve all your problems for you. Try searching Google.
06-30-2008, 05:12 PM
"Daniel Carrillo"
wget
Add another line:

SetEnvIfNoCase User-Agent "^DownloadStudio" blacklist

Check the Apache documentation for more details.

BR.
07-01-2008, 04:05 PM
"Daniel Carrillo"
wget
2008/7/1 Joy Methew <ml4joy@gmail.com>:
> SetEnvIfNoCase User-Agent "^DownloadStudio" blacklist.
>
> I have done it, but it's not working...
It is only one approach; please check the User-Agent that DownloadStudio
actually sends, and also check the regexp.
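One way the regexp can fail is the "^" anchor: it only matches when the product name is at the very start of the header. The Python sketch below illustrates this. The header value is hypothetical; the real string DownloadStudio sends is not given in this thread and would have to be read from the server's access log (Apache's "combined" log format records it).

```python
import re


def matches(pattern: str, user_agent: str) -> bool:
    """Case-insensitive search, roughly what SetEnvIfNoCase does."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


# Hypothetical header value, invented for this example only.
HYPOTHETICAL_UA = "Mozilla/4.0 (compatible; DownloadStudio)"

if __name__ == "__main__":
    # Anchored: fails because the header does not start with the name.
    print(matches(r"^DownloadStudio", HYPOTHETICAL_UA))  # False
    # Unanchored: matches the name anywhere in the header.
    print(matches(r"DownloadStudio", HYPOTHETICAL_UA))   # True
```

If the logged header looks like this, dropping the "^" from the pattern would be the fix; if the name does not appear in the header at all, no User-Agent pattern will catch it.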
07-26-2010, 03:37 PM
Sir June
wget
Hi,
I've been trying some wget options to download
"http://archive.kernel.org/fedora-archive/core/updates/5/i386/" (only the
contents of i386)
to my current directory but it only downloads index.html and robots.txt