Linux Archive

wget (http://www.linux-archive.org/red-hat-linux/114749-wget.html)

"Joy Methew" 06-27-2008 04:43 PM

wget
 
Hi all,

We can download an entire site with the "wget -r" option. If I want to stop
my site from being downloaded from the web server, how can I do that?
--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list

Andrew Bacchi 06-27-2008 04:47 PM

wget
 
Put a file named 'robots.txt' in the root of your Web directory.

Here's a link.

http://www.searchtools.com/robots/robots-txt.html
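
For instance, a minimal robots.txt that asks all robots to stay away from the
entire site (a sketch; the link above covers the full syntax):

```
User-agent: *
Disallow: /
```

It goes at the top level of the site, e.g. http://www.example.com/robots.txt.
wget honors it by default during recursive (-r) retrieval.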

Joy Methew wrote:

Hi all,

We can download an entire site with the "wget -r" option. If I want to stop
my site from being downloaded from the web server, how can I do that?



--
veritas simplex oratio est
-Seneca

Andrew Bacchi
Systems Programmer
Rensselaer Polytechnic Institute
phone: 518.276.6415 fax: 518.276.2809

http://www.rpi.edu/~bacchi/


"Daniel Carrillo" 06-27-2008 06:31 PM

wget
 
2008/6/27 Joy Methew <ml4joy@gmail.com>:
> Hi all,
>
> We can download an entire site with the "wget -r" option. If I want to stop
> my site from being downloaded from the web server, how can I do that?

You can configure Apache to refuse connections whose User-Agent is "wget",
but note that wget can send any User-Agent string (the --user-agent option).

SetEnvIfNoCase User-Agent "^wget" blacklist
<Location />
...
your options
...
Order allow,deny
Allow from all
Deny from env=blacklist
</Location>

BTW: robots.txt can only stop crawling by well-behaved crawlers, like
Google, Yahoo, Alexa, etc.
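
Why the User-Agent check is easy to bypass can be sketched with grep, which
applies the same kind of case-insensitive anchored match as SetEnvIfNoCase
(the UA strings below are illustrative, but wget's default one really does
begin with "Wget/"):

```shell
#!/bin/sh
# wget's default User-Agent begins with "Wget/"; a spoofed one
# (wget --user-agent="Mozilla/5.0 ...") can be anything at all.
default_ua='Wget/1.11.4 Red Hat modified'
spoofed_ua='Mozilla/5.0 (X11; Linux i686)'

# "^wget", matched case-insensitively, is what SetEnvIfNoCase tests:
printf '%s\n' "$default_ua" | grep -Eiq '^wget' && echo 'default UA: blacklisted'
printf '%s\n' "$spoofed_ua" | grep -Eiq '^wget' || echo 'spoofed UA: allowed through'
```

So the Apache rule stops a stock wget, but anyone who overrides the
User-Agent walks straight past it.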


"Joy Methew" 06-29-2008 06:15 PM

wget
 
Bacchi,

How do I use "robots.txt"? Please explain with an example.

Daniel,

It works for "wget", but we can still download with other utilities,
like "DownloadStudio".


"Joy Methew" 06-30-2008 08:16 AM

wget
 
Bacchi

how i use "robost.txt" plz explain with example.

Daniel......
it`s working for "wget" but still we can download from other utilities
like..."DownloadStudio

On Sun, Jun 29, 2008 at 11:45 PM, Joy Methew <ml4joy@gmail.com> wrote:

> Bacchi
>
> how i use "robost.txt" plz explain with example.
>
> Daniel......
>
> it`s working for "wget" but still we can download from other utilities
> like..."DownloadStudio"
>
> On 6/27/08, Daniel Carrillo <daniel.carrillo@gmail.com> wrote:
>>
>> 2008/6/27 Joy Methew <ml4joy@gmail.com>:
>>
>> > hiii all....
>> >
>> > we can download any site from "wget -r " options.
>> > if i want to stop downloading of my site from web server how i can do
>> > this???
>>
>>
>> You can configure Apache for refuse connections with UserAgent "wget",
>> but note that wget can use any UserAgent (--user-agent option).
>>
>> SetEnvIfNoCase User-Agent "^wget" blacklist
>> <Location />
>> ...
>> your options
>> ...
>> Order allow,deny
>> Allow from all
>> Deny from env=blacklist
>> </Location>
>>
>> BTW: robots.txt only can stop crawling from "good" crawlers, like
>> google, yahoo, alexa, etc.
>>
>>
>> --
>> redhat-list mailing list
>> unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
>> https://www.redhat.com/mailman/listinfo/redhat-list
>>
>
>
--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list

Andrew Bacchi 06-30-2008 12:45 PM

wget
 
I've already sent you a link that provides explanation and examples. I
don't mind pointing someone in the right direction, but I won't sit here
and solve all your problems for you. Try searching Google.






--
veritas simplex oratio est
-Seneca

Andrew Bacchi
Systems Programmer
Rensselaer Polytechnic Institute
phone: 518.276.6415 fax: 518.276.2809

http://www.rpi.edu/~bacchi/


"Daniel Carrillo" 06-30-2008 05:12 PM

wget
 
2008/6/30 Joy Methew <ml4joy@gmail.com>:
> Bacchi,
>
> How do I use "robots.txt"? Please explain with an example.
>
> Daniel,
> It works for "wget", but we can still download with other utilities,
> like "DownloadStudio".

Add another line (with no trailing period, which Apache would otherwise
read as part of the environment-variable name):

SetEnvIfNoCase User-Agent "^DownloadStudio" blacklist

Check the Apache documentation for more details.

BR.


"Joy Methew" 07-01-2008 04:01 AM

wget
 
SetEnvIfNoCase User-Agent "^DownloadStudio" blacklist

I have done it, but it's not working...


"Daniel Carrillo" 07-01-2008 04:05 PM

wget
 
2008/7/1 Joy Methew <ml4joy@gmail.com>:
> SetEnvIfNoCase User-Agent "^DownloadStudio" blacklist
>
> I have done it, but it's not working...

It is only an approach. Please check the actual User-Agent string that
DownloadStudio sends, and also check the regexp.
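
One way to check the actual string is the quoted User-Agent field of Apache's
combined-format access_log. A sketch with a made-up log line (the UA value
here is hypothetical):

```shell
#!/bin/sh
# In the combined log format the third quoted field is the User-Agent,
# so splitting on '"' with awk puts it in field 6.
line='10.0.0.1 - - [01/Jul/2008:16:05:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0 (compatible; DownloadStudio 1.0)"'
printf '%s\n' "$line" | awk -F'"' '{print $6}'
```

If the real string looks like the one above, the anchored "^DownloadStudio"
can never match; SetEnvIf regexes match anywhere in the value, so dropping
the "^" anchor would be closer.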


Sir June 07-26-2010 03:37 PM

wget
 
Hi,

I've been trying some wget options to download
"http://archive.kernel.org/fedora-archive/core/updates/5/i386/" (only the
contents of i386) to my current directory, but it only downloads index.html
and robots.txt.


# pwd
/5/i386
# wget -r -nc http://archive.kernel.org/fedora-archive/core/updates/5/i386/

It creates the directory hierarchy but downloads only the index and robots
files. I just want to get all the files that are inside i386.

What options should I use?


thanks,
Sir June




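
The behaviour described above is usually wget honoring the server's
robots.txt during recursive (-r) retrieval: if it disallows crawling, only
index.html and robots.txt come down. An untested sketch of a workaround is to
turn robots processing off and keep wget from ascending to the parent
directory, e.g. `wget -r -np -nc -e robots=off
http://archive.kernel.org/fedora-archive/core/updates/5/i386/`, or
persistently via ~/.wgetrc:

```
# ~/.wgetrc sketch: equivalent to passing -e robots=off on every run
robots = off
```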

