wget
Hi all,
We can download any site with the "wget -r" option. How can I stop my site from being downloaded from my web server?
-- redhat-list mailing list unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe https://www.redhat.com/mailman/listinfo/redhat-list
wget
Put a file named 'robots.txt' in the root of your Web directory.
Here's a link: http://www.searchtools.com/robots/robots-txt.html
-- veritatas simplex oratio est -Seneca
Andrew Bacchi, Systems Programmer, Rensselaer Polytechnic Institute
phone: 518.276.6415 fax: 518.276.2809 http://www.rpi.edu/~bacchi/
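As a rough illustration of what such a file can look like and how a well-behaved crawler reads it, here is a minimal Python sketch using the standard library's urllib.robotparser. The host example.com, the /private/ path, and the helper name is_allowed are made up for this example; none of them come from the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
PARTIAL_BLOCK = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical robots.txt: every crawler is asked to stay out of the whole site.
FULL_BLOCK = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, rules in (("partial", PARTIAL_BLOCK), ("full", FULL_BLOCK)):
        for path in ("/index.html", "/private/report.html"):
            url = "http://example.com" + path
            print(name, path, is_allowed(rules, "Wget", url))
```

The file is only a request: it affects clients that choose to read and honor it. wget's recursive mode does so by default, but a client can be told to ignore it.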
wget
2008/6/27 Joy Methew <ml4joy@gmail.com>:
> Hi all,
>
> We can download any site with the "wget -r" option. How can I stop my
> site from being downloaded from my web server?

You can configure Apache to refuse connections from the User-Agent "wget", but note that wget can send any User-Agent (--user-agent option).

SetEnvIfNoCase User-Agent "^wget" blacklist
<Location />
...
your options
...
Order allow,deny
Allow from all
Deny from env=blacklist
</Location>

BTW: robots.txt can only stop crawling by "good" crawlers, like google, yahoo, alexa, etc.
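To see why a User-Agent rule is easy to get around, here is a small Python sketch that mimics the match. It assumes SetEnvIfNoCase behaves like a case-insensitive regular-expression search on the header value; the sample User-Agent strings and the function name is_blacklisted are made up for illustration.

```python
import re

# Pattern equivalent to the SetEnvIfNoCase line; matching ignores case.
BLACKLIST_PATTERNS = [re.compile(r"^wget", re.IGNORECASE)]


def is_blacklisted(user_agent: str) -> bool:
    """Return True if any blacklist pattern matches the User-Agent header."""
    return any(p.search(user_agent) for p in BLACKLIST_PATTERNS)


if __name__ == "__main__":
    samples = [
        "Wget/1.11.4",               # illustrative default-style wget header
        "Mozilla/5.0 (X11; Linux)",  # what `wget --user-agent=...` could send
    ]
    for ua in samples:
        verdict = "denied" if is_blacklisted(ua) else "allowed"
        print(f"{ua!r}: {verdict}")
```

The second sample passes the check even though it could come from the same wget binary, which is the limitation noted above.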
wget
Bacchi,
How do I use "robots.txt"? Please explain with an example.

Daniel,
It's working for "wget", but we can still download with other utilities such as "DownloadStudio".
wget
I've already sent you a link that provides an explanation and examples. I don't mind pointing someone in the right direction, but I won't sit here and solve all your problems for you. Try searching Google.
-- Andrew Bacchi
wget
2008/6/30 Joy Methew <ml4joy@gmail.com>:
> It's working for "wget", but we can still download with other utilities
> such as "DownloadStudio".

Add another line:

SetEnvIfNoCase User-Agent "^DownloadStudio" blacklist

Check the Apache documentation for more details.

BR.
wget
> SetEnvIfNoCase User-Agent "^DownloadStudio" blacklist

I have done this, but it's not working.
wget
2008/7/1 Joy Methew <ml4joy@gmail.com>:
> SetEnvIfNoCase User-Agent "^DownloadStudio" blacklist
>
> I have done this, but it's not working.

It is only an approach; please check the User-Agent that DownloadStudio actually sends, and also check the regexp.
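One way to check what a client really sends is to read the User-Agent field from the Apache access log. The sketch below assumes the log uses the "combined" format, where the User-Agent is the last quoted field on each line. The sample log lines, including the DownloadStudio-style header, are invented for the example and are not real output from that program.

```python
import re
from collections import Counter

# In the combined log format each line ends with "referer" "user-agent".
UA_AT_END = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$')


def count_user_agents(lines):
    """Count how often each User-Agent string appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = UA_AT_END.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


def pattern_matches(pattern: str, user_agent: str) -> bool:
    """Case-insensitive regex search, as assumed for SetEnvIfNoCase."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


# Invented sample lines; a real run would read the server's access log.
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jul/2008:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Wget/1.11.4"',
    '192.0.2.2 - - [01/Jul/2008:10:00:05 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; DownloadStudio)"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        anchored = pattern_matches(r"^DownloadStudio", agent)
        unanchored = pattern_matches(r"DownloadStudio", agent)
        print(f"{hits:3d}  {agent}  anchored={anchored}  unanchored={unanchored}")
```

If the header does not begin with the program's name, an anchored pattern such as "^DownloadStudio" cannot match; dropping the "^" matches the name anywhere in the header.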
wget
Hi,
I've been trying some wget options to download "http://archive.kernel.org/fedora-archive/core/updates/5/i386/" (only the contents of i386) to my current directory, but it only downloads index.html and robots.txt.

# pwd
/5/i386
# wget -r -nc http://archive.kernel.org/fedora-archive/core/updates/5/i386/

It creates the directory hierarchy and downloads only the index and robots files. I just want to get all the files that are inside i386. What options should I use?

Thanks,
Sir June