01-20-2008, 07:08 PM
Lou Katz

wget problem

On Sun, Jan 20, 2008 at 08:15:25AM +0000, Wulfy wrote:
> [I sent this to Donn's private e-mail by mistake. Sorry, Donn.]
>
> Donn wrote:
> > There is a gui that does this. It has a name so abysmal that I can't recall
> > it...
> >
> > I used this script once a few years ago to fetch a website.
> > It takes two parameters: url and level.
> > The level is how far down a chain of links it should go.
> > You could just replace the vars and run the command directly.
> > ===
> >
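> > #!/bin/bash
> > # Try to make using wget easier than it bloody is.
> > # Usage: script URL [LEVEL]   (LEVEL = link depth, default 2)
> > url=$1
> > # stop the whole script (not just a subshell) if no URL was given
> > if [ -z "$url" ]; then echo "Bad url" >&2; exit 1; fi
> > LEV=${2:-2}
> >
> > echo "running: wget --convert-links -r -l$LEV $url -o log"
> > # -o log sends wget's progress messages to the file "log"
> > wget --convert-links -r -l"$LEV" "$url" -o log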
> >
> > ===
> >
> > man wget is the best plan really.
> >
> >
> > d
> >
> >
> <sigh> I don't know what I'm doing wrong, but I can't get wget to get
> more than the top layer of the site. The archive.org site just brings
> in index.html (and robots.txt). I tried it on another site and it
> brought in the two versions of the main page (dialup and high speed) but
> the menu links weren't followed. I tried -l5 and -15 and got the same
> download.
>
> Any idea why the -r isn't recursing?

Have you used the underdocumented option to ignore robots.txt?

put
robots = off

in your .wgetrc, or use

-erobots=off

on the command line.
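A minimal sketch of how Lou's `-e robots=off` combines with the recursive options from Donn's script, assuming a POSIX shell and GNU Wget. The function name `mirror_site` and the one-second `--wait` are illustrative additions, not something anyone in the thread used:

```shell
# mirror_site URL [LEVEL]
# Fetch URL recursively, following links LEVEL deep (default 2).
# "-e robots=off" runs the .wgetrc command "robots = off" for this call
# only, so robots.txt exclusions are ignored; --convert-links rewrites the
# links in the saved pages so the copy can be browsed locally.
mirror_site() {
    url=$1
    level=${2:-2}
    if [ -z "$url" ]; then
        echo "usage: mirror_site URL [LEVEL]" >&2
        return 1
    fi
    # --wait=1: pause a second between requests, since robots.txt is ignored
    wget -r -l "$level" -e robots=off --wait=1 --convert-links "$url"
}
```

For example `mirror_site http://www.example.com/ 5`, where the URL is only a placeholder.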


--

-=[L]=-
South Hampstead.



--
kubuntu-users mailing list
kubuntu-users@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/kubuntu-users
 
01-21-2008, 01:02 AM
Wulfy

wget problem

Lou Katz wrote:
> On Sun, Jan 20, 2008 at 08:15:25AM +0000, Wulfy wrote:
>
>> <sigh> I don't know what I'm doing wrong, but I can't get wget to get
>> more than the top layer of the site. The archive.org site just brings
>> in index.html (and robots.txt). I tried it on another site and it
>> brought in the two versions of the main page (dialup and high speed) but
>> the menu links weren't followed. I tried -l5 and -15 and got the same
>> download.
>>
>> Any idea why the -r isn't recursing?
>>
>
> Have you used the underdocumented option to ignore robots.txt?
>
> put
> robots = off
>
> in your .wgetrc, or use
>
> -erobots=off
>
> on the command line.

It turns out the problem was with robots.txt. I'll try your solution.
Thanks, Lou!

--
Blessings

Wulfmann

Wulf Credo:
Respect the elders. Teach the young. Co-operate with the pack.
Play when you can. Hunt when you must. Rest in between.
Share your affections. Voice your opinion. Leave your Mark.
Copyright July 17, 1988 by Del Goetz


 
01-21-2008, 01:10 AM
Wulfy

wget problem

Wulfy wrote:
> Lou Katz wrote:
>
>> Have you used the underdocumented option to ignore robots.txt?
>> put
>> robots = off
>>
>> in your .wgetrc, or use
>>
>> -erobots=off
>>
>> on the command line.
>>
>
> It turns out the problem was with robots.txt. I'll try your solution.
> Thanks, Lou!
>
>
No go. It just doesn't download the robots.txt file. I get an error
"Couldn't download this page from the archive" in place of the net
file. :@(

It's looking more and more like the "easy" way to do this is manually...

--
Blessings

Wulfmann

 
08-20-2008, 11:22 PM
Paul

wget problem

Hi,

I'm trying to grab all of the .zip files from www.urbanfonts.com using
wget.

If I use flashgot, it works for the page I'm on, but its output doesn't
tell me which options it used.

All of the files are in the form of xyz.php?fontname.zip.

Any ideas on how to get them more easily than doing every page with flashgot?

TTFN

Paul
--
You can tease me and really get me hot!
--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
 
08-20-2008, 11:31 PM
Patrick Kaiser

wget problem

On Thu, Aug 21, 2008 at 12:22:37AM +0100, Paul wrote:
> Hi,
>
> I'm trying to grab all of the .zip files from www.urbanfonts.com using
> wget.
>
> If I use flashgot, it will work for the page I'm on, but the output from
> flashgot doesn't tell me which options are used?
>
> All of the files are in the form of xyz.php?fontname.zip.
>
> Any ideas on how to get them easier than doing every page with flashgot?
>
> TTFN
>
> Paul

Hi Paul,

Do you have a complete list of the font names? If so, you could fetch
them in a loop.
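Patrick's loop idea, sketched under two assumptions: the names sit one per line in a text file, and each download URL is a fixed prefix followed by the name (the `xyz.php?fontname.zip` form Paul describes). The function name `fetch_fonts` and any prefix you pass it are hypothetical; the real prefix would have to be read off the site's links.

```shell
# fetch_fonts LIST PREFIX
# LIST:   text file with one font archive name per line (e.g. somefont.zip)
# PREFIX: URL part placed before each name (hypothetical, site-specific)
fetch_fonts() {
    list=$1
    base=$2
    # read one name per line; "|| [ -n ... ]" also handles a last line
    # that has no trailing newline
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue   # skip blank lines
        # -O saves the file under the font's own name
        wget -O "$name" "${base}${name}"
    done < "$list"
}
```

For example `fetch_fonts fonts.txt 'http://www.example.com/xyz.php?'`; quoting the prefix keeps the shell from treating `?` as a wildcard.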

Greets


--

Patrick Kaiser

URL: http://argonius.de
EMail: patrick.kaiser@argonius.de
RIPE: PK3264-RIPE

 
08-20-2008, 11:42 PM
Paul

wget problem

Hi,

> do you have a complete list of the fontnames? Then you can try it in a
> loop?

'fraid not - the names are stored somewhere on the site and added as a
$POST (by the look of it) when you click on the link...
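If the names really are sent as POST data, as Paul suspects, wget can send a form body itself with `--post-data`. A minimal sketch; the function name, the URL and the `font=` field name in the example call are all hypothetical and would have to be taken from the site's actual HTML form.

```shell
# fetch_via_post URL DATA OUTFILE
# Send DATA (already URL-encoded, e.g. "field=value") as the POST body
# to URL and save the server's response as OUTFILE.
fetch_via_post() {
    wget --post-data="$2" -O "$3" "$1"
}
```

For example `fetch_via_post 'http://www.example.com/download.php' 'font=somefont.zip' somefont.zip`.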

TTFN

Paul
 
08-21-2008, 01:22 AM
Patrick Kaiser

wget problem

On Thu, Aug 21, 2008 at 12:42:33AM +0100, Paul wrote:
> Hi,
>
> > do you have a complete list of the fontnames? Then you can try it in a
> > loop?
>
> 'fraid not - the names are stored somewhere on the site and added as a
> $POST (by the look of it) when you click on the link...
>
> TTFN
>
> Paul

Hey Paul,

Just try this. I do not have another solution:

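mkdir -p /tmp/bla
cd /tmp/bla
# mirror the site (wget -r follows links 5 levels deep by default)
wget -r http://www.urbanfonts.com
mkdir -p /tmp/blubb
cd /tmp/blubb
# grep every downloaded line mentioning "zip", cut out the index.php?...
# link, print a quoted wget command for it and let sh run the commands
perl -e '@foo=`grep -r -i zip /tmp/bla/*`; foreach (@foo){ if (/(index\.php.*?)"/) {print "wget \"http://www.urbanfonts.com/scripts/$1\"\n";} };' | sh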

Then you will have all the fonts in /tmp/blubb, named
index.php?<fontname.zip>.
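The same extract-then-fetch idea can be sketched without Perl, assuming GNU grep (`-r`, `-h`, `-o`, `-i`) and a copy of the site already fetched into /tmp/bla as in the script above. The function name is hypothetical, the pattern is a slightly stricter variant that only keeps `index.php?...zip` links, and the `/scripts/` prefix is taken from Patrick's script rather than checked against the site.

```shell
# extract_zip_links DIR PREFIX
# Print one absolute URL per unique "index.php?....zip" link found in the
# files under DIR. PREFIX is put in front of each relative link; it must
# not contain "|" or "&", which sed would interpret.
extract_zip_links() {
    dir=$1
    base=$2
    # -r recurse, -h no file names, -o print only the match, -i ignore case
    grep -rhoi 'index\.php?[^"]*\.zip' "$dir" | sort -u | sed "s|^|$base|"
}
```

With the links in a file, `wget -i` fetches them in one run, e.g. `extract_zip_links /tmp/bla http://www.urbanfonts.com/scripts/ > urls.txt` followed by `wget -i urls.txt`.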

I hope this solution helps a bit.

Maybe you could also try wget's --spider option; I haven't tried it myself.


Greets, Patrick


 
08-21-2008, 02:05 AM
S B

wget problem

2008/8/20 Paul <paul@all-the-johnsons.co.uk>:
> Hi,
>
> I'm trying to grab all of the .zip files from www.urbanfonts.com using
> wget.
>
> If I use flashgot, it will work for the page I'm on, but the output from
> flashgot doesn't tell me which options are used?
>
> All of the files are in the form of xyz.php?fontname.zip.
>
> Any ideas on how to get them easier than doing every page with flashgot?
>
> TTFN
>
> Paul

Try iMacros:

https://addons.mozilla.org/en-US/firefox/addon/3863

Look at the examples, then create a new script or modify an existing one.

 

Thread Tools




All times are GMT. The time now is 12:02 PM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org