09-22-2012, 04:46 PM

Using wget to fill in a form

> Using ZOOM, mentioned in my previous post, you can use your perl script
> as a Z39.50 client to search the LOC catalog directly. There are also
> C, C++ and PHP bindings.

Ah, that makes sense. I will probably get after this again later today or tomorrow, and I will definitely post any success stories. It will probably take me a while to get back up to speed with perl since I haven't touched it in a couple of years.

Craig


Sent - Gtek Web Mail



--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/1348332366.452110897@webmail.gtek.biz
 
09-22-2012, 05:06 PM
Lars Noodén

On 9/22/12 7:46 PM, craig@gtek.biz wrote:
>> Using ZOOM, mentioned in my previous post, you can use your perl
>> script as a Z39.50 client to search the LOC catalog directly.
>> There are also C, C++ and PHP bindings.
>
> Ah, that makes sense. I will probably get after this again later
> today or tomorrow, and I will definitely post any success stories. It
> will probably take me a while to get back up to speed with perl since
> I haven't touched it in a couple of years.

It's there in the repository as libnet-z3950-zoom-perl

Regards,
/Lars


Archive: http://lists.debian.org/505DF027.80302@gmail.com
 
09-22-2012, 05:36 PM
Camaleón

On Sat, 22 Sep 2012 11:28:50 -0500, craig wrote:

>> As others suggest, the query should be something like:
>>
>> wget http://www.loc.gov/cgi-bin/zgate
>> --post-data="ACTION=SEARCH&TERM_1=1886411484&SESSION_ID=1234567"
>
> Yeah, I was messing with the --post-data, but I didn't know I had to use
> an ACTION key. Will play with that.

(...)

Mmm... there's another door you can knock on:

wget http://www.loc.gov/search --post-data="q=1886411484&all=true&st=list"
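
To run a query like this over a whole list of ISBNs, a loop along the following lines may work. This is only a sketch: the file name isbns.txt, the output file naming and the one-second pause are assumptions for illustration, and the /search endpoint with its q, all and st fields is taken as given from the command above. Nothing here checks what the server actually returns.

```shell
#!/bin/sh
# Sketch only: function names, output file names and the pause are assumptions.

# Strip hyphens and spaces so "1-886411-48-4" becomes "1886411484".
normalize_isbn() {
    printf '%s' "$1" | tr -d ' -'
}

# Build the POST body for the /search endpoint shown above.
search_post_data() {
    printf 'q=%s&all=true&st=list' "$(normalize_isbn "$1")"
}

# Fetch one result page per ISBN listed (one per line) in the file given as $1.
fetch_all() {
    while IFS= read -r isbn; do
        clean=$(normalize_isbn "$isbn")
        [ -n "$clean" ] || continue          # skip empty lines
        wget -q -O "loc-$clean.html" \
            --post-data="$(search_post_data "$isbn")" \
            http://www.loc.gov/search
        sleep 1                              # be gentle with the server
    done < "$1"
}

# Usage (hypothetical file name):
#   fetch_all isbns.txt
```

Each result page ends up in its own loc-<ISBN>.html file, which can then be searched for the call number.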

Greetings,

--
Camaleón


Archive: http://lists.debian.org/k3ksum$mv$20@ger.gmane.org
 
09-23-2012, 02:30 AM
Gary Dale

On 22/09/12 11:34 AM, Lars Noodén wrote:

On 9/22/12 6:01 PM, craig@gtek.biz wrote:

Greetings,

I have a small book collection (~150) that I thought would be neat to
catalog by the Library of Congress catalog numbers. I have found a
LOC search form that will allow me to input the ISBN, and it will
return the information I want:

Code:
http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
I have the list of book ISBNs in a text file, so scripting this
should be quite easy. The problem is I can't figure out how to submit
the form from the command line. I figured wget would be the best way,
but everything I try results in downloading a single line that reads
"Your form didn't include an ACTION!" So I thought I would turn to
here for help. The test ISBN I am using is for The Linux Cookbook:
1886411484, QA76.76.O63S788 2001.

[snip]

If you want to screen scrape, the URI would be like this:

http://www.loc.gov/cgi-bin/zgate?ACTION=SEARCH&DBNAME=VOYAGER&ESNAME=B&MAXRECORDS=20&RECSYNTAX=1.2.840.10003.5.10&REINIT=/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090&srchtype=1,1016,2,102,3,3,4,2,5,100,6,1&SESSION_ID=4493330&TERM_1=1886411484

However, the session ID expires after only a few minutes so you will
need a fresh one.

Regards,
/Lars

The solution is to wget the form to get a session ID, then submit the
query using that session ID. If you are running multiple queries, keep
submitting them with the same session ID until one is rejected, then
fetch the form again for a fresh ID. With any luck, you will be able to
run several queries per session and detect when a query is rejected
because the session has expired.


You also could simply keep the get form / submit query pairing since I
doubt that the (possibly) unnecessary extra form gets are going to cause
a huge slowdown. I just think it's better to try to minimize traffic
where possible.
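
A rough sketch of the get form / submit query pairing follows. It assumes the form still carries SESSION_ID as a hidden INPUT laid out like the page source quoted elsewhere in this thread, and it uses the short field list (ACTION, TERM_1, SESSION_ID) suggested earlier, which may not be enough for the server. The function names are made up for illustration. Detecting an expired session is left out because the thread does not show what a rejected query looks like.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the form layout is assumed
# to match the page source quoted in this thread.

FORM_URL='http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090'

# Read form HTML on stdin and print the first SESSION_ID value found.
extract_session_id() {
    sed -n 's/.*NAME="SESSION_ID" *VALUE="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Fetch the form and print a fresh session ID (needs network access).
fresh_session_id() {
    wget -q -O - "$FORM_URL" | extract_session_id
}

# Submit one search; $1 = ISBN, $2 = session ID. Result page goes to stdout.
submit_query() {
    wget -q -O - \
        --post-data="ACTION=SEARCH&TERM_1=$1&SESSION_ID=$2" \
        http://www.loc.gov/cgi-bin/zgate
}

# Get form / submit query pairing for a single ISBN.
lookup_isbn() {
    sid=$(fresh_session_id)
    [ -n "$sid" ] || { echo "no session ID found in form" >&2; return 1; }
    submit_query "$1" "$sid"
}

# Usage:
#   lookup_isbn 1886411484 > result.html
```

Fetching the form before every query costs one extra request per ISBN. Reusing the ID would mean calling fresh_session_id once and passing the same value to submit_query until a query fails.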



Archive: http://lists.debian.org/505E745D.3050106@rogers.com
 
09-23-2012, 03:45 AM
Jude DaShiell

wget isn't the right tool for that job. However, its brother wput may be
able to do the job.

On Sat, 22 Sep 2012, Gary Dale wrote:

> On 22/09/12 11:27 AM, Gary Dale wrote:
> > On 22/09/12 11:01 AM, craig@gtek.biz wrote:
> > > Greetings,
> > >
> > > I have a small book collection (~150) that I thought would be neat to
> > > catalog by the Library of Congress catalog numbers. I have found a LOC
> > > search form that will allow me to input the ISBN, and it will return the
> > > information I want:
> > >
> > >
Code:
http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
> > >
> > >
> > > I have the list of book ISBNs in a text file, so scripting this should be
> > > quite easy. The problem is I can't figure out how to submit the form from
> > > the command line. I figured wget would be the best way, but everything I
> > > try results in downloading a single line that reads "Your form didn't
> > > include an ACTION!" So I thought I would turn to here for help. The test
> > > ISBN I am using is for The Linux Cookbook: 1886411484, QA76.76.O63S788
> > > 2001.
> > >
> > > And a related side question. From my reading, I've learned that the Z39.50
> > > protocol is used to query databases, usually library related. Is anyone
> > > aware of an ISBN database table that can be downloaded by the user,
> > > preferably in a format that can be imported into MySQL or PostgreSQL?
> > >
> > > Thanks, Craig
> > >
> > The url you give is for the form. If you enter an ISBN number it will do the
> > search.
> >
> > What you need to do is capture the http header sent when you click "submit
> > query" then replace the test ISBN number with whatever number you want to
> > search. Wireshark can do this. Simply look for the query packet(s).
> >
> The fields you need are shown in the page source:
>
> <FORM METHOD="POST"ACTION="/cgi-bin/zgate">
> <INPUT NAME="ACTION"VALUE="SEARCH"TYPE="HIDDEN">
> <INPUT NAME="DBNAME"VALUE="VOYAGER"TYPE="HIDDEN">
> <INPUT NAME="ESNAME"VALUE="B"TYPE="HIDDEN">
> <INPUT NAME="MAXRECORDS"VALUE="20"TYPE="HIDDEN">
> <INPUT NAME="RECSYNTAX"VALUE="1.2.840.10003.5.10"TYPE="HIDDEN">
> <INPUT
> NAME="REINIT"TYPE="HIDDEN"VALUE="/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090">
> <INPUT NAME="srchtype"VALUE="1,1016,2,102,3,3,4,2,5,100,6,1"TYPE="HIDDEN">
>
> <P>
> <STRONG>Enter Search Term(s):</STRONG><br>(The search term can be a single
> word or a phrase from anywhere in the record. Enter an author's name in
> indirect order, i.e., last_name, first_name.)<p>
> <INPUT NAME="TERM_1"SIZE="60">
> <p>
> <INPUT TYPE="SUBMIT"VALUE="Submit Query">
> <INPUT Type="RESET"VALUE="Clear Form">
> <HR>
> Use of this form results in a search of the LC Voyager database (approximately
> 14 million records). This database contains records in all bibliographic
> formats (i.e., books, serials, music, maps, manuscripts, computer files, and
> visual materials), and includes the retrospective, unedited older
> bibliographic
> records known as the PreMARC File. LC name and subject authority records
> cannot be searched.
> <INPUT NAME="SESSION_ID"VALUE="5923056"TYPE="HIDDEN">
> </FORM>
>
>
> You need to construct the query using those fields with those values, with
> TERM_1 containing the ISBN number.
>
> From the error you are getting, it seems like your query either didn't include
> the SEARCH action or the header wasn't understood.
>
>
>
>
>
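
Constructing the query from those fields might look like the sketch below. The field names and values are copied from the form source quoted above. The urlencode helper and the function names are illustrative only, and percent-encoding the values is an assumption about what the server expects (it is roughly what a browser does when posting a form).

```shell
#!/bin/sh
# Sketch only: helper and function names are hypothetical.

# Percent-encode a string, leaving only unreserved characters as they are.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}                 # everything after the first character
        c=${s%"$rest"}              # the first character itself
        case $c in
            [A-Za-z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s' "$out"
}

# Build the full POST body; $1 = ISBN for TERM_1, $2 = SESSION_ID from the form.
build_search_body() {
    printf 'ACTION=SEARCH&DBNAME=VOYAGER&ESNAME=B&MAXRECORDS=20'
    printf '&RECSYNTAX=%s' "$(urlencode '1.2.840.10003.5.10')"
    printf '&REINIT=%s' "$(urlencode '/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090')"
    printf '&srchtype=%s' "$(urlencode '1,1016,2,102,3,3,4,2,5,100,6,1')"
    printf '&TERM_1=%s&SESSION_ID=%s' "$(urlencode "$1")" "$(urlencode "$2")"
}

# Usage, with "$sid" holding a session ID read from a freshly fetched form:
#   wget -O result.html --post-data="$(build_search_body 1886411484 "$sid")" \
#       http://www.loc.gov/cgi-bin/zgate
```

Encoding matters mostly for REINIT, whose value contains its own ? and & characters and would otherwise be split into separate fields.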

---------------------------------------------------------------------------
jude <jdashiel@shellworld.net>
Adobe fiend for failing to Flash



Archive: http://lists.debian.org/alpine.BSF.2.01.1209222344190.74944@freire1.furyyjbeyq.arg
 
09-23-2012, 09:31 AM
Pertti Kosunen

On 22.9.2012 18:01, craig@gtek.biz wrote:

I have the list of book ISBNs in a text file, so scripting this
should be quite easy. The problem is I can't figure out how to submit
the form from the command line.


http://curl.haxx.se/docs/manpage.html

It should be quite easy with curl.
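
As one possible illustration of the curl route, the sketch below posts the hidden fields from the form source quoted earlier in the thread and lets curl do the percent-encoding with --data-urlencode. The function name and the CURL override variable are made up for this example, and the session ID still has to be read from a freshly fetched form first.

```shell
#!/bin/sh
# Sketch only: loc_search and the CURL variable are hypothetical names.
# Set CURL=echo to print the arguments instead of contacting the server.
CURL=${CURL:-curl}

# $1 = ISBN, $2 = session ID taken from a freshly fetched form.
loc_search() {
    "$CURL" -s \
        --data 'ACTION=SEARCH' \
        --data 'DBNAME=VOYAGER' \
        --data 'ESNAME=B' \
        --data 'MAXRECORDS=20' \
        --data 'RECSYNTAX=1.2.840.10003.5.10' \
        --data-urlencode 'REINIT=/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090' \
        --data-urlencode 'srchtype=1,1016,2,102,3,3,4,2,5,100,6,1' \
        --data-urlencode "TERM_1=$1" \
        --data-urlencode "SESSION_ID=$2" \
        http://www.loc.gov/cgi-bin/zgate
}

# Usage:
#   loc_search 1886411484 5923056 > result.html
```

curl joins the separate --data options with & into a single POST body, so no manual string building is needed.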


Archive: http://lists.debian.org/505ED6DF.3040103@pp.nic.fi
 
09-25-2012, 02:05 PM
Chris Bannister

On Sat, Sep 22, 2012 at 10:01:51AM -0500, craig@gtek.biz wrote:
> Greetings,
>
> I have a small book collection (~150) that I thought would be neat to catalog by the Library of Congress catalog numbers. I have found a LOC search form that will allow me to input the ISBN, and it will return the information I want:
>
>
Code:
http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
>
> I have the list of book ISBNs in a text file, so scripting this should be quite easy. The problem is I can't figure out how to submit the form from the command line. I figured wget would be the best way, but everything I try results in downloading a single line that reads "Your form didn't include an ACTION!" So I thought I would turn to here for help. The test ISBN I am using is for The Linux Cookbook: 1886411484, QA76.76.O63S788 2001.

Have a look at:
http://search.cpan.org/dist/WWW-Mechanize/

Have a read of:
http://www.perl.com/pub/2003/01/22/mechanize.html

Do a google search on "perl www::mechanize"


Apologies for the 'z' in "mechanize"

--
"If you're not careful, the newspapers will have you hating the people
who are being oppressed, and loving the people who are doing the
oppressing." --- Malcolm X


Archive: http://lists.debian.org/20120925140529.GU8247@tal
 
09-25-2012, 02:25 PM

> Have a look at:
> http://search.cpan.org/dist/WWW-Mechanize/
>
> Have a read of:
> http://www.perl.com/pub/2003/01/22/mechanize.html
>
> Do a google search on "perl www::mechanize"

Thanks for the reply (and to the other kind folks that took time to reply). I will have to put this quest off until the weekend at this point, so please know that I am not ignoring the help.

Craig




 
09-28-2012, 10:30 PM
Hendrik Boom

On Sat, 22 Sep 2012 10:01:51 -0500, craig wrote:

> Greetings,
>
> I have a small book collection (~150) that I thought would be neat to
> catalog by the Library of Congress catalog numbers.

This isn't what you asked for at all, but you might consider the BLISS
classification instead. It's more modern, and its classification guides
are legitimately available for free download. Some are scanned PDFs,
others are available as source code (XML, I believe).

They've learned a lot about the structure of classification systems since
LC was set up.

It's used by a number of libraries in England, I believe.

-- hendrik


Archive: http://lists.debian.org/k458eg$l66$2@ger.gmane.org
 
09-28-2012, 11:10 PM
John Hasler

Hendrik Boom writes:
> It's more modern, and its classification guides are legitimately
> available for free download.

What about LCC is not in the public domain?

<http://www.loc.gov/catdir/cpso/lcco/>
--
John Hasler


Archive: http://lists.debian.org/87r4plyc9b.fsf@thumper.dhh.gt.org
 
