Linux Archive

09-22-2012 03:01 PM

Using wget to fill in a form
 
Greetings,

I have a small book collection (~150) that I thought would be neat to catalog by the Library of Congress catalog numbers. I have found a LOC search form that will allow me to input the ISBN, and it will return the information I want:

http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090

I have the list of book ISBNs in a text file, so scripting this should be quite easy. The problem is that I can't figure out how to submit the form from the command line. I figured wget would be the best way, but everything I try results in downloading a single line that reads "Your form didn't include an ACTION!" So I thought I would ask here for help. The test ISBN I am using is for The Linux Cookbook: 1886411484, QA76.76.O63S788 2001.

And a related side question. From my reading, I've learned that the Z39.50 protocol is used to query databases, usually library related. Is anyone aware of an ISBN database table that can be downloaded by the user, preferably in a format that can be imported into MySQL or PostgreSQL?

Thanks, Craig


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/1348326111.28614520@webmail.gtek.biz

Lars Noodén 09-22-2012 03:24 PM

On 9/22/12 6:01 PM, craig@gtek.biz wrote:
[snip]
> And a related side question. From my reading, I've learned that the
> Z39.50 protocol is used to query databases, usually library related.
> Is anyone aware of an ISBN database table that can be downloaded by
> the user, preferably in a format that can be imported into MySQL or
> PostgreSQL?
[snip]

You could use Perl and ZOOM to make Z39.50 queries directly:

http://search.cpan.org/~mirk/Net-Z3950-ZOOM/lib/ZOOM.pod

For background see the Bath Profile:

http://www.ukoln.ac.uk/interop-focus/bath/

There are also bindings for C, C++ and PHP. You'll find them at
IndexData's web site.

As far as importing into MySQL or PostgreSQL goes, that depends on how you
decide to map the Bath Profile (most likely the one used) onto your
own database structure. The database being queried via Z39.50 probably
has its data in the MARC21 format, which has over 1000 fields and
subfields, each with a specific meaning.
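
To make the mapping idea concrete, here is a minimal sketch of flattening a few MARC 21 fields into one SQL row. It assumes the record has already been parsed into a plain dict of tag -> subfields; that dict layout, the table name `books`, and the split of the call number into subfields `a` and `b` are illustrative assumptions, not something the LOC gateway returns in this form. The tags themselves (020 for ISBN, 050 for the Library of Congress call number, 245 for the title) are standard MARC 21. SQLite is used only so the sketch runs without a server; the same idea carries over to MySQL or PostgreSQL.

```python
import sqlite3

# Hypothetical parsed record: MARC tag -> {subfield code: value}.
# 020 = ISBN, 050 = LC call number, 245 = title statement.
sample_record = {
    "020": {"a": "1886411484"},
    "050": {"a": "QA76.76.O63", "b": "S788 2001"},  # assumed subfield split
    "245": {"a": "The Linux Cookbook"},
}


def record_to_row(record):
    """Pick out the few fields we care about and flatten them to a tuple."""
    call = record.get("050", {})
    call_number = (call.get("a", "") + call.get("b", "")).strip() or None
    return (
        record.get("020", {}).get("a"),
        call_number,
        record.get("245", {}).get("a"),
    )


def store(conn, record):
    """Create the (hypothetical) books table if needed and upsert one row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "isbn TEXT PRIMARY KEY, lc_call_number TEXT, title TEXT)"
    )
    conn.execute("INSERT OR REPLACE INTO books VALUES (?, ?, ?)",
                 record_to_row(record))
    conn.commit()
```

Calling `store(sqlite3.connect("books.db"), sample_record)` would write one row; everything else in the MARC record is simply ignored, which is the point of choosing your own mapping.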



Regards
/Lars



Gary Dale 09-22-2012 03:27 PM

On 22/09/12 11:01 AM, craig@gtek.biz wrote:
[snip]
> The problem is I can't figure out how to submit the form from the
> command line. I figured wget would be the best way, but everything I
> try results in downloading a single line that reads "Your form didn't
> include an ACTION!"
[snip]

The URL you give is for the form. If you enter an ISBN it will do
the search.


What you need to do is capture the HTTP request (headers and POST body)
sent when you click "Submit Query", then replace the test ISBN with
whatever number you want to search. Wireshark can do this. Simply look
for the query packet(s).




Lars Noodén 09-22-2012 03:34 PM

On 9/22/12 6:01 PM, craig@gtek.biz wrote:
> Greetings,
>
> I have a small book collection (~150) that I thought would be neat to
> catalog by the Library of Congress catalog numbers. I have found a
> LOC search form that will allow me to input the ISBN, and it will
> return the information I want:
>
>
> http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
>
> I have the list of book ISBNs in a text file, so scripting this
> should be quite easy. The problem is I can't figure out how to submit
> the form from the command line. I figured wget would be the best way,
> but everything I try results in downloading a single line that reads
> "Your form didn't include an ACTION!" So I thought I would turn to
> here for help. The test ISBN I am using is for The Linux Cookbook:
> 1886411484, QA76.76.O63S788 2001.
[snip]

If you want to screen scrape, the URI would be like this:

http://www.loc.gov/cgi-bin/zgate?ACTION=SEARCH&DBNAME=VOYAGER&ESNAME=B&MAXRECORDS=20&RECSYNTAX=1.2.840.10003.5.10&REINIT=/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090&srchtype=1,1016,2,102,3,3,4,2,5,100,6,1&SESSION_ID=4493330&TERM_1=1886411484

However, the session ID expires after only a few minutes so you will
need a fresh one.
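
One detail worth noting about a URI like the one above: the REINIT value itself contains `?`, `&` and `=`, so pasted as-is it would be read as extra parameters (including a second ACTION). A minimal sketch of building the same query with each value percent-encoded follows. The field names and values are the ones quoted in this thread; the function name `search_url` is made up for illustration, and whether the gateway insists on every field is not something this sketch can confirm.

```python
from urllib.parse import urlencode

GATEWAY = "http://www.loc.gov/cgi-bin/zgate"
REINIT = ("/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT="
          "/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090")


def search_url(isbn, session_id):
    """Return a GET URL for one ISBN, with every value percent-encoded.

    session_id must be a fresh value taken from the form page; the
    gateway expires it after a few minutes.
    """
    params = [
        ("ACTION", "SEARCH"),
        ("DBNAME", "VOYAGER"),
        ("ESNAME", "B"),
        ("MAXRECORDS", "20"),
        ("RECSYNTAX", "1.2.840.10003.5.10"),
        ("REINIT", REINIT),
        ("srchtype", "1,1016,2,102,3,3,4,2,5,100,6,1"),
        ("SESSION_ID", session_id),
        ("TERM_1", isbn),
    ]
    # urlencode escapes ?, &, = and , inside values, so REINIT stays
    # a single parameter instead of leaking ACTION=INIT into the query.
    return GATEWAY + "?" + urlencode(params)
```

The resulting string can be handed to wget inside single quotes, or fetched from Python directly.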

Regards,
/Lars



Gary Dale 09-22-2012 03:43 PM

The fields you need are shown in the page source:

<FORM METHOD="POST" ACTION="/cgi-bin/zgate">
<INPUT NAME="ACTION" VALUE="SEARCH" TYPE="HIDDEN">
<INPUT NAME="DBNAME" VALUE="VOYAGER" TYPE="HIDDEN">
<INPUT NAME="ESNAME" VALUE="B" TYPE="HIDDEN">
<INPUT NAME="MAXRECORDS" VALUE="20" TYPE="HIDDEN">
<INPUT NAME="RECSYNTAX" VALUE="1.2.840.10003.5.10" TYPE="HIDDEN">
<INPUT NAME="REINIT" TYPE="HIDDEN" VALUE="/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090">
<INPUT NAME="srchtype" VALUE="1,1016,2,102,3,3,4,2,5,100,6,1" TYPE="HIDDEN">

<P>
<STRONG>Enter Search Term(s):</STRONG><br>(The search term can be a single word or a phrase from anywhere in the record. Enter an author's name in indirect order, i.e., last_name, first_name.)<p>
<INPUT NAME="TERM_1" SIZE="60">
<p>
<INPUT TYPE="SUBMIT" VALUE="Submit Query">
<INPUT Type="RESET" VALUE="Clear Form">
<HR>
Use of this form results in a search of the LC Voyager database (approximately
14 million records). This database contains records in all bibliographic
formats (i.e., books, serials, music, maps, manuscripts, computer files, and
visual materials), and includes the retrospective, unedited older bibliographic
records known as the PreMARC File. LC name and subject authority records
cannot be searched.
<INPUT NAME="SESSION_ID" VALUE="5923056" TYPE="HIDDEN">
</FORM>


You need to construct the query using those fields with those values, with TERM_1 containing the ISBN.

From the error you are getting, it seems that your query either didn't include the ACTION field (with the value SEARCH) or the request wasn't understood.
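
Rather than copying the hidden fields by hand, they can be read out of the form page itself. The following is a rough sketch using only the Python standard library; the class name `FormFields`, the function `post_body`, and the shortened `SAMPLE` form are made up for illustration, and the sketch assumes the page keeps the layout shown in the source above.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFields(HTMLParser):
    """Collect name/value pairs of every hidden <INPUT> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name and (attrs.get("type") or "").lower() == "hidden":
            self.fields[name] = attrs.get("value") or ""


def post_body(form_html, isbn):
    """Return the urlencoded POST body: hidden fields plus TERM_1."""
    parser = FormFields()
    parser.feed(form_html)
    fields = dict(parser.fields)
    fields["TERM_1"] = isbn
    return urlencode(fields)


# Shortened stand-in for the real form page, for demonstration only.
SAMPLE = """
<FORM METHOD="POST" ACTION="/cgi-bin/zgate">
<INPUT NAME="ACTION" VALUE="SEARCH" TYPE="HIDDEN">
<INPUT NAME="DBNAME" VALUE="VOYAGER" TYPE="HIDDEN">
<INPUT NAME="TERM_1" SIZE="60">
<INPUT NAME="SESSION_ID" VALUE="5923056" TYPE="HIDDEN">
</FORM>
"""
```

The string returned by `post_body(page, isbn)` is what would go into wget's `--post-data` option, or into the `data` argument of `urllib.request.urlopen` after encoding it to bytes. Because the SESSION_ID is read from a freshly fetched page, it is current at the time of the search.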





Camaleón 09-22-2012 04:00 PM

On Sat, 22 Sep 2012 10:01:51 -0500, craig wrote:

> Greetings,
>
> I have a small book collection (~150) that I thought would be neat to
> catalog by the Library of Congress catalog numbers. I have found a LOC
> search form that will allow me to input the ISBN, and it will return the
> information I want:
>
>
> http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
>
> I have the list of book ISBNs in a text file, so scripting this should
> be quite easy. The problem is I can't figure out how to submit the form
> from the command line. I figured wget would be the best way, but
> everything I try results in downloading a single line that reads "Your
> form didn't include an ACTION!" So I thought I would turn to here for
> help. The test ISBN I am using is for The Linux Cookbook: 1886411484,
> QA76.76.O63S788 2001.

As others suggest, the query should be something like:

wget http://www.loc.gov/cgi-bin/zgate --post-data="ACTION=SEARCH&TERM_1=1886411484&SESSION_ID=1234567"

But I get "session expired" :-(

(note the "SESSION_ID" field value is completely arbitrary in the above line)
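
The "session expired" reply is consistent with the arbitrary SESSION_ID: the gateway hands out the ID on the form page and expects it back. A two-step sketch follows: fetch the INIT page, pull the SESSION_ID out of it, then POST the search. The function names are invented for illustration, the regular expression assumes the SESSION_ID input looks like the one in the page source quoted in this thread, and the `latin-1` decoding is a guess chosen because it never fails. Whether the gateway also ties the session to anything else (cookies, client address) is unknown.

```python
import re
import urllib.request
from urllib.parse import urlencode

GATEWAY = "http://www.loc.gov/cgi-bin/zgate"
INIT_URL = (GATEWAY + "?ACTION=INIT&FORM_HOST_PORT="
            "/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090")


def extract_session_id(form_html):
    """Find the SESSION_ID hidden field in the form page."""
    m = re.search(r'NAME="SESSION_ID"\s*VALUE="(\d+)"', form_html,
                  re.IGNORECASE)
    if m is None:
        raise ValueError("no SESSION_ID field found in form page")
    return m.group(1)


def build_search_request(isbn, session_id):
    """Build (but do not send) the POST request for one ISBN."""
    body = urlencode({
        "ACTION": "SEARCH",
        "DBNAME": "VOYAGER",
        "ESNAME": "B",
        "MAXRECORDS": "20",
        "RECSYNTAX": "1.2.840.10003.5.10",
        "srchtype": "1,1016,2,102,3,3,4,2,5,100,6,1",
        "SESSION_ID": session_id,
        "TERM_1": isbn,
    }).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(GATEWAY, data=body)


def lookup(isbn):
    """Fetch a fresh session, then run the search. Needs network access."""
    with urllib.request.urlopen(INIT_URL) as resp:
        session_id = extract_session_id(resp.read().decode("latin-1"))
    with urllib.request.urlopen(build_search_request(isbn, session_id)) as resp:
        return resp.read().decode("latin-1")
```

With a list of ISBNs in a text file, `lookup` could be called once per line, ideally with a short pause between requests to be polite to the server.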

> And a related side question. From my reading, I've learned that the
> Z39.50 protocol is used to query databases, usually library related. Is
> anyone aware of an ISBN database table that can be downloaded by the
> user, preferably in a format that can be imported into MySQL or
> PostgreSQL?

Well, according to this:

http://www.loc.gov/z3950/gateway.html#about

You can query the database by means of a Z39.50 client, should you find one ;-)

Greetings,

--
Camaleón



09-22-2012 04:24 PM

> For background see the Bath Profile:
>
> http://www.ukoln.ac.uk/interop-focus/bath/
>
> There are also bindings for C, C++ and PHP. You'll find them at
> IndexData's web site.
>
> As far as importing into MySQL or Postgresql, that is up to how you
> decide to map the Bath Profile (most likely the one used) over to your
> own database structure. The database being queried via Z39.50 probably
> has its data in the MARC21 format and has over 1000 fields and subfields
> each with a specific meaning.

Thanks for the info. I didn't realize MARC21 was so complex, but I can always create queries that select what I need; I just need to know what to query against. I will read up on what you provided.



09-22-2012 04:25 PM

> The url you give is for the form. If you enter an ISBN number it will do
> the search.
>
> What you need to do is capture the http header sent when you click
> "submit query" then replace the test ISBN number with whatever number
> you want to search. Wireshark can do this. Simply look for the query
> packet(s).

At some point I thought about trying to capture what was being submitted, but since my HTTP knowledge is limited, I assumed the information might simply be sent as part of the URL, which would have made wget perfect for this. I've got Wireshark loaded on something around here, so I will investigate this line of thought. Thanks!



09-22-2012 04:28 PM

> As others suggest, the query should be something like:
>
> wget http://www.loc.gov/cgi-bin/zgate
> --post-data="ACTION=SEARCH&TERM_1=1886411484&SESSION_ID=1234567"

Yeah, I was messing with the --post-data, but I didn't know I had to use an ACTION key. Will play with that.

> But I get "session expired" :-(
>
> (note the "SESSION_ID" field value is completely arbitrary in the above line)
>
>> And a related side question. From my reading, I've learned that the
>> Z39.50 protocol is used to query databases, usually library related. Is
>> anyone aware of an ISBN database table that can be downloaded by the
>> user, preferably in a format that can be imported into MySQL or
>> PostgreSQL?
>
> Well, according to this:
>
> http://www.loc.gov/z3950/gateway.html#about
>
> You can query the database by means of Z39.50 client, should you find one ;-)

I kind of figured that would be what I needed, but I'm not aware of any Z39.50 clients.

Thanks!



Lars Noodén 09-22-2012 04:40 PM

On 9/22/12 7:28 PM, craig@gtek.biz wrote:
[snip]
> I kind of figured that would be what I needed, but I'm not aware of any Z39.50 clients.
[snip]

Using ZOOM, mentioned in my previous post, you can turn your Perl script
into a Z39.50 client and search the LOC catalog directly. There are also
C, C++ and PHP bindings.
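
Whichever binding is used, the search itself is usually written in Prefix Query Format (PQF), where the Bib-1 "use" attribute 7 means ISBN. The sketch below only builds the query strings and does not talk to a server, so it deliberately avoids any particular Z39.50 library. The function names and the `TARGET` constant are illustrative: the host and port come from the form URL in this thread, and the database name VOYAGER is taken from the form's DBNAME field, so treat the combination as an assumption to verify.

```python
# Assumed target in the usual host:port/database form.
TARGET = "z3950.loc.gov:7090/VOYAGER"


def isbn_pqf(isbn):
    """Return a PQF query for one ISBN (Bib-1 use attribute 7 = ISBN)."""
    digits = isbn.replace("-", "").replace(" ", "").strip()
    return "@attr 1=7 " + digits


def queries_from_lines(lines):
    """Turn the lines of an ISBN list file into PQF queries.

    Blank lines are skipped.
    """
    return [isbn_pqf(line) for line in lines if line.strip()]
```

Each resulting string is what would be passed to the client library's search call after connecting to `TARGET`.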

Regards,
/Lars




All times are GMT.