Using wget to fill in a form
Greetings,
I have a small book collection (~150) that I thought would be neat to catalog by the Library of Congress catalog numbers. I have found a LOC search form that will allow me to input the ISBN, and it will return the information I want:
Code:
http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
I have the list of book ISBNs in a text file, so scripting this should be quite easy. The problem is I can't figure out how to submit the form from the command line. I figured wget would be the best way, but everything I try results in downloading a single line that reads "Your form didn't include an ACTION!" So I thought I would turn to here for help. The test ISBN I am using is for The Linux Cookbook: 1886411484, QA76.76.O63S788 2001.
And a related side question. From my reading, I've learned that the Z39.50 protocol is used to query databases, usually library related. Is anyone aware of an ISBN database table that can be downloaded by the user, preferably in a format that can be imported into MySQL or PostgreSQL?
Thanks,
Craig
Sent - Gtek Web Mail
--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/1348326111.28614520@webmail.gtek.biz
Using wget to fill in a form
On 9/22/12 6:01 PM, craig@gtek.biz wrote:
[snip]
> And a related side question. From my reading, I've learned that the
> Z39.50 protocol is used to query databases, usually library related.
> Is anyone aware of an ISBN database table that can be downloaded by
> the user, preferably in a format that can be imported into MySQL or
> PostgreSQL?
[snip]
You could use Perl and ZOOM to make Z39.50 queries directly:
http://search.cpan.org/~mirk/Net-Z3950-ZOOM/lib/ZOOM.pod
For background see the Bath Profile:
http://www.ukoln.ac.uk/interop-focus/bath/
There are also bindings for C, C++ and PHP. You'll find them at IndexData's web site.
As far as importing into MySQL or PostgreSQL, that is up to how you decide to map the Bath Profile (most likely the one used) over to your own database structure. The database being queried via Z39.50 probably has its data in the MARC21 format and has over 1000 fields and subfields each with a specific meaning.
Regards
/Lars
Archive: http://lists.debian.org/505DD82E.9020401@gmail.com
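To make the mapping step concrete, here is a minimal sketch of pulling a handful of MARC 21 fields into a local table. It assumes the record has already been fetched and reduced to a plain dict of tag to value; the `books` table, its column names, and the `store_record` helper are hypothetical, and SQLite stands in for MySQL or PostgreSQL only so the example runs without a server. The three tags used (020 for ISBN, 050 for the LC call number, 245 for the title) are standard MARC 21 bibliographic tags.

```python
import sqlite3

# Hypothetical mapping from a few MARC 21 tags to local column names.
MARC_TO_COLUMN = {
    "020": "isbn",         # International Standard Book Number
    "050": "call_number",  # Library of Congress call number
    "245": "title",        # Title statement
}


def create_table(conn):
    """Create the (hypothetical) books table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "isbn TEXT PRIMARY KEY, call_number TEXT, title TEXT)"
    )


def store_record(conn, marc_fields):
    """Insert one record given a dict of MARC tag -> value.

    Tags missing from the record are stored as NULL.
    """
    row = {col: marc_fields.get(tag) for tag, col in MARC_TO_COLUMN.items()}
    conn.execute(
        "INSERT OR REPLACE INTO books (isbn, call_number, title) "
        "VALUES (:isbn, :call_number, :title)",
        row,
    )


# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
create_table(conn)

# Values taken from the test book mentioned in the thread.
store_record(conn, {
    "020": "1886411484",
    "050": "QA76.76.O63S788 2001",
    "245": "The Linux Cookbook",
})

found = conn.execute(
    "SELECT call_number FROM books WHERE isbn = ?", ("1886411484",)
).fetchone()
print(found[0])
```

Only the tags you care about need a column; the rest of the MARC record can simply be ignored.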
Using wget to fill in a form
On 22/09/12 11:01 AM, craig@gtek.biz wrote:
> Greetings,
> I have a small book collection (~150) that I thought would be neat to catalog by the Library of Congress catalog numbers. I have found a LOC search form that will allow me to input the ISBN, and it will return the information I want:
> Code:
> http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
> And a related side question. From my reading, I've learned that the Z39.50 protocol is used to query databases, usually library related. Is anyone aware of an ISBN database table that can be downloaded by the user, preferably in a format that can be imported into MySQL or PostgreSQL?
> Thanks,
> Craig
The URL you give is for the form. If you enter an ISBN it will do the search. What you need to do is capture the HTTP request (headers and POST body) sent when you click "Submit Query", then replace the test ISBN with whatever number you want to search for. Wireshark can do this. Simply look for the query packet(s).
Archive: http://lists.debian.org/505DD8C5.90802@rogers.com
Using wget to fill in a form
On 9/22/12 6:01 PM, craig@gtek.biz wrote:
> Greetings,
>
> I have a small book collection (~150) that I thought would be neat to
> catalog by the Library of Congress catalog numbers. I have found a
> LOC search form that will allow me to input the ISBN, and it will
> return the information I want:
>
> Code:
> http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
>
> I have the list of book ISBNs in a text file, so scripting this
> should be quite easy. The problem is I can't figure out how to submit
> the form from the command line. I figured wget would be the best way,
> but everything I try results in downloading a single line that reads
> "Your form didn't include an ACTION!" So I thought I would turn to
> here for help. The test ISBN I am using is for The Linux Cookbook:
> 1886411484, QA76.76.O63S788 2001.
[snip]
If you want to screen scrape, the URI would be like this:
http://www.loc.gov/cgi-bin/zgate?ACTION=SEARCH&DBNAME=VOYAGER&ESNAME=B&MAXRECORDS=20&RECSYNTAX=1.2.840.10003.5.10&REINIT=/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090&srchtype=1,1016,2,102,3,3,4,2,5,100,6,1&SESSION_ID=4493330&TERM_1=1886411484
However, the session ID expires after only a few minutes so you will need a fresh one.
Regards,
/Lars
Archive: http://lists.debian.org/505DDA90.2040700@gmail.com
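One wrinkle with a URI like this is that the REINIT value itself contains `?`, `&` and `=`, so pasted by hand it can be read as extra top-level parameters (including a second ACTION). A small sketch of building the same query with percent-encoding follows. The parameter names and values are the ones given in the thread; `build_search_url` is a hypothetical helper, and whether the gateway accepts the encoded form on a GET request is an assumption that has not been verified here.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

BASE_URL = "http://www.loc.gov/cgi-bin/zgate"

# The REINIT value is itself a URL fragment containing '?', '&' and '='.
REINIT = ("/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT="
          "/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090")


def build_search_url(isbn, session_id):
    """Return a search URL with every parameter value percent-encoded."""
    params = [
        ("ACTION", "SEARCH"),
        ("DBNAME", "VOYAGER"),
        ("ESNAME", "B"),
        ("MAXRECORDS", "20"),
        ("RECSYNTAX", "1.2.840.10003.5.10"),
        ("REINIT", REINIT),
        ("srchtype", "1,1016,2,102,3,3,4,2,5,100,6,1"),
        ("SESSION_ID", session_id),
        ("TERM_1", isbn),
    ]
    return BASE_URL + "?" + urlencode(params)


# The session ID below is the (long expired) one quoted in the thread.
url = build_search_url("1886411484", "4493330")
print(url)

# Round-trip check: decoding the query gives back exactly one ACTION
# and the REINIT value unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["ACTION"], decoded["REINIT"] == [REINIT])
```

With wget, the resulting URL should be passed in single quotes so the shell does not interpret the `&` characters.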
Using wget to fill in a form
On 22/09/12 11:27 AM, Gary Dale wrote:
> On 22/09/12 11:01 AM, craig@gtek.biz wrote:
>> Greetings,
>> I have a small book collection (~150) that I thought would be neat to catalog by the Library of Congress catalog numbers. I have found a LOC search form that will allow me to input the ISBN, and it will return the information I want:
>> Code:
>> http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
>> I have the list of book ISBNs in a text file, so scripting this should be quite easy. The problem is I can't figure out how to submit the form from the command line. I figured wget would be the best way, but everything I try results in downloading a single line that reads "Your form didn't include an ACTION!" So I thought I would turn to here for help. The test ISBN I am using is for The Linux Cookbook: 1886411484, QA76.76.O63S788 2001.
>> And a related side question. From my reading, I've learned that the Z39.50 protocol is used to query databases, usually library related. Is anyone aware of an ISBN database table that can be downloaded by the user, preferably in a format that can be imported into MySQL or PostgreSQL?
>> Thanks,
>> Craig
> The url you give is for the form. If you enter an ISBN number it will do the search. What you need to do is capture the http header sent when you click "submit query" then replace the test ISBN number with whatever number you want to search. Wireshark can do this. Simply look for the query packet(s).
The fields you need are shown in the page source:
<FORM METHOD="POST" ACTION="/cgi-bin/zgate">
<INPUT NAME="ACTION" VALUE="SEARCH" TYPE="HIDDEN">
<INPUT NAME="DBNAME" VALUE="VOYAGER" TYPE="HIDDEN">
<INPUT NAME="ESNAME" VALUE="B" TYPE="HIDDEN">
<INPUT NAME="MAXRECORDS" VALUE="20" TYPE="HIDDEN">
<INPUT NAME="RECSYNTAX" VALUE="1.2.840.10003.5.10" TYPE="HIDDEN">
<INPUT NAME="REINIT" TYPE="HIDDEN" VALUE="/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090">
<INPUT NAME="srchtype" VALUE="1,1016,2,102,3,3,4,2,5,100,6,1" TYPE="HIDDEN">
<P>
<STRONG>Enter Search Term(s):</STRONG><br>(The search term can be a single word or a phrase from anywhere in the record. Enter an author's name in indirect order, i.e., last_name, first_name.)<p>
<INPUT NAME="TERM_1" SIZE="60">
<p>
<INPUT TYPE="SUBMIT" VALUE="Submit Query">
<INPUT TYPE="RESET" VALUE="Clear Form">
<HR>
Use of this form results in a search of the LC Voyager database (approximately 14 million records). This database contains records in all bibliographic formats (i.e., books, serials, music, maps, manuscripts, computer files, and visual materials), and includes the retrospective, unedited older bibliographic records known as the PreMARC File. LC name and subject authority records cannot be searched.
<INPUT NAME="SESSION_ID" VALUE="5923056" TYPE="HIDDEN">
</FORM>
You need to construct the query using those fields with those values, with TERM_1 containing the ISBN. From the error you are getting, it seems like your query either didn't include the SEARCH action or the request wasn't understood.
Archive: http://lists.debian.org/505DDC87.1080802@rogers.com
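Rather than copying the hidden fields by hand, they can be collected from the page source programmatically. The following is a rough sketch using only the Python standard library. `FormFieldParser` and `build_post_data` are hypothetical names, and `SAMPLE_FORM` is a trimmed copy of the form quoted above with spaces added between attributes; the live page may differ, so treat this as an illustration, not a tested scraper.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Trimmed copy of the form from the thread (spaces added between attributes).
SAMPLE_FORM = """
<FORM METHOD="POST" ACTION="/cgi-bin/zgate">
<INPUT NAME="ACTION" VALUE="SEARCH" TYPE="HIDDEN">
<INPUT NAME="DBNAME" VALUE="VOYAGER" TYPE="HIDDEN">
<INPUT NAME="TERM_1" SIZE="60">
<INPUT TYPE="SUBMIT" VALUE="Submit Query">
<INPUT NAME="SESSION_ID" VALUE="5923056" TYPE="HIDDEN">
</FORM>
"""


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of all hidden <input> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if name and is_hidden:
            self.fields[name] = attr.get("value") or ""


def build_post_data(form_html, isbn):
    """Return URL-encoded POST data: hidden fields plus TERM_1=<isbn>."""
    parser = FormFieldParser()
    parser.feed(form_html)
    fields = dict(parser.fields)
    fields["TERM_1"] = isbn
    return urlencode(fields)


post_data = build_post_data(SAMPLE_FORM, "1886411484")
print(post_data)
```

The resulting string is what would go into wget's `--post-data` option, provided the form was fetched recently enough that its SESSION_ID is still valid.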
Using wget to fill in a form
On Sat, 22 Sep 2012 10:01:51 -0500, craig wrote:
> Greetings,
>
> I have a small book collection (~150) that I thought would be neat to
> catalog by the Library of Congress catalog numbers. I have found a LOC
> search form that will allow me to input the ISBN, and it will return the
> information I want:
>
> Code:
> http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
>
> I have the list of book ISBNs in a text file, so scripting this should
> be quite easy. The problem is I can't figure out how to submit the form
> from the command line. I figured wget would be the best way, but
> everything I try results in downloading a single line that reads "Your
> form didn't include an ACTION!" So I thought I would turn to here for
> help. The test ISBN I am using is for The Linux Cookbook: 1886411484,
> QA76.76.O63S788 2001.
As others suggest, the query should be something like:
wget http://www.loc.gov/cgi-bin/zgate --post-data="ACTION=SEARCH&TERM_1=1886411484&SESSION_ID=1234567"
But I get "session expired" :-(
(note the "SESSION_ID" field value is completely arbitrary in the above line)
> And a related side question. From my reading, I've learned that the
> Z39.50 protocol is used to query databases, usually library related. Is
> anyone aware of an ISBN database table that can be downloaded by the
> user, preferably in a format that can be imported into MySQL or
> PostgreSQL?
Well, according to this:
http://www.loc.gov/z3950/gateway.html#about
You can query the database by means of a Z39.50 client, should you find one ;-)
Greetings,
--
Camaleón
Archive: http://lists.debian.org/k3knbc$mv$14@ger.gmane.org
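Since an arbitrary SESSION_ID is rejected, one plausible workaround is to request the form page first, read the SESSION_ID it hands out, and reuse that value in the POST. The sketch below assumes the gateway issues a fresh ID on every ACTION=INIT request and that the hidden field appears in the page exactly as in the form source quoted in this thread; neither has been checked against the live server. `extract_session_id`, `http_fetch` and `search_isbn` are hypothetical helper names, and nothing here touches the network until `search_isbn` is actually called.

```python
import re
import urllib.request
from urllib.parse import urlencode

INIT_URL = ("http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT="
            "/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090")
SEARCH_URL = "http://www.loc.gov/cgi-bin/zgate"

# Hidden fields as quoted from the form source in the thread.
STATIC_FIELDS = {
    "ACTION": "SEARCH",
    "DBNAME": "VOYAGER",
    "ESNAME": "B",
    "MAXRECORDS": "20",
    "RECSYNTAX": "1.2.840.10003.5.10",
    "REINIT": INIT_URL[len("http://www.loc.gov"):],
    "srchtype": "1,1016,2,102,3,3,4,2,5,100,6,1",
}

# Assumes NAME comes directly before VALUE, as in the quoted form.
SESSION_RE = re.compile(r'NAME="SESSION_ID"\s*VALUE="(\d+)"', re.IGNORECASE)


def extract_session_id(html):
    """Pull the SESSION_ID value out of the form page source."""
    match = SESSION_RE.search(html)
    if match is None:
        raise ValueError("no SESSION_ID field found in form")
    return match.group(1)


def http_fetch(url, data=None):
    """Fetch a URL: GET when data is None, POST otherwise."""
    with urllib.request.urlopen(url, data=data, timeout=30) as resp:
        return resp.read().decode("latin-1")


def search_isbn(isbn, fetch=http_fetch):
    """Get a fresh session ID, then POST a search for one ISBN."""
    session_id = extract_session_id(fetch(INIT_URL))
    fields = dict(STATIC_FIELDS, SESSION_ID=session_id, TERM_1=isbn)
    return fetch(SEARCH_URL, urlencode(fields).encode("ascii"))
```

With a text file of ISBNs, calling `search_isbn(line.strip())` for each line and saving the returned HTML would cover the whole collection; a short pause between requests is polite to the server. The `fetch` parameter exists so the request logic can be exercised with a stand-in function instead of the real network.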
Using wget to fill in a form
> For background see the Bath Profile:
> http://www.ukoln.ac.uk/interop-focus/bath/
> There are also bindings for C, C++ and PHP. You'll find them at IndexData's web site.
> As far as importing into MySQL or Postgresql, that is up to how you decide to map the Bath Profile (most likely the one used) over to your own database structure. The database being queried via Z39.50 probably has its data in the MARC21 format and has over 1000 fields and subfields each with a specific meaning.
Thanks for the info. I didn't realize MARC21 was so complex, but I can always create queries that select what I need; I just need to know what to query against. I will read up on what you provided.
Archive: http://lists.debian.org/1348331062.351710973@webmail.gtek.biz
Using wget to fill in a form
> The url you give is for the form. If you enter an ISBN number it will do
> the search. What you need to do is capture the http header sent when you click "submit query" then replace the test ISBN number with whatever number you want to search. Wireshark can do this. Simply look for the query packet(s).
At some point I thought about trying to capture what was being submitted, but since my HTTP protocol knowledge is limited I thought the information might also be sent as a URL, which I figured would make wget perfect for this. I've got Wireshark loaded on something around here, so I will investigate this line of thought. Thanks!
Archive: http://lists.debian.org/1348331110.61631606@webmail.gtek.biz
Using wget to fill in a form
> As others suggest, the query should be something like:
>
> wget http://www.loc.gov/cgi-bin/zgate --post-data="ACTION=SEARCH&TERM_1=1886411484&SESSION_ID=1234567"
Yeah, I was messing with the --post-data, but I didn't know I had to use an ACTION key. Will play with that.
> But I get "session expired" :-(
>
> (note the "SESSION_ID" field value is completely arbitrary in the above line)
>
>> And a related side question. From my reading, I've learned that the
>> Z39.50 protocol is used to query databases, usually library related. Is
>> anyone aware of an ISBN database table that can be downloaded by the
>> user, preferably in a format that can be imported into MySQL or
>> PostgreSQL?
>
> Well, according to this:
>
> http://www.loc.gov/z3950/gateway.html#about
>
> You can query the database by means of Z39.50 client, should you find one ;-)
I kind of figured that would be what I needed, but I'm not aware of any Z39.50 clients. Thanks!
Archive: http://lists.debian.org/1348331330.76872607@webmail.gtek.biz
Using wget to fill in a form
On 9/22/12 7:28 PM, craig@gtek.biz wrote:
[snip]
> I kind of figured that would be what I needed, but I'm not aware of any Z39.50 clients.
[snip]
Using ZOOM, mentioned in my previous post, you can use your Perl script as a Z39.50 client to search the LOC catalog directly. There are also C, C++ and PHP bindings.
Regards,
/Lars
Archive: http://lists.debian.org/505DEA1A.8060502@gmail.com