Old 02-14-2009, 07:46 AM
Chris Mohler
 
Default Scripting Question

On Sat, Feb 14, 2009 at 9:52 PM, Patton Echols <p.echols@comcast.net> wrote:
> On 02/13/2009 06:41 PM, Chris Mohler wrote:
>> On Sat, Feb 14, 2009 at 8:18 PM, Patton Echols <p.echols@comcast.net> wrote:
>>
>>> I have a fairly massive flat file, comma delimited, that I want to
>>> extract info from. Specifically, I want to extract the first and last
>>> name and email addresses for those who have them to a new file with just
>>> that info. (The windows database program that this comes from simply
>>> will not do it) I can grep the file for the @ symbol to at least
>>> exclude the lines without an email address (or the @ symbol in the notes
>>> field) But if I can figure this out, I can also adapt what I learn for
>>> the next time. Can anyone point me in the right direction for my "light
>>> reading?"
>>>
>>
>> Maybe this will help (a good start anyway):
>> #===========================
>> #!/usr/bin/env python
>>
>> import csv
>>
>> # Open the CSV file and create a reader for it
>> infile = open("your filename here", 'r')
>> reader = csv.reader(infile)
>>
>> for row in reader:
>>     # do something with row...
>> #=======================
>>
>> if you replace "do something" with "print row[0]", it will print the
>> first column, and "print row[1]" the second column - you get the idea.
>>
>> If you get an error about csv - check that the python-csv package is
>> installed...
>>
>> Chris
>>
>>
> Is there a place where I can find the syntax for such a thing?
>
> I like the idea of having "do something" be: pass one, print column 3,
> column 4, column 12, column 13 and then on pass two, print the rows
> where col 3 and 4 of the result have email addresses.

OK, something like:
#===========================
#!/usr/bin/env python

import csv

# Open the CSV file and create a reader for it
infile = open("your filename here", 'r')
reader = csv.reader(infile)

i = 0
for row in reader:
    if i == 0:
        # first (header) row: print the names of columns 3, 4, 12 and 13
        print row[2], row[3], row[11], row[12]
    else:
        # data rows
        print row[2], row[3]
    i = i + 1
#=======================

If you want to match email addresses only, 'import re' and then use a
regex (eg: "if re.match") on the column(s). Python is pretty
user-friendly - and there are a lot of tutorials out there...
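For instance, a minimal sketch of that idea in the same Python 2 style as the snippets above (the filename and the email column index are placeholders, and a plain search for "@" stands in for a stricter address pattern):

#===========================
#!/usr/bin/env python
# Sketch: print only rows whose assumed email column contains "@".
# "your filename here" and column index 3 are placeholders.

import csv
import re

infile = open("your filename here", 'r')
reader = csv.reader(infile)

for row in reader:
    # re.search() looks anywhere in the field; re.match() would only
    # match at the start of the string
    if len(row) > 3 and re.search(r'@', row[3]):
        print row[0], row[1], row[3]
infile.close()
#=======================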

Of course, I'm biased
http://xkcd.com/353/

Chris

 
Old 02-14-2009, 08:36 AM
Matthew Flaschen
 
Default Scripting Question

H.S. wrote:
> H.S. wrote:
>> Patton Echols wrote:
>>> I have a fairly massive flat file, comma delimited, that I want to
>>> extract info from. Specifically, I want to extract the first and last
>>> name and email addresses for those who have them to a new file with just
>>> that info. (The windows database program that this comes from simply
>>> will not do it) I can grep the file for the @ symbol to at least
>>> exclude the lines without an email address (or the @ symbol in the notes
>>> field) But if I can figure this out, I can also adapt what I learn for
>>> the next time. Can anyone point me in the right direction for my "light
>>> reading?"
>>>
>>> By the way, I used 'head' to get the first line, with the field names.
>>> This is the first of about 2300 records, the reason not to do it by hand.
>>>
>>> patton@laptop:~$ head -1 contacts.txt
>>> "Business Title","First Name","Middle Name","Last Name","","Business
>>> Company Name","","Business Title","Business Street 1","Business Street
>>> 2","Business Street 3","Business City","Business State","Business
>>> Zip","Business Country","Home Street 1","Home Street 2","Home Street
>>> 3","Home City","Home State","Home Zip","Home Country","Other Street
>>> 1","Other Street 2","Other Street 3","Other City","Other State","Other
>>> Zip","Other Country","Assistant Phone","Business Fax Number","Business
>>> Phone","Business 2 Phone","","Car Phone","","Home Fax Number","Home
>>> Phone","Home 2 Phone","ISDN Phone","Mobile Phone","Other Fax
>>> Number","Other Phone","Pager
>>> Phone","","","","","","","","","","","","","Busine ss Email","","Home
>>> Email","","Other
>>> Email","","","","","","","","","","","","Notes","" ,"","","","","","","","","","","","","Business
>>> Web Page"
>>>
>>>
>> Here is one crude method. Assume that the above long single line is in a
>> file called test.db. Then the following bash command will output the
>> Business Email from that file (this is one long command):
>> $> cat test.db | sed -e 's/\(.*Business Email"\),"\(.*\)/\2/g' | awk
>> 'BEGIN { FS = "\"" } ; {print $1}'
>>
>> Similarly, the following gives the First name, Middle name and the Last
>> name.
>> $> cat test.db | sed -e 's/\(^"Business Title"\),"\(.*\)/\2/g' | awk
>> 'BEGIN { FS = "," } ; {print $1, $2, $3}' | tr -d '"'
>>
>> Now, you can run this command on each line of your actual database file
>> (using the bash while and read commands) and you should get the business
>> email address and the names. If there is no email address, the output
>> will be blank.
>>
>> Here is an untested set of commands to read each line from a file
>> (full.db) to generate names and email:
>> $> cat full.db | while read line; do
>> echo "${line}" | sed -e 's/(^"Business Title"),"(.*)/2/g' |
>> awk 'BEGIN { FS = "," } ; {print $1, $2, $3}' | tr -d '"';
>> echo "${line}" | sed -e 's/(.*Business Email"),"(.*)/2/g' |
>> awk 'BEGIN { FS = """ } ; {print $1}'
>> done
>>
>> But note that this is really a crude method. I am sure others can
>> suggest more elegant ways to accomplish this. The above method will at
>> least get you started.
>>
>> Warm regards.
>>
>
> More concise (given the order of data fields is constant) and probably
> more efficient and better (the following is one long line):
>
> #---------------------------------------------
> $> cat full.db | while read line; do echo "${line}" |awk 'BEGIN { FS =
> "," }; {print $2, $3, $4, $58}' | tr -d '"'; done

There are a few issues. There's no need for cat, read line...done, tr,
or echo (shell scripting is slow, especially when you fork multiple
processes for every line). This didn't handle all the emails and that's
the wrong field number. And it doesn't output in CSV format. Finally,
the above prints every line, not only those with emails. So I get:

gawk -F, '{ if ( match($57$59$61, "@") ) print
$2","$4","$57","$59","$61};' contacts.txt>processed_contacts.txt

That's all one line.
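For comparison, roughly the same filter with the csv module discussed earlier in the thread, as a sketch only (same placeholder filenames; gawk's 1-based fields 2, 4, 57, 59, 61 become 0-based indices 1, 3, 56, 58, 60):

#===========================
#!/usr/bin/env python
# Rough csv-module equivalent of the gawk one-liner above (a sketch).
# Filenames and column indices follow the command above; adjust to your export.

import csv

reader = csv.reader(open("contacts.txt", 'r'))
writer = csv.writer(open("processed_contacts.txt", 'wb'))

for row in reader:
    # keep only rows with an "@" somewhere in the three email columns
    if len(row) > 60 and "@" in row[56] + row[58] + row[60]:
        writer.writerow([row[1], row[3], row[56], row[58], row[60]])
#=======================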

Matt Flaschen



 
Old 02-14-2009, 02:32 PM
"H. S."
 
Default Scripting Question

On Sat, Feb 14, 2009 at 4:36 AM, Matthew Flaschen <matthew.flaschen@gatech.edu> wrote:

> H.S. wrote:
>> More concise (given the order of data fields is constant) and probably
>> more efficient and better (the following is one long line):
>>
>> #---------------------------------------------
>> $> cat full.db | while read line; do echo "${line}" |awk 'BEGIN { FS =
>> "," }; {print $2, $3, $4, $58}' | tr -d '"'; done
>
> There are a few issues. There's no need for cat, read line...done, tr,
> or echo (shell scripting is slow, especially when you fork multiple
> processes for every line).

Ah, right.

> This didn't handle all the emails and that's the wrong field number.
> And it doesn't output in CSV format. Finally,

I ran this command on a file where I pasted copies of the line the OP
posted and got the name parts and the email. I did not search for more
emails, nor did I check for empty emails.

> the above prints every line, not only those with emails. So I get:
>
> gawk -F, '{ if ( match($57$59$61, "@") ) print
> $2","$4","$57","$59","$61};' contacts.txt>processed_contacts.txt
>
> That's all one line.

Very nice. Thanks a ton!

Regards.

> Matt Flaschen
 
Old 02-14-2009, 02:52 PM
Derek Broughton
 
Default Scripting Question

Patton Echols wrote:

> I have a fairly massive flat file, comma delimited, that I want to
> extract info from. Specifically, I want to extract the first and last
> name and email addresses for those who have them to a new file with just
> that info. (The windows database program that this comes from simply
> will not do it) I can grep the file for the @ symbol to at least
> exclude the lines without an email address (or the @ symbol in the notes
> field) But if I can figure this out, I can also adapt what I learn for
> the next time. Can anyone point me in the right direction for my "light
> reading?"

Is this something that really needs to be scripted? If you do it once a
year, I'd open the file in OpenOffice and resave just what I want.

Otherwise, I'd use python and the csv module :-)


 
Old 02-14-2009, 04:55 PM
Hal Burgiss
 
Default Scripting Question

On Fri, Feb 13, 2009 at 9:18 PM, Patton Echols <p.echols@comcast.net> wrote:
> I have a fairly massive flat file, comma delimited, that I want to
> extract info from. Specifically, I want to extract the first and last

I don't see that anyone has mentioned 'cut':

cut -d, -f2,4,57,59,61 $csvfile |grep @

--
Hal

 
Old 02-14-2009, 08:43 PM
Hal Burgiss
 
Default Scripting Question

On Sat, Feb 14, 2009 at 12:55 PM, Hal Burgiss <hal@burgiss.net> wrote:
> On Fri, Feb 13, 2009 at 9:18 PM, Patton Echols <p.echols@comcast.net> wrote:
>> I have a fairly massive flat file, comma delimited, that I want to
>> extract info from. Specifically, I want to extract the first and last
>
> I don't see that anyone has mentioned 'cut':
>
> cut -d, -f2,4,57,59,61 $csvfile |grep @

Not a good idea if there are embedded commas like ...

12330,,,"Volmer, Robert",,"Rob.Volmer@gardra.com","NY",,,,,,,,,,,,


--
Hal

 
Old 02-14-2009, 09:28 PM
Derek Broughton
 
Default Scripting Question

Hal Burgiss wrote:

> On Fri, Feb 13, 2009 at 9:18 PM, Patton Echols <p.echols@comcast.net>
> wrote:
>> I have a fairly massive flat file, comma delimited, that I want to
>> extract info from. Specifically, I want to extract the first and last
>
> I don't see that anyone has mentioned 'cut':
>
> cut -d, -f2,4,57,59,61 $csvfile |grep @
>
Cute :-) There's always another, simpler, way...


 
Old 02-16-2009, 07:11 AM
Steve Lamb
 
Default Scripting Question

Derek Broughton wrote:
> Hal Burgiss wrote:
>> cut -d, -f2,4,57,59,61 $csvfile |grep @

> Cute :-) There's always another, simpler, way...

Except, as pointed out, when there are embedded commas in a field.
That is why none of the people suggesting Python suggested using
split(','). Do that once with a CSV with embedded commas and you go
running to your local Python documentation to read up on the csv module.
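A quick way to see the difference, using the sample line Hal posted above (a sketch; the expected results are shown in the comments):

#===========================
#!/usr/bin/env python
# Sketch: naive split(',') versus csv.reader on a line with an embedded comma.

import csv

line = '12330,,,"Volmer, Robert",,"Rob.Volmer@gardra.com","NY",,,,,,,,,,,,'

# Naive split breaks the quoted name into two fields:
#   ['12330', '', '', '"Volmer', ' Robert"', '', ...]
print line.split(',')[3]    # -> "Volmer

# csv.reader respects the quoting and keeps the name whole:
#   ['12330', '', '', 'Volmer, Robert', '', 'Rob.Volmer@gardra.com', 'NY', ...]
row = list(csv.reader([line]))[0]
print row[3]                # -> Volmer, Robert
#=======================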


--
Steve C. Lamb | But who can decide what they dream
PGP Key: 1FC01004 | and dream I do
-------------------------------+---------------------------------------------

 
Old 02-16-2009, 07:39 AM
Robert Parker
 
Default Scripting Question

On Mon, Feb 16, 2009 at 3:11 PM, Steve Lamb <grey@dmiyu.org> wrote:
> Derek Broughton wrote:
>> Hal Burgiss wrote:
>>> cut -d, -f2,4,57,59,61 $csvfile |grep @
>
>> Cute :-) There's always another, simpler, way...
>
> Except, as pointed out, when there are embedded commas in a field.
> That is why none of the people suggesting Python suggested using
> split(','). Do that once with a CSV with embedded commas and you go
> running to your local Python documentation to read up on the csv module.

Wouldn't that render a comma-separated csv file useless no matter what you did?
AFAIK there is an option to separate using tabs at creation time.
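For what it's worth, the csv-module approach from earlier in the thread also reads a tab-separated export just by changing the delimiter; a minimal sketch (the filename and column indices are placeholders):

#===========================
#!/usr/bin/env python
# Sketch: reading a tab-separated export with the csv module.
# "contacts.tsv" and the column indices are placeholders.

import csv

infile = open("contacts.tsv", 'r')
reader = csv.reader(infile, delimiter='\t')

for row in reader:
    print row[1], row[3]
infile.close()
#=======================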

Bob Parker
--
In a world without walls who needs Windows (or Gates)? Try Linux instead!

 
Old 02-16-2009, 10:36 AM
Hal Burgiss
 
Default Scripting Question

On Mon, Feb 16, 2009 at 3:11 AM, Steve Lamb <grey@dmiyu.org> wrote:
> split(','). Do that once with a CSV with embedded commas and you go

In fairness to the rest of the scripting world: perl, php, ruby, etc.
all have very similar code:

<?php
$handle = fopen("test.csv", "r");
while (($row = fgetcsv($handle, 1000, ",")) !== FALSE) {
    print $row[2] . ',' . $row[3]; // etc.
}
fclose($handle);

It's also possible to have embedded quotes and embedded newlines.
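The Python csv module mentioned earlier in the thread copes with those cases too; a small sketch with a made-up record:

#===========================
#!/usr/bin/env python
# Sketch: csv.reader handling an embedded comma, escaped ("") quotes,
# and a newline inside a quoted field. The record below is made up.

import csv

data = '"Volmer, Robert","says ""hi""","123 Main St\nSuite 4"\n'
row = list(csv.reader(data.splitlines(True)))[0]

print row
# -> ['Volmer, Robert', 'says "hi"', '123 Main St\nSuite 4']
#=======================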

--
Hal

 
