Old 02-16-2009, 11:38 AM
Ray Parrish
 
Default Scripting Question

Hal Burgiss wrote:
> On Mon, Feb 16, 2009 at 3:11 AM, Steve Lamb <grey@dmiyu.org> wrote:
>
>> split(','). Do that once with a CSV with embedded commas and you go
>>
>
> In fairness to the rest of the scripting world, perl, php, ruby, etc
> all of which have very similar code:
>
> <?php
> $handle = fopen("test.csv", "r");
> while (($row = fgetcsv($handle, 1000, ",")) !== FALSE) {
>     print $row[2] . ',' . $row[3]; // etc
> }
> fclose($handle);
> ?>
>
> It's also possible to have embedded quotes and embedded newlines.
>
Hello,

I've been following this thread, and I just have to ask: why would any
good program that generates CSV output allow embedded commas in the
data? That seems counterproductive to me. I would think that any
program generating CSV output would replace commas within the data
with a space or some other character, so that the CSV file would work
as intended.

Later, Ray Parrish

--
Human reviewed index of links about the computer
http://www.rayslinks.com
Poetry from the mind of a Schizophrenic
http://www.writingsoftheschizophrenic.com/


 
Old 02-16-2009, 12:44 PM
Smoot Carl-Mitchell
 
Default Scripting Question

On Mon, 2009-02-16 at 04:38 -0800, Ray Parrish wrote:

> I've been following this thread, and I just have to ask: why would any
> good program that generates CSV output allow embedded commas in the
> data? That seems counterproductive to me. I would think that any
> program generating CSV output would replace commas within the data
> with a space or some other character, so that the CSV file would work
> as intended.

That is why text fields are quoted. A good CSV parser recognizes the
quoted strings; the Python csv module does this, and there is a good
CSV parsing module for Perl that handles quoted fields as well.
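
For instance, a minimal Python 3 sketch using the standard csv module (the
file name test.csv and its contents are assumptions for illustration):

import csv

# Parse a CSV file whose text fields may contain embedded commas.
# csv.reader honors the quoting, so a quoted field stays one field.
with open("test.csv", newline="") as handle:
    for row in csv.reader(handle):
        print(row)

# For an input line like:  "Smith, John",jsmith@example.com
# this prints:             ['Smith, John', 'jsmith@example.com']
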
--
Smoot Carl-Mitchell
Computer Systems and
Network Consultant
smoot@tic.com
+1 480 922 7313
cell: +1 602 421 9005

 
Old 02-16-2009, 01:00 PM
Hal Burgiss
 
Default Scripting Question

On Mon, Feb 16, 2009 at 04:38:23AM -0800, Ray Parrish wrote:
>
> I've been following this thread, and I just have to ask: why would any
> good program that generates CSV output allow embedded commas in the
> data? That seems counterproductive to me. I would think that any
> program generating CSV output would replace commas within the data
> with a space or some other character, so that the CSV file would work
> as intended.

The question is how to protect an embedded delimiter, and I guess the
answer is that it has just evolved that way. There is actually an RFC
(RFC 4180) that defines the behavior, and Wikipedia has a nice
discussion of CSV. All this works quite well if you use a language
with CSV support (Python, etc.). I would think it would be difficult
to handle all cases with just traditional shell tools. Another unusual
wrinkle is an embedded double quote, which is escaped by doubling it
as "".
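
A quick Python 3 illustration of that wrinkle (the data here is made up):

import csv
import io

# Per the CSV convention (RFC 4180), a double quote inside a quoted
# field is written doubled; csv.reader undoes the escaping for us.
line = '1,"He said ""hello"" to me",3\n'
row = next(csv.reader(io.StringIO(line)))
print(row)  # ['1', 'He said "hello" to me', '3']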

--
Hal


 
Old 02-16-2009, 02:43 PM
Derek Broughton
 
Default Scripting Question

Steve Lamb wrote:

> Derek Broughton wrote:
>> Hal Burgiss wrote:
>>> cut -d, -f2,4,57,59,61 $csvfile |grep @
>
>> Cute :-) There's always another, simpler, way...
>
> Except, as pointed out, when there are embedded commas in a field.
> That is why none of the people suggesting Python suggested using
> split(','). Do that once with a CSV with embedded commas and you go
> running to your local Python documentation to read up on the csv module.

Well, I _was_ one of the people who suggested python. Cut's a very good
solution if you can rely on your data not having embedded commas. I spent
a long time working on reading csv files for an application, and pretty
well _every_ solution other than python's csv module fell apart in this
case.
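
A short sketch of that failure mode, with made-up data:

import csv
import io

line = '"Smith, John",jsmith@example.com\n'

# Naive splitting breaks the quoted field in two:
print(line.rstrip().split(','))
# ['"Smith', ' John"', 'jsmith@example.com']

# The csv module keeps it whole:
print(next(csv.reader(io.StringIO(line))))
# ['Smith, John', 'jsmith@example.com']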


 
Old 02-16-2009, 02:46 PM
Derek Broughton
 
Default Scripting Question

Robert Parker wrote:

> On Mon, Feb 16, 2009 at 3:11 PM, Steve Lamb <grey@dmiyu.org> wrote:
>> Derek Broughton wrote:
>>> Hal Burgiss wrote:
>>>> cut -d, -f2,4,57,59,61 $csvfile |grep @
>>
>>> Cute :-) There's always another, simpler, way...
>>
>> Except, as pointed out, when there are embedded commas in a field.
>> That is why none of the people suggesting Python suggested using
>> split(','). Do that once with a CSV with embedded commas and you go
>> running to your local Python documentation to read up on the csv module.
>
> Wouldn't that render a comma-separated CSV file useless no matter what you
> did? AFAIK there is an option to separate using tabs at creation time.

They _are_ pretty nearly useless, which _is_ why they have tab-separated
files. But the original idea was that you could have commas in strings as
long as your strings were quote-delimited. Then you have the issue of
quotes in quoted strings, etc, etc...


 
Old 02-20-2009, 07:11 AM
Patton Echols
 
Default Scripting Question

On 02/14/2009 01:36 AM, Matthew Flaschen wrote:
> H.S. wrote:
>
>> H.S. wrote:
>>
>>> Patton Echols wrote:
>>>
>>>> I have a fairly massive flat file, comma delimited, that I want to
>>>> extract info from. Specifically, I want to extract the first and last
>>>> name and email addresses for those who have them to a new file with just
>>>> that info. (The windows database program that this comes from simply
>>>> will not do it) I can grep the file for the @ symbol to at least
>>>> exclude the lines without an email address (or the @ symbol in the notes
>>>> field.) But if I can figure this out, I can also adapt what I learn for
>>>> the next time. Can anyone point me in the right direction for my "light
>>>> reading?"
>>>>
>>>> By the way, I used 'head' to get the first line, with the field names.
>>>> This is the first of about 2300 records, the reason not to do it by hand.
>>>>
>>>> patton@laptop:~$ head -1 contacts.txt
>>>> "Business Title","First Name","Middle Name","Last Name","","Business
>>>> Company Name","","Business Title","Business Street 1","Business Street
>>>> 2","Business Street 3","Business City","Business State","Business
>>>> Zip","Business Country","Home Street 1","Home Street 2","Home Street
>>>> 3","Home City","Home State","Home Zip","Home Country","Other Street
>>>> 1","Other Street 2","Other Street 3","Other City","Other State","Other
>>>> Zip","Other Country","Assistant Phone","Business Fax Number","Business
>>>> Phone","Business 2 Phone","","Car Phone","","Home Fax Number","Home
>>>> Phone","Home 2 Phone","ISDN Phone","Mobile Phone","Other Fax
>>>> Number","Other Phone","Pager
>>>> Phone","","","","","","","","","","","","","Busine ss Email","","Home
>>>> Email","","Other
>>>> Email","","","","","","","","","","","","Notes","" ,"","","","","","","","","","","","","Business
>>>> Web Page"
>>>>
>>>>
>>>>
>>> Here is one crude method. Assume that the above long single line is in a
>>> file called test.db. Then the following bash command will output the
>>> Business Email from that file (this is one long command):
>>> $> cat test.db | sed -e 's/\(.*Business Email"\),"\(.*\)/\2/g' | awk
>>> 'BEGIN { FS = "\"" } ; {print $1}'
>>>
>>> Similarly, the following gives the First name, Middle name and the Last
>>> name.
>>> $> cat test.db | sed -e 's/\(^"Business Title"\),"\(.*\)/\2/g' | awk
>>> 'BEGIN { FS = "," } ; {print $1, $2, $3}' | tr -d '"'
>>>
>>> Now, you can run this command on each line of your actual database file
>>> (using the bash while and read commands) and you should get the business
>>> email address and the names. If there is no email address, the output
>>> will be blank.
>>>
>>> Here is an untested set of commands to read each line from a file
>>> (full.db) to generate names and email:
>>> $> cat full.db | while read line; do
>>> echo "${line}" | sed -e 's/\(^"Business Title"\),"\(.*\)/\2/g' |
>>> awk 'BEGIN { FS = "," } ; {print $1, $2, $3}' | tr -d '"';
>>> echo "${line}" | sed -e 's/\(.*Business Email"\),"\(.*\)/\2/g' |
>>> awk 'BEGIN { FS = "\"" } ; {print $1}'
>>> done
>>>
>>> But note that this is really a crude method. I am sure others can
>>> suggest more elegant ways to accomplish this. The above method will at
>>> least get you started.
>>>
>>> Warm regards.
>>>
>>>
>> More concise (given the order of data fields is constant) and probably
>> more efficient and better (the following is one long line):
>>
>> #---------------------------------------------
>> $> cat full.db | while read line; do echo "${line}" |awk 'BEGIN { FS =
>> "," }; {print $2, $3, $4, $58}' | tr -d '"'; done
>>
>
> There are a few issues. There's no need for cat, read line...done, tr,
> or echo (shell scripting is slow, especially when you fork multiple
> processes for every line). This didn't handle all the emails and that's
> the wrong field number. And it doesn't output in CSV format. Finally,
> the above prints every line, not only those with emails. So I get:
>
> gawk -F, '{ if ( match($57$59$61, "@") ) print
> $2","$4","$57","$59","$61};' contacts.txt>processed_contacts.txt
>
> That's all one line.
>
> Matt Flaschen
>
>
>
>
Thanks to everyone who responded to this. I really didn't plan to ask
the question and then drop off the face of the earth for a week, but
life happened. Matt's solution worked like a charm so I responded to
this one, but I learned something from all of the discussion and I
appreciate it.

As an aside, I manually cleaned out the few duplicate lines in the
result. I am going to read 'man gawk' to see if I could figure out how
to clean duplicates automatically.

 
Old 02-20-2009, 07:26 AM
Al Black
 
Default Scripting Question

On Fri, 2009-02-20 at 00:11 -0800, Patton Echols wrote:

> As an aside, I manually cleaned out the few duplicate lines in the
> result. I am going to read 'man gawk' to see if I could figure out how
> to clean duplicates automatically.

It was a good discussion, see: http://www.gnu.org/software/gawk/manual/

al


 
Old 02-20-2009, 10:30 AM
Hal Burgiss
 
Default Scripting Question

On Fri, Feb 20, 2009 at 12:11:30AM -0800, Patton Echols wrote:
>
> As an aside, I manually cleaned out the few duplicate lines in the
> result. I am going to read 'man gawk' to see if I could figure out how
> to clean duplicates automatically.


If the entire line is duplicated ...

sort $file | uniq > $newfile

That will likely screw up the header line, so I would strip that first
and then re-insert it.
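
A minimal Python 3 sketch of the same idea that keeps the header and the
original line order (file names are assumptions):

seen = set()
with open("processed_contacts.txt") as src, open("deduped.txt", "w") as dst:
    dst.write(next(src))        # pass the header line through untouched
    for line in src:
        if line not in seen:    # drop exact duplicate lines only
            seen.add(line)
            dst.write(line)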


--
Hal


 
Old 02-21-2009, 06:18 AM
Patton Echols
 
Default Scripting Question

On 02/20/2009 03:30 AM, Hal Burgiss wrote:
> On Fri, Feb 20, 2009 at 12:11:30AM -0800, Patton Echols wrote:
>
>> As an aside, I manually cleaned out the few duplicate lines in the
>> result. I am going to read 'man gawk' to see if I could figure out how
>> to clean duplicates automatically.
>>
>
>
> If the entire line is duplicated ...
>
> sort $file | uniq > $newfile
>
> That will likely screw up the header line, so I would strip that first
> and then re-insert it.
>
>
>
Sure, good reminder. When I have this problem, what I am usually doing
is combining multiple lists from different sources, so the gawk command
has the benefit of normalizing the results. But uniq only works
sometimes, because of the way people get entered in the first place:
Bob Smith, Robert Smith, Rob Smith, you get the idea.

I'm kind of thinking that gawk could compare one field with the rest of
the file and delete records with a match, then move to the next record.
Something like: for records 1 to end, match on field $N; if it matches,
delete the record, else move to the next record. Another post has a
link to the gawk manual, which has more explanation than the man page.
I may be able to figure that out . . .
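
That idea (keep the first record seen for each value of a key field and
drop the later ones) is the classic awk one-liner !seen[$N]++ with -F,
subject to the embedded-comma caveat discussed earlier. A Python 3
sketch of the same thing, where the key field index and the file names
are assumptions:

import csv

seen = set()
with open("combined.csv", newline="") as src, \
        open("unique.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        key = row[3]            # e.g. the email column; an assumption
        if key not in seen:     # first occurrence wins
            seen.add(key)
            writer.writerow(row)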



 
Old 04-17-2012, 01:20 PM
"Chris"
 
Default Scripting question

All

Firstly, I pretty much suck at scripting, so I need help.

I have a file where each line begins with

Smtp:

I would like to have the Smtp: replaced with To:, leaving all that follows in each line untouched, and the result piped into a new file.

Thanks!!
Chris
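
One hedged sketch of a way to do this in Python 3 (the file names are
assumptions; a one-line sed equivalent would be
sed 's/^Smtp:/To:/' infile > outfile):

# Replace a leading "Smtp:" with "To:" on each line, leaving the rest
# of the line untouched, and write the result to a new file.
with open("infile") as src, open("outfile", "w") as dst:
    for line in src:
        if line.startswith("Smtp:"):
            line = "To:" + line[len("Smtp:"):]
        dst.write(line)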
 
