Scripting Question

Matthew Flaschen matthew.flaschen at gatech.edu
Sat Feb 14 09:36:36 UTC 2009


H.S. wrote:
> H.S. wrote:
>> Patton Echols wrote:
>>> I have a fairly massive flat file, comma delimited, that I want to 
>>> extract info from.  Specifically, I want to extract the first and last 
>>> name and email addresses for those who have them to a new file with just 
>>> that info. (The windows database program that this comes from simply 
>>> will not do it)  I can grep the file for the @ symbol to at least 
>>> exclude the lines without an email address (or the @ symbol in the notes 
>>> field)  But if I can figure this out, I can also adapt what I learn for 
>>> the next time.  Can anyone point me in the right direction for my "light 
>>> reading?"
>>>
>>> By the way, I used 'head' to get the first line, with the field names.  
>>> This is the first of about 2300 records, the reason not to do it by hand.
>>>
>>> patton@laptop:~$ head -1 contacts.txt
>>> "Business Title","First Name","Middle Name","Last Name","","Business 
>>> Company Name","","Business Title","Business Street 1","Business Street 
>>> 2","Business Street 3","Business City","Business State","Business 
>>> Zip","Business Country","Home Street 1","Home Street 2","Home Street 
>>> 3","Home City","Home State","Home Zip","Home Country","Other Street 
>>> 1","Other Street 2","Other Street 3","Other City","Other State","Other 
>>> Zip","Other Country","Assistant Phone","Business Fax Number","Business 
>>> Phone","Business 2 Phone","","Car Phone","","Home Fax Number","Home 
>>> Phone","Home 2 Phone","ISDN Phone","Mobile Phone","Other Fax 
>>> Number","Other Phone","Pager 
>>> Phone","","","","","","","","","","","","","Business Email","","Home 
>>> Email","","Other 
>>> Email","","","","","","","","","","","","Notes","","","","","","","","","","","","","","Business 
>>> Web Page"
>>>
>>>
>> Here is one crude method. Assume that the above long single line is in a
>> file called test.db. Then the following bash command will output the
>> Business Email from that file (this is one long command):
>> $> cat test.db  | sed -e 's/\(.*Business Email\"\),"\(.*\)/\2/g' | awk
>> 'BEGIN { FS = "\"" } ; {print $1}'
>>
>> Similarly, the following gives the First name, Middle name and the Last
>> name.
>> $> cat test.db  | sed -e 's/\(^"Business Title\"\),"\(.*\)/\2/g' | awk
>> 'BEGIN { FS = "," } ; {print $1, $2, $3}'  | tr -d '"'
>>
>> Now, you can run this command on each line of your actual database file
>> (using the bash while and read commands) and you should get the business
>> email address and the names. If there is no email address, the output
>> will be blank.
>>
>> Here is an untested set of commands to read each line from a file
>> (full.db) to generate names and email:
>> $> cat full.db | while read line; do
>>     echo "${line}" | sed -e 's/\(^"Business Title\"\),"\(.*\)/\2/g' |
>> awk 'BEGIN { FS = "," } ; {print $1, $2, $3}'  | tr -d '"';
>>     echo "${line}" |  sed -e 's/\(.*Business Email\"\),"\(.*\)/\2/g' |
>> awk 'BEGIN { FS = "\"" } ; {print $1}'
>> done
>>
>> But note that this is really a crude method. I am sure others can
>> suggest more elegant ways to accomplish this. The above method will at
>> least get you started.
>>
>> Warm regards.
>>
> 
> More concise (given the order of data fields is constant) and probably
> more efficient and better (the following is one long line):
> 
> #---------------------------------------------
> $> cat full.db | while read line; do echo "${line}" |awk 'BEGIN { FS =
> "," }; {print $2, $3, $4,  $58}' | tr -d '"'; done

There are a few issues.  There's no need for cat, while read...done,
echo, or tr: a shell loop is slow, especially when it forks several
processes for every line, and awk can read the file directly.  It
prints only one email field, and $58 is the wrong field number
(Business Email is field 57, with Home Email and Other Email at 59 and
61).  It doesn't output in CSV format.  Finally, it prints every line,
not only those with emails.  So I get:

gawk -F, '{ if (match($57 $59 $61, "@")) print $2 "," $4 "," $57 "," $59 "," $61 }' contacts.txt > processed_contacts.txt

That's all one line.
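
For anyone without gawk, the same idea can be sketched in plain POSIX
awk.  This is only a minimal sketch: the function name extract_contacts
is made up for illustration, and it assumes that no quoted field
contains a comma, because -F, splits on every comma, quoted or not.

```shell
#!/bin/sh
# Sketch: print First Name, Last Name and the three email fields for
# every record where at least one email field contains "@".
# Assumption: no quoted field contains a comma (naive -F, splitting).
# extract_contacts is a hypothetical helper name, not from the thread.
extract_contacts() {
    awk -F, -v OFS=, '
        # Per the header line quoted above: $2 = First Name,
        # $4 = Last Name, $57/$59/$61 = Business/Home/Other Email.
        # index() returns 0 (false) when "@" is absent from all three,
        # so records without any email address are skipped.
        index($57 $59 $61, "@") { print $2, $4, $57, $59, $61 }
    ' "$1"
}
```

It would be called as: extract_contacts contacts.txt > processed_contacts.txt
The surrounding double quotes are kept, so the output is still CSV.  If
names or addresses can contain commas, the field numbers shift and a
real CSV parser is the safer choice.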

Matt Flaschen





