Scripting Question

H.S. hs.samix at gmail.com
Sat Feb 14 04:41:39 UTC 2009


H.S. wrote:
> Patton Echols wrote:
>> I have a fairly massive flat file, comma delimited, that I want to 
>> extract info from.  Specifically, I want to extract the first and last 
>> name and email addresses for those who have them to a new file with just 
>> that info. (The windows database program that this comes from simply 
>> will not do it)  I can grep the file for the @ symbol to at least 
>> exclude the lines without an email address (or the @ symbol in the notes 
>> field)  But if I can figure this out, I can also adapt what I learn for 
>> the next time.  Can anyone point me in the right direction for my "light 
>> reading?"
>>
>> By the way, I used 'head' to get the first line, with the field names.  
>> This is the first of about 2300 records, the reason not to do it by hand.
>>
>> patton at laptop:~$ head -1 contacts.txt
>> "Business Title","First Name","Middle Name","Last Name","","Business 
>> Company Name","","Business Title","Business Street 1","Business Street 
>> 2","Business Street 3","Business City","Business State","Business 
>> Zip","Business Country","Home Street 1","Home Street 2","Home Street 
>> 3","Home City","Home State","Home Zip","Home Country","Other Street 
>> 1","Other Street 2","Other Street 3","Other City","Other State","Other 
>> Zip","Other Country","Assistant Phone","Business Fax Number","Business 
>> Phone","Business 2 Phone","","Car Phone","","Home Fax Number","Home 
>> Phone","Home 2 Phone","ISDN Phone","Mobile Phone","Other Fax 
>> Number","Other Phone","Pager 
>> Phone","","","","","","","","","","","","","Business Email","","Home 
>> Email","","Other 
>> Email","","","","","","","","","","","","Notes","","","","","","","","","","","","","","Business 
>> Web Page"
>>
>>
> 
> Here is one crude method. Assume that the above long single line is in a
> file called test.db. Then the following bash command will output the
> Business Email from that file (this is one long command):
> $> cat test.db  | sed -e 's/\(.*Business Email\"\),"\(.*\)/\2/g' | awk
> 'BEGIN { FS = "\"" } ; {print $1}'
> 
> Similarly, the following gives the First name, Middle name and the Last
> name.
> $> cat test.db  | sed -e 's/\(^"Business Title\"\),"\(.*\)/\2/g' | awk
> 'BEGIN { FS = "," } ; {print $1, $2, $3}'  | tr -d '"'
> 
> Now, you can run this command on each line of your actual database file
> (using the bash while and read commands) and you should get the business
> email address and the names. If there is no email address, the output
> will be blank.
> 
> Here is an untested set of commands to read each line from a file
> (full.db) to generate names and email:
> $> cat full.db | while read line; do
>     echo "${line}" | sed -e 's/\(^"Business Title\"\),"\(.*\)/\2/g' |
> awk 'BEGIN { FS = "," } ; {print $1, $2, $3}'  | tr -d '"';
>     echo "${line}" |  sed -e 's/\(.*Business Email\"\),"\(.*\)/\2/g' |
> awk 'BEGIN { FS = "\"" } ; {print $1}'
> done
> 
> But note that this is really a crude method. I am sure others can
> suggest more elegant ways to accomplish this. The above method will at
> least get you started.
> 
> Warm regards.
> 
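
A side note on field positions: any command that picks fields by number
depends on knowing where each column sits in the header. A minimal sketch
for listing the header fields with their positions, so the numbers can be
checked rather than counted by hand. The function name list_fields is
made up for illustration, and it assumes the header is the first line of
the file and contains no commas inside its quoted names:

```shell
# list_fields is a hypothetical helper name, not part of any tool.
# Usage: list_fields full.db
list_fields() {
    awk -F',' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/"/, "", name)     # drop the surrounding quotes
                sub(/\r$/, "", name)    # drop a Windows CR on the last field
                print i ": " name
            }
            exit                        # only the header line is needed
        }
    ' "$1"
}
```

Each output line is the field number, a colon, and the field name. Unnamed
columns show up as a number with nothing after it.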

More concise and probably more efficient, provided the order of the data
fields is the same on every line and your database file is called
full.db. A single awk pass can read the file directly, so the cat and
the while/read loop are not needed:

#---------------------------------------------
$> awk 'BEGIN { FS = "," }; {print $2, $3, $4, $58}' full.db | tr -d '"'
#---------------------------------------------

Here $2, $3 and $4 are the first, middle and last names, and $58 should
be the Business Email column.
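
One caveat with splitting on a bare comma: a quoted value that itself
contains a comma (a company name, or the Notes field) shifts every field
after it. A minimal sketch of one way around that, assuming every field
in the export is wrapped in double quotes as in the header above, is to
split on the three-character sequence "," instead. It also looks the
columns up by header name and skips records without an @ in the Business
Email field. The function name extract_contacts and the output file name
in the usage line are made up for illustration. It will still go wrong on
values that contain line breaks or the sequence "," itself:

```shell
# extract_contacts is a hypothetical helper name, not part of any tool.
# Usage: extract_contacts full.db > names_and_emails.txt
extract_contacts() {
    awk -F'","' '
        # Drop the outer quote (and a possible Windows CR) that the
        # separator leaves on the first and last field of each line.
        { sub(/^"/, "", $1); sub(/"\r?$/, "", $NF) }

        # Header line: remember the column numbers by name.
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "First Name")     first = i
                if ($i == "Last Name")      last  = i
                if ($i == "Business Email") email = i
            }
            next
        }

        # Data lines: print only records whose email field contains an @.
        first && last && email && $email ~ /@/ {
            printf "%s,%s,%s\n", $first, $last, $email
        }
    ' "$1"
}
```

The output is one line per contact in the form first,last,email. Looking
the columns up by name means the field numbers do not have to be counted.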

Hope that helps.
Regards,
->HS
-- 

Please reply to this list only. I read this list on its corresponding
newsgroup on gmane.org. Replies sent to my email address are just
filtered to a folder in my mailbox and get periodically deleted without
ever having been read.





More information about the ubuntu-users mailing list