Scripting Question
H.S.
hs.samix at gmail.com
Sat Feb 14 04:41:39 UTC 2009
H.S. wrote:
> Patton Echols wrote:
>> I have a fairly massive flat file, comma delimited, that I want to
>> extract info from. Specifically, I want to extract the first and last
>> name and email addresses for those who have them to a new file with just
>> that info. (The Windows database program that this comes from simply
>> will not do it.) I can grep the file for the @ symbol to at least
>> exclude the lines without an email address (or the @ symbol in the notes
>> field). But if I can figure this out, I can also adapt what I learn for
>> the next time. Can anyone point me in the right direction for my "light
>> reading?"
>>
>> By the way, I used 'head' to get the first line, with the field names.
>> This is the first of about 2300 records, the reason not to do it by hand.
>>
>> patton at laptop:~$ head -1 contacts.txt
>> "Business Title","First Name","Middle Name","Last Name","","Business
>> Company Name","","Business Title","Business Street 1","Business Street
>> 2","Business Street 3","Business City","Business State","Business
>> Zip","Business Country","Home Street 1","Home Street 2","Home Street
>> 3","Home City","Home State","Home Zip","Home Country","Other Street
>> 1","Other Street 2","Other Street 3","Other City","Other State","Other
>> Zip","Other Country","Assistant Phone","Business Fax Number","Business
>> Phone","Business 2 Phone","","Car Phone","","Home Fax Number","Home
>> Phone","Home 2 Phone","ISDN Phone","Mobile Phone","Other Fax
>> Number","Other Phone","Pager
>> Phone","","","","","","","","","","","","","Business Email","","Home
>> Email","","Other
>> Email","","","","","","","","","","","","Notes","","","","","","","","","","","","","","Business
>> Web Page"
>>
>>
>
> Here is one crude method. Assume that the above long single line is in a
> file called test.db. Then the following bash command will output the
> Business Email from that file (this is one long command):
> $> cat test.db | sed -e 's/\(.*Business Email\"\),"\(.*\)/\2/g' | awk
> 'BEGIN { FS = "\"" } ; {print $1}'
>
> Similarly, the following gives the First name, Middle name and the Last
> name.
> $> cat test.db | sed -e 's/\(^"Business Title\"\),"\(.*\)/\2/g' | awk
> 'BEGIN { FS = "," } ; {print $1, $2, $3}' | tr -d '"'
>
> Now, you can run this command on each line of your actual database file
> (using the bash while and read commands) and you should get the business
> email address and the names. If there is no email address, the output
> will be blank.
>
> Here is an untested set of commands to read each line from a file
> (full.db) to generate names and email:
> $> cat full.db | while IFS= read -r line; do
> echo "${line}" | sed -e 's/\(^"Business Title\"\),"\(.*\)/\2/g' |
> awk 'BEGIN { FS = "," } ; {print $1, $2, $3}' | tr -d '"';
> echo "${line}" | sed -e 's/\(.*Business Email\"\),"\(.*\)/\2/g' |
> awk 'BEGIN { FS = "\"" } ; {print $1}'
> done
>
> But note that this is really a crude method. I am sure others can
> suggest more elegant ways to accomplish this. The above method will at
> least get you started.
>
> Warm regards.
>
Here is a more concise version that is probably also more efficient. It
assumes the order of the data fields is the same on every line, that no
field contains a comma, and that your database file is called full.db
(the following is one long line):
#---------------------------------------------
$> cat full.db | while IFS= read -r line; do echo "${line}" | awk 'BEGIN { FS = "," }; {print $2, $3, $4, $58}' | tr -d '"'; done
#---------------------------------------------
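If counting fields by hand feels fragile, a rough alternative is to let awk find the columns by their header names instead of by number. The sketch below is untested against the real database and rests on a few assumptions: every field is wrapped in double quotes (as in the header shown above), no field contains the sequence "," or a line break, and the header names are exactly "First Name", "Last Name" and "Business Email". The function names number_fields and extract_contacts are made up for illustration.

```shell
#!/bin/bash
# Sketch only. Assumes every field is double-quoted and that no field
# contains the three-character sequence "," or an embedded newline.

# Print each header field with its position, to check a number such as 58.
number_fields() {
    head -n 1 "$1" | awk '{
        sub(/\r$/, "")                  # tolerate Windows line endings
        sub(/^"/, ""); sub(/"$/, "")    # drop the outer quotes
        n = split($0, f, "\",\"")       # split on the "," between fields
        for (i = 1; i <= n; i++) printf "%d\t%s\n", i, f[i]
    }'
}

# Print first name, last name and business email for every record that
# has an email address, locating the columns by header name.
extract_contacts() {
    awk -F'","' '
        { sub(/\r$/, ""); sub(/^"/, ""); sub(/"$/, "") }
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "First Name")     first = i
                if ($i == "Last Name")      last  = i
                if ($i == "Business Email") email = i
            }
            if (!(first && last && email)) exit 1   # header not as assumed
            next
        }
        $email != "" { print $first, $last, $email }
    ' "$1"
}
```

With these defined, "number_fields full.db" lists the numbered header fields, and "extract_contacts full.db > emails.txt" writes the names and addresses to a new file. Splitting on "," rather than on a bare comma means a comma inside the Notes field does not shift the later columns.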
Hope that helps.
Regards,
->HS
--
Please reply to this list only. I read this list on its corresponding
newsgroup on gmane.org. Replies sent to my email address are just
filtered to a folder in my mailbox and get periodically deleted without
ever having been read.