Scripting Question
H.S.
hs.samix at gmail.com
Sat Feb 14 04:24:09 UTC 2009
Patton Echols wrote:
> I have a fairly massive flat file, comma delimited, that I want to
> extract info from. Specifically, I want to extract the first and last
> name and email addresses for those who have them to a new file with just
> that info. (The windows database program that this comes from simply
> will not do it) I can grep the file for the @ symbol to at least
> exclude the lines without an email address (or the @ symbol in the notes
> field) But if I can figure this out, I can also adapt what I learn for
> the next time. Can anyone point me in the right direction for my "light
> reading?"
>
> By the way, I used 'head' to get the first line, with the field names.
> This is the first of about 2300 records, the reason not to do it by hand.
>
> patton at laptop:~$ head -1 contacts.txt
> "Business Title","First Name","Middle Name","Last Name","","Business
> Company Name","","Business Title","Business Street 1","Business Street
> 2","Business Street 3","Business City","Business State","Business
> Zip","Business Country","Home Street 1","Home Street 2","Home Street
> 3","Home City","Home State","Home Zip","Home Country","Other Street
> 1","Other Street 2","Other Street 3","Other City","Other State","Other
> Zip","Other Country","Assistant Phone","Business Fax Number","Business
> Phone","Business 2 Phone","","Car Phone","","Home Fax Number","Home
> Phone","Home 2 Phone","ISDN Phone","Mobile Phone","Other Fax
> Number","Other Phone","Pager
> Phone","","","","","","","","","","","","","Business Email","","Home
> Email","","Other
> Email","","","","","","","","","","","","Notes","","","","","","","","","","","","","","Business
> Web Page"
>
>
Here is one crude method. Assume that the long single line above is in a
file called test.db. The following bash pipeline cuts away everything up
to and including the "Business Email" field name, then prints whatever
precedes the next double quote (this is one long command; bash continues
it across the line break after the pipe):
$> sed -e 's/.*Business Email","\(.*\)/\1/' test.db |
   awk 'BEGIN { FS = "\"" } ; {print $1}'
Similarly, the following strips the leading "Business Title" field and
prints the next three fields, which are the first, middle and last name:
$> sed -e 's/^"Business Title","\(.*\)/\1/' test.db |
   awk 'BEGIN { FS = "," } ; {print $1, $2, $3}' | tr -d '"'
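Now, the same pipelines can be run on each line of your actual database
file (using the bash while and read commands) to print the names and the
business email address. Keep in mind that both sed patterns match the
literal header text ("Business Title", "Business Email"). A data record
holds values in those places, so for real records the patterns have to
pick the fields by position rather than by name. If a record has no email
address, that field is empty and the output will be blank.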
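To select columns from real records by position, it helps to see which position each header name occupies. Below is a minimal sketch for listing them. The function name number_header is made up for illustration, and the sketch assumes that no header name contains a comma:

```shell
# number_header FILE: print every header field name with its 1-based position.
# Assumption: the first line of FILE is the header and no name contains a comma.
number_header() {
    head -n 1 "$1" |
        tr -d '"\r' |    # drop the quotes and a possible DOS carriage return
        awk -F, '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}
```

Run as "number_header contacts.txt". For the header quoted above, the list would include "2: First Name" and "4: Last Name".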
Here is an untested set of commands that reads each line from a file
(full.db) and prints the names and the email:
$> while IFS= read -r line; do
     echo "${line}" | sed -e 's/^"Business Title","\(.*\)/\1/' |
       awk 'BEGIN { FS = "," } ; {print $1, $2, $3}' | tr -d '"'
     echo "${line}" | sed -e 's/.*Business Email","\(.*\)/\1/' |
       awk 'BEGIN { FS = "\"" } ; {print $1}'
   done < full.db
But note that this is really a crude method. I am sure others can
suggest more elegant ways to accomplish this. The above method will at
least get you started.
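As one possibly less crude variant (a sketch, not something from this thread), awk can look up the wanted columns by their header names and split each record on the "," sequence between quoted fields, so that a comma inside a field does no harm. The function name extract_contacts is hypothetical. The sketch assumes that every field is double-quoted, that the first line is the header, and that no field contains the sequence "," or a line break:

```shell
# extract_contacts FILE: print "First Last <email>" for each record whose
# Business Email field contains an @ sign.
# Assumptions: all fields are double-quoted, line 1 is the header, and no
# field contains the sequence "," or a line break.
extract_contacts() {
    awk '
    BEGIN { FS = "\",\"" }              # split on the "," between quoted fields
    {
        sub(/\r$/, "")                  # drop a DOS carriage return, if any
        sub(/^"/, ""); sub(/"$/, "")    # drop the outermost quotes
    }
    NR == 1 {                           # header: remember the wanted columns
        for (i = 1; i <= NF; i++) {
            if ($i == "First Name")     first = i
            if ($i == "Last Name")      last  = i
            if ($i == "Business Email") email = i
        }
        next
    }
    first && last && email && $email ~ /@/ {
        print $first, $last, "<" $email ">"
    }
    ' "$1"
}
```

Something like "extract_contacts contacts.txt > emails.txt" would then write the result to a new file. Because only the Business Email column is tested, an @ sign in the notes field does not select a record.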
Warm regards.
--
Please reply to this list only. I read this list on its corresponding
newsgroup on gmane.org. Replies sent to my email address are just
filtered to a folder in my mailbox and get periodically deleted without
ever having been read.