Scripting question -- duplicate record problem
Patton Echols
p.echols at comcast.net
Fri Dec 10 00:05:12 UTC 2010
I am working on a script to clean up the export of a proprietary db
for gmail contacts import. The only way to get info out of the db is to
export to a flat file.
I have been able to get the export to be only the fields I want, but
there are problems with the data.
I am extracting only the records with email addresses by using the
following:
gawk -F, '{ if ( match($4, "@") ) print }' gmail-export.txt > gmail-export.csv
Some of the records are duplicates, though not identical ones.
Most arise where the same contact appears twice with differently
formatted phone numbers, e.g. (123) 456-7890 versus 1234567890.
Can anyone suggest a way to have the script throw out the second record
when just two fields (first name and last name) match an earlier record?
I confess I am stuck thinking of this as an awk problem and my
creativity is shot!
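For reference, the usual awk idiom for this kind of de-duplication is an associative array keyed on the fields of interest. The following is only a minimal sketch under assumptions the post does not confirm: first name in field 1, last name in field 2, email in field 4, and no field containing an embedded comma. The function name dedupe_contacts is made up for illustration. It uses plain awk syntax, so gawk should accept the same program unchanged.

```shell
# Assumptions (not stated in the post): $1 = first name, $2 = last name,
# $4 = email address, and fields never contain embedded commas.
#
# A record is printed only if field 4 contains "@" AND its
# (first name, last name) pair has not been printed before.
# Names are lower-cased so "SMITH" and "Smith" count as the same person.
dedupe_contacts() {
    awk -F, '$4 ~ /@/ && !seen[tolower($1) FS tolower($2)]++' "$@"
}

# Hypothetical usage with the file names from the post:
#   dedupe_contacts gmail-export.txt > gmail-export.csv
```

The first record seen for each name pair wins, so if one phone format is preferred over the other, the input would need to be ordered accordingly first. Records without an email address do not count as "seen", so they cannot shadow a later record that has one.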
Thanks for any thoughts.
--PE
More information about the ubuntu-users
mailing list