scripting / data problem.

Sun Nov 8 02:26:55 UTC 2009

On Sat, Nov 07, 2009 at 05:31:25PM -0800, Patton Echols wrote:

> file 1 - | email |
> file 2 - | email | f-name | l-name |
> file 3 - | f-name | l-name | company | phone |
> file 4 - | company | address | city | state |
> 
> (By the way, each one has some duplicates with less than complete info.)
> 
> What we need is to be able to populate all of these files with all the 
> available data for each contact.  The concept I have in mind is to 
> compare the files in pairs.  So file 1 would go into file-L  (for Last)  
> Then file 1 would be compared with each line of file-L and if there is a 
> record with matching email, then any extra info is added , but not 
> overwriting anything that is already in any field  So the final file 
> would look like this:
> 
> | file-1 | file-2 | file-3 | file-4 | email | f-name | l-name | company 
> | address | city | state | phone |
> 

[...]

> I also wondered whether I could create a mysql database with this info, 
> and there are commands that can merge records (rather than just 
> eliminate what looks like a duplicate). 

I'll vote for the mysql approach. Create a csv for each file, import that into
mysql so you have 4 tables. Then create a query  that will dump out the data
the way you want. You can dump that into another table or export it. For this
to work, the fields will have to match in each table. So Comany in file3 is
equivalent to Comany in file4 (not something like Acme, Inc vs Acme,
Incorporated). 

SELECT a.email, b.firstname, b.lastname, c.company, d.address FROM file1 a,
file2 b, file3 c, file4 d WHERE a.email = b.email and
concat(b.firstname,b.lastname) = concat(c.firstname,c.lastname) and c.company
= d.company INTO OUTFILE '/tmp/output.txt'   (crude and not tested).

Data manipulation is much more flexible once its in a database. At least when
you have something this squirrelly.

-- 
Hal