scripting / data problem.

Patton Echols p.echols at comcast.net
Mon Nov 9 07:41:34 UTC 2009


Thanks for the reply.


On 11/07/2009 06:26 PM, Hal Burgiss wrote:
> On Sat, Nov 07, 2009 at 05:31:25PM -0800, Patton Echols wrote:
>
>   
>> file 1 - | email |
>> file 2 - | email | f-name | l-name |
>> file 3 - | f-name | l-name | company | phone |
>> file 4 - | company | address | city | state |
>>
>> (By the way, each one has some duplicates with less than complete info.)
>>
>> What we need is to be able to populate all of these files with all the 
>> available data for each contact.  The concept I have in mind is to 
>> compare the files in pairs.  So file 1 would go into file-L  (for Last)  
>> Then file 1 would be compared with each line of file-L and if there is a 
>> record with matching email, then any extra info is added , but not 
>> overwriting anything that is already in any field  So the final file 
>> would look like this:
>>
>> | file-1 | file-2 | file-3 | file-4 | email | f-name | l-name | company 
>> | address | city | state | phone |
>>
>>     
>
> [...]
>
>   
>> I also wondered whether I could create a mysql database with this info, 
>> and there are commands that can merge records (rather than just 
>> eliminate what looks like a duplicate). 
>>     
>
> I'll vote for the mysql approach. Create a csv for each file, import that into
> mysql so you have 4 tables. Then create a query  that will dump out the data
> the way you want. You can dump that into another table or export it. For this
> to work, the fields will have to match in each table. So Comany in file3 is
> equivalent to Comany in file4 (not something like Acme, Inc vs Acme,
> Incorporated). 
>   

Do you mean that all the field data must match?  Or just, for example, 
the email fields?    Because part of the problem is that I have non 
matching / missing data depending on file.

> SELECT a.email, b.firstname, b.lastname, c.company, d.address FROM file1 a,
> file2 b, file3 c, file4 d WHERE a.email = b.email and
> concat(b.firstname,b.lastname) = concat(c.firstname,c.lastname) and c.company
> = d.company INTO OUTFILE '/tmp/output.txt'   (crude and not tested).
>
> Data manipulation is much more flexible once its in a database. At least when
> you have something this squirrelly.
>
>   

Does the concat commands mean that there would no longer be a separate 
first name and last name?


Also, it looks like the "=" is a requirement.  Is that what you meant by 
the fields having to be the same?  Or could I arrange for it to have 
file 1 overwrite file 2 for certain fields.  Or only blank fields?




More information about the ubuntu-users mailing list