scripting / data problem.
Hal Burgiss
hal at burgiss.net
Mon Nov 9 11:31:28 UTC 2009
On Sun, Nov 08, 2009 at 11:41:34PM -0800, Patton Echols wrote:
> >
> >> file 1 - | email |
> >> file 2 - | email | f-name | l-name |
> >> file 3 - | f-name | l-name | company | phone |
> >> file 4 - | company | address | city | state |
> >>
> >> (By the way, each one has some duplicates with less than complete info.)
> >>
> >> What we need is to be able to populate all of these files with all the
> >> available data for each contact. The concept I have in mind is to
> >> compare the files in pairs. So file 1 would go into file-L (for Last)
> >> Then file 1 would be compared with each line of file-L and if there is a
> >> record with matching email, then any extra info is added , but not
> >> overwriting anything that is already in any field So the final file
> >> would look like this:
[...]
> > I'll vote for the mysql approach. Create a csv for each file, import that into
> > mysql so you have 4 tables. Then create a query that will dump out the data
> > the way you want. You can dump that into another table or export it. For this
> > to work, the fields will have to match in each table. So Comany in file3 is
> > equivalent to Comany in file4 (not something like Acme, Inc vs Acme,
> > Incorporated).
> >
>
> Do you mean that all the field data must match? Or just, for example,
> the email fields? Because part of the problem is that I have non
> matching / missing data depending on file.
In order for the relations to work across all 4 files, the fields that are
being keyed on, *should* match exactly. You can get around the missing data by
using a 'left join' syntax.
> > SELECT a.email, b.firstname, b.lastname, c.company, d.address FROM file1 a,
> > file2 b, file3 c, file4 d WHERE a.email = b.email and
> > concat(b.firstname,b.lastname) = concat(c.firstname,c.lastname) and c.company
> > = d.company INTO OUTFILE '/tmp/output.txt' (crude and not tested).
> >
> > Data manipulation is much more flexible once its in a database. At least when
> > you have something this squirrelly.
> Does the concat commands mean that there would no longer be a separate
> first name and last name?
No, that's just part of the query that contstructs a relationship across all
four tables. The output part is the 'b.firstname'. The concat thing is just
one way to make sure the columns you are trying to match, actually do match.
This will be problematic if one table has 'John Doe' and the other has
'Johnathon Q. Doe, III' type of stuff.
>
> Also, it looks like the "=" is a requirement. Is that what you meant by
> the fields having to be the same? Or could I arrange for it to have
> file 1 overwrite file 2 for certain fields. Or only blank fields?
An "equivalency" is a lot cleaner for the query part anyway (where you are
matching data in one table against another). But you can do anything you want
with the output part. So, instead of using b.firstname, you could use
c.firstname instead. They would both be in their respective tables (a, b, c,
and d).
--
Hal
More information about the ubuntu-users
mailing list