scripting / data problem.
Patton Echols
p.echols at comcast.net
Sun Nov 8 01:31:25 UTC 2009
I have a data file problem that I hope I can get an assist on. Not an
"Ubuntu" question really, but this is about the smartest group I know to
ask!
Note: I'm not asking anyone to do my homework. In fact, I'd be happy
with the RTFM answer, as long as you could tell me which FM and why you
say so. Unfortunately, it takes a bit to explain the problem, so this
is kind of long.
Here is the issue:
I have a number of different, partially overlapping contact lists, in
the form of (2) MS Outlook contact databases, (1) MS Access database,
(1) QuickBooks file and (1) SQL db.
Though we don't have access to the SQL database yet, we currently think
that CSV files are the common denominator that everything can import and
export.
The problem is that the various lists don't all have the same
information so I can't just cat them together and sort with a "unique"
operator. That's a vague statement. Here is what I mean by file and field:
file 1 - | email |
file 2 - | email | f-name | l-name |
file 3 - | f-name | l-name | company | phone |
file 4 - | company | address | city | state |
(By the way, each one has some duplicates with less than complete info.)
What we need is to be able to populate all of these files with all the
available data for each contact. The concept I have in mind is to
compare the files in pairs. So file 1 would go into file-L (for Last).
Then file 2 would be compared, line by line, with file-L, and if there is
a record with a matching email, any extra info is added, but without
overwriting anything that is already in any field. The final file
would look like this:
| file-1 | file-2 | file-3 | file-4 | email | f-name | l-name | company
| address | city | state | phone |
The first four fields would have an "x" or something that could be
subsequently used to parse them back to the originals.
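To make the concept concrete, here is a rough Python sketch of the no-overwrite merge I have in mind. The field names, the file-N flag columns, and keying on email are just illustrative; files 3 and 4 have no email column, so they would need a different key (name or company, say), and rows with no key at all are left for manual matching:

```python
import csv

# All fields the combined file should carry, in output order.
FIELDS = ["email", "f-name", "l-name", "company",
          "address", "city", "state", "phone"]

def merge_file(master, path, tag):
    """Fold one CSV into the master dict, keyed on email.
    Fields that already hold a value are never overwritten."""
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            key = row.get("email", "").strip().lower()
            if not key:
                continue  # no email: needs manual matching later
            rec = master.setdefault(key, {f: "" for f in FIELDS})
            rec[tag] = "x"  # flag which source file(s) this came from
            for field in FIELDS:
                # only fill blanks, so earlier data always wins
                if not rec[field] and row.get(field, "").strip():
                    rec[field] = row[field].strip()
```

Calling merge_file(master, "file1.csv", "file-1"), then the same for file 2 and so on, and finally writing master back out with csv.DictWriter would give the combined file with the "x" flag columns.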
I have thought about scripting something that would, for each line in
file 1, use the fields in the line as variables, feed them to awk,
which tests file 2 and returns results to the script to be added to file-L.
I have also thought that awk could do the whole thing, but I'd hate to
plow through the entire user guide if someone told me it was hopeless.
I also wondered whether I could create a MySQL database with this info,
and whether there are commands that can merge records (rather than just
eliminating what looks like a duplicate).
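If the data did go into a database, that merge-not-overwrite rule can be stated directly in SQL. Here is a sketch using sqlite3 (bundled with Python, and it needs SQLite 3.24 or newer for the upsert syntax; MySQL has the analogous INSERT ... ON DUPLICATE KEY UPDATE). COALESCE keeps whichever non-NULL value arrived first, so a filled-in field is never clobbered. Table and column names are just illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE contacts (
    email   TEXT PRIMARY KEY,
    f_name  TEXT, l_name TEXT, company TEXT, phone TEXT)""")

def upsert(email, f_name=None, l_name=None, company=None, phone=None):
    """Insert a contact, or fill in the blanks on an existing row.
    COALESCE(existing, excluded.new) keeps the value already stored."""
    con.execute("""
        INSERT INTO contacts (email, f_name, l_name, company, phone)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(email) DO UPDATE SET
            f_name  = COALESCE(f_name,  excluded.f_name),
            l_name  = COALESCE(l_name,  excluded.l_name),
            company = COALESCE(company, excluded.company),
            phone   = COALESCE(phone,   excluded.phone)""",
        (email, f_name, l_name, company, phone))
```

Loading each CSV through upsert() would then leave one merged row per email, with each field holding the first non-empty value seen.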
Or perhaps there is another strategy???
Obviously I have a lot of learning to do, but I am wondering about what?
Thanks very much to all of you who have read this far. I'd appreciate
any insight you may have into the best approach.
--PE