scripting / data problem.
Patton Echols
p.echols at comcast.net
Mon Nov 9 21:22:08 UTC 2009
On 11/09/2009 12:08 AM, Justin Gruenberg wrote:
> On Sat, Nov 7, 2009 at 7:31 PM, Patton Echols <p.echols at comcast.net> wrote:
>
>> The problem is that the various lists don't all have the same
>> information so I can't just cat them together and sort with a "unique"
>> operator. That's a vague statement. Here is what I mean by file and field:
>>
>>
>
> I assume you're going to need to get this data back out and into the
> original applications, eventually.
>
> The basic strategy I'd take is to import everything into separate
> tables in mysql. Create additional tables that you will export out
> of. Massage the data from the import tables into your output tables,
> merging and correcting as you can (this may be really easy or really
> hard depending on how clean your data is). Chances are you're going
> to need some cheap labor to clean the data up (got interns?) depending
> on the amount of data.
>
To "massage" the data, would you use the same basic approach that Hal
did? Or was there some other way? Note: where one record has a blank
field and another has it filled in, the filled one would always "win".
Where there is conflicting data, I'd probably want separate records to
massage by hand. But the owners of all this data may tell me that one of
the tables is of sufficiently poor quality that it gets overwritten
unless there is no better answer. Not sure about that yet.
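
For what it's worth, the merge rule above could be sketched in Python
roughly as follows. This is only an illustration under assumptions: the
field names (name, phone, email) are made up, each record is taken to be
a plain dict that has already been matched to the same person, and
values are compared exactly after trimming whitespace. The function name
merge_records is hypothetical too.

```python
def merge_records(a, b):
    """Merge two dicts that describe the same entry.

    Returns (merged, conflicts).
    - A blank field always loses to a filled one.
    - If any field is filled in both records with different values,
      merged is None and conflicts lists those fields, so both
      records can be kept separately for hand review.
    """
    merged = {}
    conflicts = []
    for field in sorted(set(a) | set(b)):
        left = (a.get(field) or "").strip()
        right = (b.get(field) or "").strip()
        if left and right and left != right:
            conflicts.append(field)      # both filled, values disagree
        else:
            merged[field] = left or right  # the filled value "wins"
    if conflicts:
        return None, conflicts
    return merged, conflicts


# Hypothetical sample records, not taken from the real lists.
rec_a = {"name": "Pat Smith", "phone": "", "email": "pat@example.org"}
rec_b = {"name": "Pat Smith", "phone": "555-0100", "email": ""}
rec_c = {"name": "Pat Smith", "phone": "555-0199", "email": ""}

merged_ab, conflicts_ab = merge_records(rec_a, rec_b)
merged_bc, conflicts_bc = merge_records(rec_b, rec_c)
```

Here rec_a and rec_b merge cleanly because each blank is filled from the
other record, while rec_b and rec_c disagree on phone and so come back
flagged for hand cleanup. A "this table is poor quality, overwrite it"
rule would need an extra priority argument, which is left out here.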
Interns? That'd be great, but I'm already the volunteer! (And while I'm
the computer guru in comparison, I'm obviously no great database expert.
</understatement>)
> I'd also suggest adding a unique ID to each record so that if you have
> to do this merge again, this will be a bit easier to handle.
>
>
Yeah, I don't want to do this again. The goal is to clean it up so that
it can be used and (pray) kept clean until the organization can migrate
everything to a more "professional" solution.
Thanks for the reply.
--PE
More information about the ubuntu-users
mailing list