scripting / data problem.

Patton Echols p.echols at
Mon Nov 9 21:22:08 UTC 2009

On 11/09/2009 12:08 AM, Justin Gruenberg wrote:
> On Sat, Nov 7, 2009 at 7:31 PM, Patton Echols <p.echols at> wrote:
>> The problem is that the various lists don't all have the same
>> information so I can't just cat them together and sort with a "unique"
>> operator.  That's a vague statement.  Here is what I mean by file and field:
> I assume you're going to need to get this data back out and into the
> original applications, eventually.
> The basic strategy I'd take is to import everything into separate
> tables in mysql.  Create additional tables that you will export out
> of.  Massage the data from the import tables into your output tables,
> merging and correcting as you can (this may be really easy or really
> hard depending on how clean your data is).  Chances are you're going
> to need some cheap labor to clean the data up (got interns?) depending
> on the amount of data.
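Here is a minimal sketch of that staging-table layout. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it is self-contained. Several things are assumptions for illustration only: the table names import_a, import_b and contacts_out, the three columns, and matching people on name alone. Real lists would need a better match key. This step only fills blanks. It does not detect conflicting values.

```python
import sqlite3


def build_staging(conn, sources):
    """Load each source list, unchanged, into its own import table,
    then create the separate output table that will be exported from.
    Table and column names here are hypothetical."""
    for table, rows in sources.items():
        # Table names come from our own dict, never from user input.
        conn.execute(f"CREATE TABLE {table} (name TEXT, email TEXT, phone TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE TABLE contacts_out (name TEXT PRIMARY KEY, email TEXT, phone TEXT)"
    )


def merge_into_output(conn, tables):
    """Fold each import table into contacts_out, keyed (naively) on name.
    COALESCE keeps a value that is already present; NULLIF turns '' into
    NULL, so a blank field never overwrites a filled one."""
    for table in tables:
        rows = conn.execute(f"SELECT name, email, phone FROM {table}").fetchall()
        for name, email, phone in rows:
            conn.execute(
                "INSERT OR IGNORE INTO contacts_out (name) VALUES (?)", (name,)
            )
            conn.execute(
                "UPDATE contacts_out "
                "SET email = COALESCE(email, NULLIF(?, '')), "
                "    phone = COALESCE(phone, NULLIF(?, '')) "
                "WHERE name = ?",
                (email, phone, name),
            )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sources = {
        "import_a": [("Ann Lee", "ann@example.com", "")],
        "import_b": [("Ann Lee", "", "555-0100"), ("Bob Roe", "bob@example.com", None)],
    }
    build_staging(conn, sources)
    merge_into_output(conn, list(sources))
    print(conn.execute("SELECT * FROM contacts_out ORDER BY name").fetchall())
    # [('Ann Lee', 'ann@example.com', '555-0100'), ('Bob Roe', 'bob@example.com', None)]
```

The import tables are never modified, so the merge can be rerun from scratch whenever the rules change.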

To "massage" the data, would you use the same basic approach that Hal
did, or is there some other way?  Note: where there are blank fields, a
filled one should always "win".  Where there is conflicting data, I'd
probably want separate records to massage by hand.  But the owners of all
this data may tell me that one of the tables is of sufficiently poor
quality that its values get overwritten unless there is no better answer.
Not sure about that yet.
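Those rules can be sketched in plain Python as follows: a filled field beats a blank one, and a real conflict is flagged for hand review. This is only a sketch under a few assumptions. Records are taken to be dicts mapping field names to strings. merge_records and other_is_low_trust are hypothetical names. Treating a low-trust source as one whose values only fill blanks is just one reading of the "overwritten unless there is no better answer" case.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def is_blank(value: Optional[str]) -> bool:
    return value is None or value.strip() == ""


def merge_records(
    primary: Record, other: Record, other_is_low_trust: bool = False
) -> Tuple[Record, List[str]]:
    """Merge two records believed to describe the same person.

    A filled field always beats a blank one.  If both are filled and
    differ, primary's value is kept and the field name is reported in
    `conflicts`, unless `other` is low-trust, in which case primary's
    value simply wins without being flagged.
    """
    merged: Record = {}
    conflicts: List[str] = []
    for field in sorted(set(primary) | set(other)):
        a, b = primary.get(field), other.get(field)
        if is_blank(a):
            merged[field] = None if is_blank(b) else b
        elif is_blank(b) or a.strip() == b.strip():
            merged[field] = a
        elif other_is_low_trust:
            merged[field] = a
        else:
            merged[field] = a
            conflicts.append(field)
    return merged, conflicts
```

A non-empty conflicts list would be the signal to keep both original records apart for hand massaging rather than using the merged one.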

Interns?  That'd be great, but I'm already the volunteer!  (And while I'm
the computer guru by comparison, I'm obviously no great database expert.)

> I'd also suggest adding a unique ID to each record so that if you have
> to do this merge again, this will be a bit easier to handle.
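If the records end up in MySQL, an AUTO_INCREMENT primary key is the usual way to get such an ID. A source-independent alternative is to stamp each record with a UUID before merging. The small Python sketch below does that. The "id" field name and the assign_ids helper are hypothetical. Records that already have an ID keep it, so a later rerun does not renumber anything.

```python
import uuid


def assign_ids(records, field="id"):
    """Give every record that lacks an ID a random UUID string.

    Records that already carry an ID keep it, so running this again after
    a later import leaves earlier IDs untouched.
    """
    for record in records:
        if not record.get(field):
            record[field] = str(uuid.uuid4())
    return records
```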

Yeah, I don't want to do this again.  The goal is to clean it up so that 
it can be used and (pray) kept clean until the organization can migrate 
everything to a more "professional" solution.

Thanks for the reply.


More information about the ubuntu-users mailing list