Scripting Question

Patton Echols p.echols at comcast.net
Sat Feb 21 07:18:30 UTC 2009


On 02/20/2009 03:30 AM, Hal Burgiss wrote:
> On Fri, Feb 20, 2009 at 12:11:30AM -0800, Patton Echols wrote:
>   
>> As an aside, I manually cleaned out the few duplicate lines in the 
>> result.   I am going to read 'man gawk' to see if I could figure out how 
>> to clean duplicates automatically.
>>     
>
>
> If the entire line is duplicated ...
>
>  sort $file |uniq > $newfile
>
> That will likely screw the header line so I would strip that first,
> and then re-insert it.
>
>
>   
Sure, good reminder.  When I have this problem, what I am usually doing 
is combining multiple lists from different sources, so the gawk command 
has the benefit of normalizing the results.  But uniq only works 
sometimes, because of the way people get entered in the first place: 
Bob Smith, Robert Smith, Rob Smith, you get the idea. 
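For whole-line duplicates, Hal's suggestion with the header handled 
separately might look like this (file names and the sample data are 
just made up for illustration; it assumes a one-line header):

```shell
# Hypothetical sample list with a header line and one exact duplicate
printf 'name,phone\nBob Smith,555-1212\nAlice Jones,555-3434\nBob Smith,555-1212\n' > list.csv

head -n 1 list.csv > deduped.csv              # keep the header line intact
tail -n +2 list.csv | sort -u >> deduped.csv  # sort -u is the same as sort | uniq
cat deduped.csv
```

That leaves the header first and the data rows sorted with exact 
duplicates removed, but as noted it won't catch Bob vs. Robert.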

I'm kind of thinking that gawk could compare one field with the rest of 
the file and delete records with a match, then move to the next record.  
Something like: for records 1 to end, match on field $N; if it matches, 
delete the record, else go to the next record . . . or something.  
Another post has a gawk manual that has more explanation than the man 
page.  I may be able to figure that out . . .
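The usual awk idiom for "keep only the first record for each value of 
field $N" is an associative array; a minimal sketch, assuming 
comma-separated records and deduping on field 2 (the field number and 
sample data are just for illustration):

```shell
# Hypothetical sample: two name variants sharing one email address
printf 'Bob Smith,bob@example.com\nRobert Smith,bob@example.com\nAlice Jones,alice@example.com\n' > list.csv

# !seen[$2]++ is true only the first time a given field-2 value appears,
# so only that first record is printed (works in gawk or any POSIX awk)
awk -F, '!seen[$2]++' list.csv
# prints:
#   Bob Smith,bob@example.com
#   Alice Jones,alice@example.com
```

Unlike sort | uniq this keeps the original record order and doesn't 
need a separate sort pass, but it only collapses records whose chosen 
field matches exactly.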

More information about the ubuntu-users mailing list