Scripting Question
Patton Echols
p.echols at comcast.net
Sat Feb 21 07:18:30 UTC 2009
On 02/20/2009 03:30 AM, Hal Burgiss wrote:
> On Fri, Feb 20, 2009 at 12:11:30AM -0800, Patton Echols wrote:
>
>> As an aside, I manually cleaned out the few duplicate lines in the
>> result. I am going to read 'man gawk' to see if I could figure out how
>> to clean duplicates automatically.
>>
>
> If the entire line is duplicated ...
>
> sort $file | uniq > $newfile
>
> That will likely screw the header line so I would strip that first,
> and then re-insert it.
Sure, good reminder. When I have this problem, what I am usually doing is
combining multiple lists from different sources, so the gawk command has
the benefit of normalizing the results. But uniq only works sometimes,
because of the way people get entered in the first place: Bob Smith,
Robert Smith, Rob Smith, you get the idea.
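For the simple case, Hal's strip-and-reinsert step might look roughly like
this (just a sketch, assuming the header is a single first line; $file and
$newfile are placeholders from his example):

# keep the header line as-is, then sort and dedupe the rest
head -n 1 "$file" > "$newfile"
tail -n +2 "$file" | sort | uniq >> "$newfile"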
I'm kind of thinking that gawk could compare one field with the rest of
the file and delete records with a match, then move to the next record.
Something like -- for records 1 to end, match field $N; if match, delete
record, else next record . . . or something. Another post has a gawk
manual with more explanation than the man page. I may be able to
figure that out . . .
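In the meantime, gawk can do that field comparison in a single pass with
an array. A minimal sketch, assuming tab-separated fields with the name in
field 2 (the separator, field number, and filenames are only examples;
adjust them to the real data):

# seen[$2]++ is 0 (false) the first time a given field-2 value appears,
# so the pattern !seen[$2]++ prints only the first record with that value
gawk -F'\t' '!seen[$2]++' combined.txt > deduped.txt

That keeps the first record for each distinct field value and drops later
matches, which is the delete-on-match idea above. It still won't catch the
Bob/Robert/Rob variants, of course.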