Scripting question -- duplicate record problem

Cameron Hutchison lists at xdna.net
Fri Dec 10 00:27:56 UTC 2010


Patton Echols <p.echols at comcast.net> writes:

>Can anyone suggest a way to have the script throw out the second record 
>based only on two fields (First name and last name)?

Something like this should work OK:

if (match($4, "@") && !seen[$1 " " $2]) {
    seen[$1 " " $2] = 1
    print
}

That is, keep an array of names you have seen and only print out the
record if you have not yet seen the name.
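
The fragment above belongs inside an awk action block. A minimal
self-contained sketch follows, assuming whitespace-separated records with
the first name in $1, the last name in $2 and an e-mail address in $4.
The function name dedupe_by_name and the sample records are made up for
illustration; your real field layout may differ.

```shell
# Hypothetical helper: reads records on stdin and prints only the first
# record seen for each first-name/last-name pair.
dedupe_by_name() {
    awk '
    {
        # Keep the record only if field 4 contains "@" and this
        # name has not been recorded in the seen[] array yet.
        if (match($4, "@") && !seen[$1 " " $2]) {
            seen[$1 " " $2] = 1
            print
        }
    }
    '
}

# Made-up sample data: first name, last name, city, e-mail.
printf '%s\n' \
    'John Smith Boston john@example.com' \
    'Jane Doe Denver jane@example.com' \
    'John Smith Austin jsmith@example.org' |
    dedupe_by_name
```

With this sample input the third record should be dropped, because
"John Smith" was already seen on the first line.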

This has a failure mode: if one record has $1 = "Attila", $2 = "the Hun"
and another has $1 = "Attila the", $2 = "Hun", both produce the key
"Attila the Hun" and are treated as the same name. That may not matter
for your data. To avoid it, join the two fields with a character that
does not appear in your data instead of a space.
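
One way to get such a separator without picking it yourself is awk's
comma subscript: seen[$1, $2] joins the two values with the built-in
SUBSEP variable, which defaults to the control character "\034". The
sketch below assumes tab-separated input, since fields can only contain
spaces when the field separator is something other than whitespace. The
function name and the sample records are hypothetical.

```shell
# Hypothetical variant: same de-duplication, but the array key is built
# with SUBSEP (via the comma subscript) instead of a space.
dedupe_by_name_subsep() {
    awk -F '\t' '
    {
        # seen[$1, $2] is equivalent to seen[$1 SUBSEP $2]
        if (match($4, "@") && !seen[$1, $2]) {
            seen[$1, $2] = 1
            print
        }
    }
    '
}

# Made-up tab-separated records whose names collide when joined by a space.
printf 'Attila\tthe Hun\tPannonia\tattila@example.com\n' > /dev/null
printf 'Attila\tthe Hun\tPannonia\tattila@example.com\nAttila the\tHun\tGaul\tath@example.org\n' |
    dedupe_by_name_subsep
```

Both sample records should be printed here, because the keys
"Attila" SUBSEP "the Hun" and "Attila the" SUBSEP "Hun" are different
strings.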




