[ubuntu-florida] AWK

j.e.aneiros jesus.aneiros at gmail.com
Thu Sep 23 02:11:21 BST 2010


Hello everyone,

Two days ago I was confronted with a simple task: we have around 50 CSV
files that should be concatenated into one file. Every file has a header on
the first line, and all of those headers should be eliminated except the
first file's. To solve the problem I used a different approach from the one
I will present to you today: I streamed the files through sed, deleting all
the headers and writing the result to a temp file, and then added a header
to that file. Today, though, I was thinking that it could be a good
opportunity to use AWK [1] for the task and write to the list about it.
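For comparison, the sed approach described above might look roughly like
this. It is only a sketch, not the exact commands I ran: the function name
concat_with_sed is hypothetical, and it assumes (like the AWK program
further down) that header lines start with "set,".

```shell
# Hypothetical helper: concat_with_sed OUTPUT FILE...
# Assumes header lines start with "set," (same assumption as the AWK program).
concat_with_sed() {
    local out="$1"
    shift
    local tmp
    tmp=$(mktemp) || return 1
    # Stream every file through sed, deleting all header lines, into a temp file.
    sed '/^set,/d' "$@" > "$tmp"
    # Write a single header (taken from the first file), then the data lines.
    { head -n 1 "$1"; cat "$tmp"; } > "$out"
    rm -f "$tmp"
}
```

Usage: concat_with_sed all *.csv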

Let's see how one of its creators, Alfred Aho, defines an AWK program: "An
AWK program is a sequence of pattern-action statements. AWK reads the
input a line at a time. A line is scanned for each pattern in the program,
and for each pattern that matches, the associated action is executed."
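As a small illustration of that model, here is a hypothetical helper (the
name count_headers is made up, and it assumes, like the program shown
later, that headers start with "set,"). It has one pattern-action statement
that counts header lines, plus the special END pattern, whose action runs
once after all input has been read:

```shell
# Hypothetical helper: count the header lines across the given CSV files.
count_headers() {
    # Pattern /^set,/ selects header lines; the action { n++ } runs for each one.
    # END matches once, after the last line of the last file.
    awk '/^set,/ { n++ } END { print n + 0 }' "$@"
}
```

With one header per file, count_headers *.csv should print the number of
files.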

So we could write an AWK program that prints the header only once (a very
easy task with a flag), discarding all the other headers and printing all
the other lines. This is the solution I came up with:

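# Header lines (they start with "set,"): print only the first one seen.
/^set,/ { if (!printed) { print; printed = 1 } next }
# Every other line: print it unchanged.
        { print }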

And it was run this way:

awk -f p *.csv > all

Here p is the file containing the AWK program.
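The program is short enough to pass on the command line instead of keeping
it in the file p. A sketch, wrapped in a hypothetical shell function named
merge_csv:

```shell
# Hypothetical wrapper: the same AWK program, given inline instead of via -f p.
merge_csv() {
    awk '
        # Header lines: print only the first one, skip the rest.
        /^set,/ { if (!printed) { print; printed = 1 } next }
        # Any other line: print it unchanged.
        { print }
    ' "$@"
}
```

Usage: merge_csv *.csv > all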

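The first line of the program matches a header. If the flag variable printed
is not yet set (uninitialized AWK variables start out empty, so !printed is
true the first time), it prints the header and sets the flag; in either
case, next stops processing the current record, reads the next record, and
starts again with the first pattern. The second line has no pattern, so its
action runs for every record that reaches it and prints the line. Because
headers never get past next, that means every line except the headers.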
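Since the header is always the first line of each file, another way (not
the one used above) is to test line numbers instead of the header text.
AWK keeps NR, the record number across all input, and FNR, the record
number within the current file. A sketch, with a hypothetical function
name:

```shell
# Hypothetical helper: keep line 1 of the whole input, drop line 1 of every later file.
merge_csv_fnr() {
    # A pattern with no action prints the matching line.
    awk 'NR == 1 || FNR > 1' "$@"
}
```

This does not depend on headers starting with "set,", so a data line that
happens to begin with that text is kept. On the other hand, it assumes every
file really has exactly one header line.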

These are the times for a total of 160K lines:

real    0m0.116s
user    0m0.060s
sys     0m0.052s

Have a good night and remember, TIMTOWTDI.

[1] http://en.wikipedia.org/wiki/AWK

-- 
J. E. Aneiros
GNU/Linux User #190716 at http://counter.li.org
perl -e '$_=pack(c5,0105,0107,0123,0132,(1<<3)+2);y[A-Z][N-ZA-M];print;'
PK fingerprint: 5179 917E 5B34 F073 E11A  AFB3 4CB3 5301 4A80 F674


More information about the Ubuntu-us-fl mailing list