AWK experts - how would I code around this in awk...
Chan Chung Hang Christopher
christopher.chan at bradbury.edu.hk
Fri Feb 19 13:13:05 UTC 2010
Steve Flynn wrote:
> On Fri, Feb 19, 2010 at 12:52 PM, Chan Chung Hang Christopher
> <christopher.chan at bradbury.edu.hk> wrote:
>> For performance, it is C, awk and then perl/python...hmm...not sure
>> where php sits. At least, that was how it was back in 2002-2006 when I
>> had to parse a few dozen hourly mail log files that were each over 1GB
>> in size.
> Ran the perl version (Thanks Karl!) through some test data this morning.
> Whilst the data itself is no good for this test, I wanted to see the
> effects of various values of n (the required record length).
> Results are as follows:
> File with 1,375,031 lines in it.
> N = 10:    1 minute 18 seconds
> N = 100:   12 seconds
> N = 1000:  6 seconds
> N = 32768: 5 seconds
> 32768 will be the limit, as this is a system limit on the MVS
> mainframe creating the file.
> I'm still waiting to get hold of a suitably large file to give it a
> proper workout but so far, it seems promising.
I did say 'parse', not just strip newlines and carriage returns and
watch a counter. :-D
Which is why it is good to prototype the algorithm in perl/python and,
where needed, port it to awk or C.
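For what it's worth, a record-length test like the one being timed above is
a one-liner in awk. This is only a sketch of one plausible reading of the
task (Karl's perl script is not shown in the thread): strip a trailing
carriage return, then count lines of at least n characters. The variable
name n, the >= rule, and the sample file are all assumptions.

```shell
# Hypothetical sketch, not Karl's script: count lines of at least n
# characters after stripping a trailing carriage return.
printf 'short\r\nthis line is long enough\r\n' > /tmp/sample.txt
awk -v n=10 '{ sub(/\r$/, ""); if (length($0) >= n) count++ }
             END { print count+0 }' /tmp/sample.txt
```

Run against a real mainframe extract, n would be set to the required
record length (up to the 32768 MVS limit mentioned above).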