AWK experts - how would I code around this in awk...

Chan Chung Hang Christopher christopher.chan at bradbury.edu.hk
Fri Feb 19 12:52:00 UTC 2010


Steve Flynn wrote:
> On Fri, Feb 19, 2010 at 1:08 AM, Christopher Chan
> <christopher.chan at bradbury.edu.hk> wrote:
>> On Friday, February 19, 2010 02:02 AM, Steve Flynn wrote:
> 
>>> rec1 123456789012345
>>> rec2 67890
>>> rec3 12345678901234567890
>>>
>>> ... then I need to append rec2 to rec1.
>> You just have to make each line 20 bytes long, right? There is no need to
>> preserve lines that are already 20 bytes long: if rec1 + rec2 does not
>> make up 20 bytes, we chop up rec3 to fill the gap, and so on and so
>> forth, right?
> 
> Yup.
> 
> A little background info.
> 
> The data itself is ripped out of an IBM mainframe, converted from
> EBCDIC to ASCII and transferred to me via Connect:Direct.
> When it arrives on our AIX platform, the embedded CR/LFs in the data
> cause this scenario where we have split records.
> 
> I need to come up with a way to join the split lines back together
> without concatenating every line together.
> 
> The best that the data extract guys could say was that they would make
> the file a pre-set fixed width... this then permits me to start
> joining lines together with the guarantee that at some point I'll find
> that the line I'm currently building is exactly 'n' bytes long, thus
> indicating that it's complete. The 20 byte example is just that - an
> example. The real value of n will probably be 32K, or whatever the
> maximum limit for LRECL is on an MVS mainframe (I forget now but I
> think it's 32768 bytes). Of course, this ALL assumes that 32K is
> enough to hold the longest record in the source system.
> 
> My first thought was awk... but before I started coding anything I
> thought I'd throw it out to you guys to see if anyone had a better
> solution. Perl is also installed on my development box, much to my
> surprise, so I'll give Karl's Perl a crack of the whip first and then
> try to code something up in C, awk, and anything else I can think of
> just for laughs.
> 
> infa_pc at clp-wmvx-mga01 infa_pc $ perl -v
> 

For performance, the order is C, then awk, then perl/python... hmm... not
sure where php sits. At least, that was how it was back in 2002-2006 when
I had to parse a few dozen hourly mail log files that were each over 1GB
in size.
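
For what it's worth, the joining rule agreed on above (keep appending
physical lines until the record being built reaches n bytes, chopping the
next line if it overshoots) can be sketched in a few lines. This is only a
rough sketch in Python, not a tuned solution. The function name
join_split_records and the width parameter are made up for illustration,
and it assumes the stray CR/LF characters do not count towards the n bytes
and that the data is plain ASCII, so one character is one byte.

```python
def join_split_records(lines, width):
    """Rebuild fixed-width records from physical lines split by stray newlines.

    Assumptions (not confirmed by the thread): line terminators are not part
    of the record length, and one character equals one byte (plain ASCII).
    """
    buf = ""
    for line in lines:
        # Drop the line terminator and append the fragment to the record
        # currently being built.
        buf += line.rstrip("\r\n")
        # Emit every complete record; anything beyond 'width' is carried
        # over as the start of the next record.
        while len(buf) >= width:
            yield buf[:width]
            buf = buf[width:]
    if buf:
        # Leftover data shorter than 'width' (a truncated final record).
        yield buf


if __name__ == "__main__":
    # The 20-byte example from the thread: the first record arrives split
    # across two lines, the second arrives whole.
    sample = ["123456789012345\n", "67890\n", "12345678901234567890\n"]
    for record in join_split_records(sample, 20):
        print(record)
```

For the real file you would pass an open file object instead of the sample
list and set width to whatever record length the extract guys settle on.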



