AWK experts - how would I code around this in awk...

Doug Robinson dkrr at telus.net
Thu Feb 18 23:30:50 UTC 2010


Alex Janssen wrote:
> Steve Flynn wrote:
>   
>> I have a text file with lines of differing length.
>>
>> I want to parse the entire file and make each line (for example) 20 bytes long.
>>
>> If a record is too short as in the following example: (rec1,2 and 3
>> are just so I can refer to them easily - the example file should be
>> just the numeric portion).
>>
>> rec1 123456789012345
>> rec2 67890
>> rec3 12345678901234567890
>>
>> ... then I need to append rec2 to rec1.
>>
>> Obviously after appending rec2 to rec1, the next line to be read
>> should be rec3. After completion, the entire file would consist of two
>> records in this example case, both 20 bytes long.
>>
>>
>>
>> I should point out that the complete file may well be in the hundreds
>> of millions of records so holding the entire thing in memory is
>> probably not a good idea.
>>
>> Any idea on how I would go about this in awk?
>>
>> If you believe awk to not be a good candidate for this, I'm open to
>> suggestions on alternatives.
>>
>>
>> (As a side note, this is for some data which I need to parse which has
>> embedded CR/LFs in it, thus splitting what should be one record into
>> perhaps multiple rows... I need a quick (and easy) way of stitching
>> it back together.)
>>
>>
>>   
>>     
> Maybe a bash script that removes all CR's and LF's and uses echo to 
> reinsert them every 20 characters would do the job.
>
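> #######script
> #!/bin/bash
> OLDFILE="whatever"
> NEWFILE="whatever-new"
> touch "$NEWFILE"
> # IFS= and -r keep whitespace and backslashes intact; -n20 reads 20 characters.
> # The || test keeps a final chunk shorter than 20 characters.
> while IFS= read -r -n20 LINE || [ -n "$LINE" ]
> do
>   printf '%s\n' "$LINE" >>"$NEWFILE"
> done < <(tr -d '\n\r' < "$OLDFILE")   # process substitution, not <$(...)
> exit 0
> ###########end script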
>
> Alex
>
>   
Gee, it's been years and years since I thought in awk, but do you mean this?
BEGIN {
  i = 1
}

# Join every three input lines into one output line, separated by spaces.
/.*/ {
  if (i++ >= 3) {
    printf("%s\n", $1)
    i = 1
  } else {
    printf("%s ", $1)
  }
}
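
The snippet above joins a fixed count of three lines, whereas the original question asks for lines to be stitched until a record reaches 20 bytes. A minimal sketch of that idea follows. The function name stitch and the width argument are made up for illustration, and it assumes the fragments of a record always add up to exactly the record width, as in the rec1/rec2 example. It holds only one record in memory at a time.

```shell
# Hypothetical helper: append input lines to a buffer until it reaches WIDTH,
# then print the buffer as one record. LC_ALL=C makes length() count bytes.
stitch() {
    LC_ALL=C awk -v width="${1:-20}" '
        { sub(/\r$/, ""); buf = buf $0 }              # drop a trailing CR, append fragment
        length(buf) >= width { print buf; buf = "" }  # record complete: emit and reset
        END { if (buf != "") print buf }              # leftover short tail, if any
    '
}

# The three-line example from the original question
printf '123456789012345\n67890\n12345678901234567890\n' | stitch 20
```

On the example input this prints two records of 20 bytes each. If a stitched record could overshoot the width, this sketch prints it as-is rather than splitting it, so real data would need checking first.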

Hundreds of millions of records may take a while just to read & write!
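
If the shell read loop turns out to be the slow part, Alex's idea of deleting every CR and LF and re-cutting at a fixed width can also be sketched as a plain pipeline of tr and fold. The name rechunk is hypothetical, and I have not timed this against the awk approach. It assumes the CR/LF bytes are not themselves counted in the 20-byte record width.

```shell
# Hypothetical helper: drop every CR and LF, then cut the byte stream
# into lines of WIDTH bytes (fold -b counts bytes rather than columns).
rechunk() {
    tr -d '\r\n' | fold -b -w "${1:-20}"
    echo    # fold typically leaves the final chunk without a newline
}

# Same example data, with a CR/LF pair inside the first record
printf '123456789012345\r\n67890\n12345678901234567890\n' | rechunk 20
```

Both tools read the data as a stream, so the file should not need to fit in memory.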

dkr




