AWK experts - how would I code around this in awk...

Doug Robinson dkrr at telus.net
Thu Feb 18 23:47:23 GMT 2010


Doug Robinson wrote:
> Alex Janssen wrote:
>   
>> Steve Flynn wrote:
>>   
>>     
>>> I have a text file with lines of differing length.
>>>
>>> I want to parse the entire file and make each line (for example) 20 bytes long.
>>>
>>> If a record is too short, as in the following example (the rec1, rec2
>>> and rec3 labels are just so I can refer to them easily; the real file
>>> contains only the numeric portion):
>>>
>>> rec1 123456789012345
>>> rec2 67890
>>> rec3 12345678901234567890
>>>
>>> ... then I need to append rec2 to rec1.
>>>
>>> Obviously after appending rec2 to rec1, the next line to be read
>>> should be rec3. After completion, the entire file would consist of two
>>> records in this example case, both 20 bytes long.
>>>
>>>
>>>
>>> I should point out that the complete file may well be in the hundreds
>>> of millions of records so holding the entire thing in memory is
>>> probably not a good idea.
>>>
>>> Any idea on how I would go about this in awk?
>>>
>>> If you believe awk to not be a good candidate for this, I'm open to
>>> suggestions on alternatives.
>>>
>>>
>>> (As a side note, this is for some data I need to parse which has
>>> embedded CR/LFs in it, splitting what should be one record into
>>> perhaps multiple rows. I need a quick and easy way of stitching
>>> it back together.)
>>>
>>>
>>>   
>>>     
>>>       
>> Maybe a bash script that removes all CRs and LFs and then reinserts a
>> newline every 20 characters would do the job.
>>
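>> #######script
>> OLDFILE="whatever"
>> NEWFILE="whatever-new"
>> # Strip every CR and LF, then read the stream back 20 characters at a time.
>> # The || test keeps a final chunk that is shorter than 20 characters.
>> while IFS= read -r -n20 LINE || [ -n "$LINE" ]
>> do
>>   printf '%s\n' "$LINE"
>> done < <(tr -d '\n\r' <"$OLDFILE") >"$NEWFILE"
>> exit 0
>> ###########end script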
>>
>> Alex
>>
>>   
>>     
> Gee, it has been years and years since I thought in awk, but do you mean this?
> BEGIN {
>   i = 1
> }
>
> # Join every three input lines into one output line, separated by spaces.
> {
>   if (i++ >= 3) {
>     printf("%s\n", $1)
>     i = 1
>   } else {
>     printf("%s ", $1)
>   }
> }
>
> Hundreds of millions of records may take a while just to read and write!
>
> dkr
>
>
>   

or perhaps this:

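BEGIN {
  len = 0
}

# Every line: print the fragment, then end the record once 20 bytes are out.
{
  sub(/\r$/, "")            # drop a stray CR from a CR/LF pair
  printf("%s", $0)          # $0 keeps the whole line; $1 would stop at a space
  len = len + length($0)
  if (len >= 20) {
    printf("\n")
    len = 0
  }
}

END {
  if (len > 0) print ""     # terminate a final record that came up short
}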
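
To try the stitching idea on the three sample records from the question, something like the following might do. It is only a rough sketch: the function name `stitch` and the `width` variable are made up for illustration, and it assumes every logical record comes to exactly 20 bytes once the stray CR/LFs are gone. It reads one line at a time, so it should not need to hold the whole file in memory.

```shell
# stitch: hypothetical helper name, not from the original thread.
# Assumption: each logical record is exactly WIDTH bytes after CR/LF removal.
stitch() {
  awk -v width="${1:-20}" '
    {
      sub(/\r$/, "")            # drop a trailing CR left over from a CR/LF pair
      printf "%s", $0           # emit the fragment without a newline
      len += length($0)
      if (len >= width) {       # record complete: terminate it and start over
        printf "\n"
        len = 0
      }
    }
    END { if (len > 0) printf "\n" }   # close a final short record, if any
  '
}

# The three sample lines from the question (numeric portion only).
printf '123456789012345\n67890\n12345678901234567890\n' | stitch 20
```

For a real file it would be run as `stitch 20 <oldfile >newfile`. A fragment that overshoots the width is not split here, so data that does not line up on 20-byte boundaries would need a different approach.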


anyway this is the idea - good luck with AWK - fun fun fun

dkr




