[CoLoCo] bash question

Neal McBurnett neal at bcn.boulder.co.us
Sat Aug 20 01:37:48 UTC 2011


I've attached a program, plus two sample files, that I think does the
rest of what you asked for and is a bit more idiomatic.

One of the test files has a Unicode character in it, and the other has
a Latin-1 character in it, but neither gives an error like the one you
saw.  I'm wondering if your input file has an internally inconsistent
encoding.
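
One way to narrow this down is to list the lines that contain any
non-ASCII bytes. The following is a minimal sketch, not part of the
attached program; the helper name find_non_ascii is made up for
illustration. Note that the quoted error names test.sh, the script
itself, so it may be worth scanning that file as well as the input.

```python
import sys


def find_non_ascii(lines):
    """Return (line_number, [hex bytes]) for each line with non-ASCII bytes.

    `lines` is any iterable of byte strings, e.g. a file opened in 'rb' mode.
    """
    hits = []
    for lineno, line in enumerate(lines, 1):
        bad = [hex(b) for b in bytearray(line) if b > 0x7F]
        if bad:
            hits.append((lineno, bad))
    return hits


if __name__ == "__main__":
    # Usage: python find_non_ascii.py FILE [FILE ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for lineno, bad in find_non_ascii(f):
                print("%s:%d: %s" % (path, lineno, " ".join(bad)))
```

Reading in binary mode means the check itself cannot fail on a badly
encoded file; it only reports where the suspicious bytes are.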

I also included the test files (and another copy of the program)
in a zip file so the characters get through with their varied encodings.

Run:
 python count_sum.py /tmp/qf /tmp/qy 

and it produces this:

Writing totals to /tmp/qf-out
Writing totals to /tmp/qy-out

and for example, /tmp/qf-out contains:

6 blue
5 red
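
The archive scrubbed the attachment, so the following is only a rough
sketch of the kind of program described, not the attached count_sum.py.
It assumes three whitespace-separated columns (name, article, count, as
in the bash loop quoted below) with a header on the first line, and it
writes "occurrences total key" lines, which is one more column than the
sample output above. The function names summarize and process are
hypothetical.

```python
import sys
from collections import OrderedDict


def summarize(lines):
    """Return {key: [occurrences, total]} keyed on column 2, summing column 3.

    Assumes the first line is a header. Lines are handled as bytes so that
    files with mixed encodings cannot raise decode errors.
    """
    totals = OrderedDict()
    for lineno, line in enumerate(lines):
        fields = line.split()
        if lineno == 0 or len(fields) < 3:
            continue  # skip the header row and blank or short lines
        entry = totals.setdefault(fields[1], [0, 0])
        entry[0] += 1               # occurrences of this column-2 value
        entry[1] += int(fields[2])  # running sum of column 3
    return totals


def process(path):
    """Write one 'occurrences total key' line per key to path + '-out'."""
    outpath = path + "-out"
    print("Writing totals to " + outpath)
    with open(path, "rb") as src:
        totals = summarize(src)
    with open(outpath, "wb") as dst:
        for key, (count, total) in totals.items():
            prefix = ("%d %d " % (count, total)).encode("ascii")
            dst.write(prefix + key + b"\n")


if __name__ == "__main__":
    # Each file named on the command line gets its own -out file.
    for path in sys.argv[1:]:
        process(path)
```

Taking the file names from the command line covers the "process each
file in turn" part of the request without a separate list file.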

If that's not what you wanted, say what you want.

Neal McBurnett                 http://neal.mcburnett.org/

On Fri, Aug 19, 2011 at 04:07:52PM -0600, Jim Hutchinson wrote:
> Joey,
> 
> Tried this in Python using files.txt in place of big2.log (shouldn't matter
> what I call it, right?) and got this error
> 
> Non-ASCII character '\xc2' in file test.sh on line 6, but no encoding declared;
> 
> I copied and pasted your script as written and saved to test.sh and ran it. I
> used full path in the files.txt file.
> 
> Any ideas?
> 
> Thanks,
> Jim
> 
> On Fri, Aug 19, 2011 at 3:08 PM, Joey Stanford <joey at canonical.com> wrote:
> 
>     I think the easier way is going to be with awk ... but here's a python
>     program that's roughly equivalent...just not looking for the 3rd field
> 
>     #! /usr/bin/env python
> 
>     # Sum the count (field 1) for each page id (field 2).
>     totals = {}
>     with open('big2.log') as data:
>         for line in data:
>             fields = line.split()
>             if fields:  # skip blank lines
>                 pagecount = int(fields[0])
>                 pageid = fields[1]
>                 totals[pageid] = totals.get(pageid, 0) + pagecount
> 
>     for key in totals:
>         print("%d %s" % (totals[key], key))
> 
> 
>     On Fri, Aug 19, 2011 at 14:45, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
>     > Wondering if any of you script gurus can help with a small problem. I have
>     > several text files containing 3 columns. I want to count the number of
>     > occurrences of the text in column 2 (or just count the lines) and sum column
>     > 3 which is a number. I know how to do the latter with something like
>     >
>     > #!/bin/bash
>     >
>     > file="/home/test/file1.txt"
>     > cat ${file} | \
>     > while read name article count
>     > do
>     > sum=$(($sum + $count ))
>     > echo "$sum"
>     > done
>     >
>     > Although that prints each sum as it goes rather than just the final sum.
>     > I'm not sure how to count text (basically counting the lines that contain
>     > the numbers would work the same). Also, because each file has a header row
>     > it's giving errors so I need to tell it to skip row 1.
>     > Finally, I want to automate the input of each file so having it read the
>     > list of text files from somewhere, process the file, output to a new file
>     > amending each time, and then repeat with the next one until all files are
>     > done.
>     > Any ideas?
>     > Thanks.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: count_sum.py
Type: text/x-python
Size: 588 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/ubuntu-us-co/attachments/20110819/f5fc88db/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: count_sum.zip
Type: application/zip
Size: 824 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/ubuntu-us-co/attachments/20110819/f5fc88db/attachment.zip>