[CoLoCo] bash question
Neal McBurnett
neal at bcn.boulder.co.us
Sat Aug 20 01:37:48 UTC 2011
I've attached a program and two sample files. I think the program does the
rest of what you asked for, and it is a bit more idiomatic.
One of the test files has a unicode character in it, and the other has
a latin-1 character in it, but neither gives an error like what you
saw. I'm wondering if your input file has an internally-inconsistent
encoding problem.
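One way to check a file for that kind of problem is sketched below. It is only an illustration, not part of the attached program: the function name `scan_encoding` is made up, it assumes Python 3, and it assumes UTF-8 is the encoding you expect. It reads raw bytes and reports each line that contains non-ASCII bytes, along with whether that line decodes cleanly as UTF-8.

```python
def scan_encoding(path):
    """Return (line_number, kind) for each line containing non-ASCII bytes.

    kind is 'utf-8' if the line decodes as UTF-8, otherwise 'not-utf-8'.
    The name and return format are hypothetical, chosen for illustration.
    """
    findings = []
    with open(path, 'rb') as handle:           # raw bytes, no decoding yet
        for number, raw in enumerate(handle, start=1):
            if all(byte < 0x80 for byte in raw):
                continue                        # pure ASCII, nothing to report
            try:
                raw.decode('utf-8')
                findings.append((number, 'utf-8'))
            except UnicodeDecodeError:
                findings.append((number, 'not-utf-8'))
    return findings
```

A file that reports both kinds is mixing encodings. Running the same check on the script itself can also help, since a byte like `\xc2` is the first byte of a two-byte UTF-8 sequence and may have arrived with a copy and paste.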
I actually included the test files (and another copy of the program)
in a zip file so the characters get through with their varied encodings.
Run:
python count_sum.py /tmp/qf /tmp/qy
and it produces this:
Writing totals to /tmp/qf-out
Writing totals to /tmp/qy-out
and for example, /tmp/qf-out contains:
6 blue
5 red
If that's not what you wanted, say what you want.
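Since the archive does not reproduce the attachment, here is a minimal sketch of a program along those lines. It is not the attached count_sum.py. It assumes Python 3, whitespace-separated columns with a header row, the key in column 2 and the number in column 3 (as in Jim's description), and an output name formed by appending `-out` to the input path. The function names are made up for illustration.

```python
import sys
from collections import defaultdict


def total_by_key(lines, skip_header=True):
    """Sum the third column for each distinct value in the second column."""
    totals = defaultdict(int)
    for index, line in enumerate(lines):
        if skip_header and index == 0:
            continue                       # assumed: first row is a header
        fields = line.split()
        if len(fields) < 3:
            continue                       # blank or short line
        totals[fields[1]] += int(fields[2])
    return dict(totals)


def write_totals(path):
    """Read path and write 'total key' lines to path + '-out'."""
    # errors='replace' keeps a stray non-UTF-8 byte from aborting the run
    with open(path, encoding='utf-8', errors='replace') as handle:
        totals = total_by_key(handle)
    out_path = path + '-out'               # assumed naming convention
    print('Writing totals to', out_path)
    with open(out_path, 'w', encoding='utf-8') as out:
        for key in sorted(totals):
            out.write('%d %s\n' % (totals[key], key))
    return out_path


if __name__ == '__main__':
    for name in sys.argv[1:]:
        write_totals(name)
```

Each file named on the command line is processed in turn, which covers the "repeat with the next one until all files are done" part of the request.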
Neal McBurnett http://neal.mcburnett.org/
On Fri, Aug 19, 2011 at 04:07:52PM -0600, Jim Hutchinson wrote:
> Joey,
>
> Tried this in Python using files.txt in place of big2.log (shouldn't matter
> what I call it, right?) and got this error
>
> Non-ASCII character '\xc2' in file test.sh on line 6, but no encoding declared;
>
> I copied and pasted your script as written and saved to test.sh and ran it. I
> used full path in the files.txt file.
>
> Any ideas?
>
> Thanks,
> Jim
>
> On Fri, Aug 19, 2011 at 3:08 PM, Joey Stanford <joey at canonical.com> wrote:
>
> I think the easier way is going to be with awk ... but here's a python
> program that's roughly equivalent...just not looking for the 3rd field
>
> #! /usr/bin/env python
>
> data = open('big2.log')
> totals = {}
> for line in data:
>     line = line.strip()
>     if line:
>         pageid = line.split()[1]
>         pagecount = int(line.split()[0])
>         if pageid in totals:
>             totals[pageid] += pagecount
>         else:
>             totals[pageid] = pagecount
>
> for key in totals:
>     print totals[key], key
>
>
> On Fri, Aug 19, 2011 at 14:45, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
> > Wondering if any of you script gurus can help with a small problem. I have
> > several text files containing 3 columns. I want to count the number of
> > occurrences of the text in column 2 (or just count the lines) and sum column
> > 3 which is a number. I know how to do the latter with something like
> >
> > #!/bin/bash
> >
> > file="/home/test/file1.txt"
> > cat ${file} | \
> > while read name article count
> > do
> >     sum=$(($sum + $count ))
> >     echo "$sum"
> > done
> >
> > Although that prints each sum as it goes rather than just the final sum.
> > I'm not sure how to count text (basically counting the lines that contain
> > the numbers would work the same). Also, because each file has a header row
> > it's giving errors so I need to tell it to skip row 1.
> > Finally, I want to automate the input of each file so having it read the
> > list of text files from somewhere, process the file, output to a new file
> > amending each time, and then repeat with the next one until all files are
> > done.
> > Any ideas?
> > Thanks.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: count_sum.py
Type: text/x-python
Size: 588 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/ubuntu-us-co/attachments/20110819/f5fc88db/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: count_sum.zip
Type: application/zip
Size: 824 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/ubuntu-us-co/attachments/20110819/f5fc88db/attachment.zip>