[CoLoCo] bash question

Neal McBurnett neal at bcn.boulder.co.us
Sat Aug 20 04:20:38 UTC 2011


The way you restated the problem is very different and much
easier.  Here is a program to do that.  Just put all 1000 files (which
I'll assume end in ".txt") in one directory, and say

 python jim-averages.py *.txt > averages.out

and it will put the results in "averages.out" (which deliberately does not
end in ".txt", so the *.txt glob won't pick it up on a later run).
They are tab separated for easy loading into a spreadsheet.
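
For example (with made-up file names and numbers), averages.out would look
something like this, one line per input file:

 file           total   average
 data001.txt    1234    12.340000
 data002.txt    876     8.760000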

By the way, that earlier message showed how to run the script I sent
before (count_sum.py): just list all the files you want it to process
on the command line.

>     Run:
>      python count_sum.py /tmp/qf /tmp/qy

If you don't list any files, it has nothing to do, so you get no output
and no output files.
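
The attachment itself isn't reproduced in this message, but the general
shape of its command-line handling is along these lines (a rough sketch,
not the attached count_sum.py itself; the tallying part is only a guess
based on the sample run quoted below):

--- sketch, not the attached count_sum.py ---
#! /usr/bin/env python
# Rough sketch only: with no file names on the command line,
# sys.argv[1:] is empty and the loop body never runs, so nothing happens.
import sys

for filename in sys.argv[1:]:
    outname = filename + "-out"
    print "Writing totals to %s" % outname
    totals = {}
    for line in open(filename):
        fields = line.split()
        if len(fields) < 2:
            continue
        # guess at the tallying: sum column 1, grouped by the key in column 2
        totals[fields[1]] = totals.get(fields[1], 0) + int(fields[0])
    out = open(outname, "w")
    for key in sorted(totals):
        out.write("%d %s\n" % (totals[key], key))
    out.close()
---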

Neal McBurnett                 http://neal.mcburnett.org/

--- the program jim-averages.py ---
#! /usr/bin/env python
"""
For each file named on the command line, print the total and the average
of column 3, skipping the first line (a header) in each file.
"""

import sys

FILES = sys.argv[1:]

print "file\ttotal\taverage"

for filename in FILES:
    total = 0

    for n, line in enumerate(open(filename)):
        if n == 0:          # skip the header line
            continue

        total += int(line.split()[2])

    # n is the index of the last line, which equals the number of data lines
    print "%s\t%d\t%f" % (filename, total, total * 1.0 / n)
---
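
One caveat: if a file happens to contain only the header line, the
division at the end fails with a ZeroDivisionError (and a completely
empty file would leave n undefined).  If that can happen with your data,
the per-file loop could be guarded, something like:

for filename in FILES:
    total = 0
    n = 0                   # so an empty file doesn't leave n undefined

    for n, line in enumerate(open(filename)):
        if n == 0:
            continue

        total += int(line.split()[2])

    if n > 0:
        print "%s\t%d\t%f" % (filename, total, total * 1.0 / n)
    else:
        print "%s\t%d\tn/a" % (filename, total)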

On Fri, Aug 19, 2011 at 10:05:17PM -0600, Jim Hutchinson wrote:
> Neal,
> 
> I tried the script you attached. I ran it from a terminal by typing
> 
> python count_sum.py
> 
> It ran but gave no output, and if it created a file I can't find it. I suspect
> I have to have a file that it reads in first, but I'm not sure where to put
> that in the script, the path to use, the location of the .py file, etc.
> 
> Any suggestions?
> 
> Thanks,
> Jim
> 
> On Fri, Aug 19, 2011 at 7:37 PM, Neal McBurnett <neal at bcn.boulder.co.us> wrote:
> 
>     I've attached a program (and two sample files) that I think does the
>     rest of the stuff you asked for, and is a bit more idiomatic.
> 
>     One of the test files has a unicode character in it, and the other has
>     a latin-1 character in it, but neither gives an error like what you
>     saw.  I'm wondering if your input file has an internally-inconsistent
>     encoding problem.
> 
>     I actually included the test files (and another copy of the program)
>     in a zip file so the characters get thru with their varied encodings.
> 
>     Run:
>      python count_sum.py /tmp/qf /tmp/qy
> 
>     and it produces this:
> 
>     Writing totals to /tmp/qf-out
>     Writing totals to /tmp/qy-out
> 
>     and for example, /tmp/qf-out contains:
> 
>     6 blue
>     5 red
> 
>     If that's not what you wanted, say what you want.
> 
>     Neal McBurnett                 http://neal.mcburnett.org/
> 
>     On Fri, Aug 19, 2011 at 04:07:52PM -0600, Jim Hutchinson wrote:
>     > Joey,
>     >
>     > Tried this in Python using files.txt in place of big2.log (shouldn't
>     > matter what I call it, right?) and got this error
>     >
>     > Non-ASCII character '\xc2' in file test.sh on line 6, but no encoding
>     > declared;
>     >
>     > I copied and pasted your script as written and saved to test.sh and ran
>     > it. I used the full path in the files.txt file.
>     >
>     > Any ideas?
>     >
>     > Thanks,
>     > Jim
>     >
>     > On Fri, Aug 19, 2011 at 3:08 PM, Joey Stanford <joey at canonical.com> wrote:
>     >
>     >     I think the easier way is going to be with awk ... but here's a python
>     >     program that's roughly equivalent... just not looking for the 3rd field
>     >
>     >     #! /usr/bin/env python
>     >
>     >     data = open('big2.log')
>     >     totals = {}
>     >     for line in data:
>     >         line = line.strip()
>     >         if line:
>     >             pageid = line.split()[1]
>     >             pagecount = int(line.split()[0])
>     >             if pageid in totals:
>     >                 totals[pageid] += pagecount
>     >             else:
>     >                 totals[pageid] = pagecount
>     >
>     >     for key in totals:
>     >         print totals[key], key
>     >
>     >
>     >     On Fri, Aug 19, 2011 at 14:45, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
>     >     > Wondering if any of you script gurus can help with a small problem.
>     >     > I have several text files containing 3 columns. I want to count the
>     >     > number of occurrences of the text in column 2 (or just count the
>     >     > lines) and sum column 3, which is a number. I know how to do the
>     >     > latter with something like
>     >     >
>     >     > #!/bin/bash
>     >     >
>     >     > file="/home/test/file1.txt"
>     >     > cat ${file} | \
>     >     > while read name article count
>     >     > do
>     >     > sum=$(($sum + $count ))
>     >     > echo "$sum"
>     >     > done
>     >     >
>     >     > Although that prints each sum as it goes rather than just the final
>     >     > sum. I'm not sure how to count text (basically counting the lines
>     >     > that contain the numbers would work the same). Also, because each
>     >     > file has a header row it's giving errors, so I need to tell it to
>     >     > skip row 1.
>     >     > Finally, I want to automate the input of each file so having it
>     >     > read the list of text files from somewhere, process the file,
>     >     > output to a new file, amending each time, and then repeat with the
>     >     > next one until all files are done.
>     >     > Any ideas?
>     >     > Thanks.
> 
> --
> Jim (Ubuntu geek extraordinaire)
> ----
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
