[CoLoCo] bash question
Neal McBurnett
neal at bcn.boulder.co.us
Sat Aug 20 04:20:38 UTC 2011
The problem as you restated it is very different and much
easier. Here is a program to do that. Just put all 1000 files in one
directory (I'll assume they all end in ".txt") and run
python jim-averages.py *.txt > averages.out
and it will put the results in "averages.out" (which does not end in .txt, so a later *.txt won't pick it up as an input file).
The results are tab-separated for easy loading into a spreadsheet.
By the way, my earlier message showed how to run the program I sent then:
just list all the files you want it to process on the command line.
> Run:
> python count_sum.py /tmp/qf /tmp/qy
If you don't list any files, it doesn't have anything to do....
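Because running a script like this with no arguments silently does nothing, it can help to print a usage hint instead. A minimal Python sketch; the `main` function and the usage wording are illustrative, not part of either program in this thread:

```python
import sys


def main(argv):
    """Process the files named in argv[1:]; return an exit status.

    Prints a usage message to stderr and returns 2 when no files are given.
    """
    files = argv[1:]
    if not files:
        # Illustrative usage text, not taken from either script in the thread.
        sys.stderr.write("usage: %s FILE [FILE ...]\n" % argv[0])
        return 2
    for filename in files:
        print(filename)  # placeholder for the real per-file work
    return 0
```

A script would end with `sys.exit(main(sys.argv))`, so running it without file names explains itself instead of printing nothing.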
Neal McBurnett http://neal.mcburnett.org/
--- the program jim-averages.py ---
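#! /usr/bin/env python
"""
Print the total of column 3, and the average value of column 3.
Skip the first line (a header).
"""
import sys

FILES = sys.argv[1:]
print("file\ttotal\taverage")
for filename in FILES:
    total = 0
    count = 0  # number of data rows, not counting the header
    with open(filename) as f:
        next(f, None)  # skip the header line
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue  # ignore blank or short lines
            total += int(fields[2])
            count += 1
    # avoid dividing by zero for a file that has only a header
    average = total * 1.0 / count if count else 0.0
    print("%s\t%d\t%f" % (filename, total, average))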
---
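The per-file loop can also be pulled out into a function that accepts any iterable of lines (an open file or a plain list of strings), which makes it easy to check by hand before running it over 1000 files. A sketch in Python; the function name and the `column` parameter are hypothetical:

```python
def column_total_and_average(lines, column=2):
    """Sum one whitespace-separated column, skipping the header line.

    `column` is zero-based, so the default 2 means the third column, as in
    jim-averages.py. Blank or short lines are ignored. Returns
    (total, count, average); average is 0.0 when there are no data rows.
    """
    total = 0
    count = 0
    for n, line in enumerate(lines):
        if n == 0:
            continue  # header row
        fields = line.split()
        if len(fields) <= column:
            continue  # blank or short line, e.g. a trailing newline
        total += int(fields[column])
        count += 1
    average = float(total) / count if count else 0.0
    return total, count, average
```

For example, `column_total_and_average(["name article count\n", "a x 4\n", "b y 2\n"])` gives `(6, 2, 3.0)`.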
On Fri, Aug 19, 2011 at 10:05:17PM -0600, Jim Hutchinson wrote:
> Neal,
>
> I tried the script you attached. I ran it from a terminal by typing
>
> python count_sum.py
>
> It ran but gave no output, and if it created a file I can't find it. I suspect I
> have to give it a file to read first, but I'm not sure where to put that in
> the script, the path to use, the location of the .py file, etc.
>
> Any suggestions?
>
> Thanks,
> Jim
>
> On Fri, Aug 19, 2011 at 7:37 PM, Neal McBurnett <neal at bcn.boulder.co.us> wrote:
>
> I've attached a program and two sample files. I think the program does
> the rest of the stuff you asked for, and it is a bit more idiomatic.
>
> One of the test files has a unicode character in it, and the other has
> a latin-1 character in it, but neither gives an error like what you
> saw. I'm wondering if your input file has an internally-inconsistent
> encoding problem.
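One way to test for an encoding problem like that is to try decoding the file's raw bytes as UTF-8 and report where decoding first fails. A Python 3 sketch, assuming the file is meant to be UTF-8 (a Latin-1 file will usually fail this check even when it is perfectly consistent Latin-1); the function name is hypothetical:

```python
def first_bad_utf8_byte(data):
    """Return the offset of the first byte that is not valid UTF-8, or None.

    `data` is the raw file contents as bytes, e.g. open(path, "rb").read().
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start  # index of the first undecodable byte
    return None
```

Calling `first_bad_utf8_byte(open("/tmp/qf", "rb").read())` returns `None` for clean UTF-8, or an offset to inspect with a hex viewer.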
>
> I actually included the test files (and another copy of the program)
> in a zip file so the characters get through with their varied encodings.
>
> Run:
> python count_sum.py /tmp/qf /tmp/qy
>
> and it produces this:
>
> Writing totals to /tmp/qf-out
> Writing totals to /tmp/qy-out
>
> and for example, /tmp/qf-out contains:
>
> 6 blue
> 5 red
>
> If that's not what you wanted, say what you want.
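count_sum.py itself is not preserved in this archive, so the following is only an illustration of producing per-file output of that shape, not the attached program. It assumes the task is to count how many times each column-2 value appears, skipping a header line, and writes the result next to the input as `<input>-out`; `write_counts` is a hypothetical name (Python 3):

```python
from collections import Counter


def write_counts(in_path):
    """Count occurrences of each column-2 value; write them to in_path + "-out".

    Hypothetical illustration: assumes whitespace-separated columns and a
    header on the first line. Returns the output path.
    """
    counts = Counter()
    with open(in_path) as f:
        next(f, None)  # skip the header row
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                counts[fields[1]] += 1
    out_path = in_path + "-out"
    print("Writing totals to %s" % out_path)
    with open(out_path, "w") as out:
        for value, n in counts.most_common():  # most frequent first
            out.write("%d %s\n" % (n, value))
    return out_path
```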
>
> On Fri, Aug 19, 2011 at 04:07:52PM -0600, Jim Hutchinson wrote:
> > Joey,
> >
> > Tried this in Python using files.txt in place of big2.log (shouldn't
> > matter what I call it, right?) and got this error
> >
> > Non-ASCII character '\xc2' in file test.sh on line 6, but no encoding declared;
> >
> > I copied and pasted your script as written, saved it to test.sh, and ran
> > it. I used full paths in the files.txt file.
> >
> > Any ideas?
> >
> > Thanks,
> > Jim
> >
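That message comes from Python 2, which refuses non-ASCII bytes in a source file that has no encoding declaration (PEP 263). The byte `\xc2` is commonly the first byte of a UTF-8 non-breaking space (U+00A0, encoded as C2 A0), which mail clients often substitute for ordinary spaces in indented code, and line 6 of Joey's script is its first indented line, so copy-pasting from the email is a likely culprit. Adding a `# -*- coding: utf-8 -*-` line would silence this particular message, but non-breaking spaces are still not valid indentation. A Python 3 sketch that replaces them with plain spaces, assuming the file is UTF-8; `replace_nbsp` is a hypothetical name:

```python
NBSP = "\u00a0"  # non-breaking space; UTF-8 bytes C2 A0


def replace_nbsp(path):
    """Rewrite a UTF-8 text file, turning non-breaking spaces into plain spaces.

    Returns how many were replaced. newline="" keeps the original line endings.
    """
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    count = text.count(NBSP)
    if count:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(text.replace(NBSP, " "))
    return count
```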
> > On Fri, Aug 19, 2011 at 3:08 PM, Joey Stanford <joey at canonical.com> wrote:
> >
> > I think the easier way is going to be with awk, but here's a Python
> > program that's roughly equivalent; it just doesn't look at the 3rd field.
> >
> > #! /usr/bin/env python
> >
> > data = open('big2.log')
> > totals = {}
> > for line in data:
> >     line = line.strip()
> >     if line:
> >         fields = line.split()
> >         pageid = fields[1]
> >         pagecount = int(fields[0])
> >         # start each new pageid at 0, then add this line's count
> >         totals[pageid] = totals.get(pageid, 0) + pagecount
> >
> > for key in totals:
> >     print("%d %s" % (totals[key], key))
> >
> >
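The program above sums column 1 per column-2 value. For the question as originally asked (count the occurrences of each column-2 value and sum column 3, skipping a header row), a similar per-key approach works. A Python sketch, assuming whitespace-separated "name article count" rows after one header line; the function name is hypothetical:

```python
from collections import defaultdict


def per_key_counts_and_sums(lines):
    """For each column-2 value, count its rows and sum column 3.

    Assumes whitespace-separated rows after one header line.
    Returns {key: [occurrences, total]}.
    """
    stats = defaultdict(lambda: [0, 0])  # new keys start at 0 rows, 0 total
    for n, line in enumerate(lines):
        fields = line.split()
        if n == 0 or len(fields) < 3:
            continue  # skip the header and any blank or short lines
        entry = stats[fields[1]]
        entry[0] += 1
        entry[1] += int(fields[2])
    return dict(stats)
```

An open file can be passed directly, e.g. `per_key_counts_and_sums(open("file1.txt"))`.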
> > On Fri, Aug 19, 2011 at 14:45, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
> > > Wondering if any of you script gurus can help with a small problem. I
> > > have several text files containing 3 columns. I want to count the number
> > > of occurrences of the text in column 2 (or just count the lines) and sum
> > > column 3, which is a number. I know how to do the latter with something
> > > like
> > >
> > > #!/bin/bash
> > >
> > > file="/home/test/file1.txt"
> > > cat ${file} | \
> > > while read name article count
> > > do
> > > sum=$(($sum + $count ))
> > > echo "$sum"
> > > done
> > >
> > > Although that prints each sum as it goes rather than just the final sum.
> > > I'm not sure how to count text (basically counting the lines that contain
> > > the numbers would work the same). Also, because each file has a header
> > > row, it's giving errors, so I need to tell it to skip row 1.
> > > Finally, I want to automate the input of each file: have it read the
> > > list of text files from somewhere, process each file, append the output
> > > to a new file, and then repeat with the next one until all files are
> > > done.
> > > Any ideas?
> > > Thanks.
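For the last part of the question (read the list of files from somewhere, process each one, and append the results to one output file), a Python 3 sketch. The list file (one path per line, like a files.txt), the tab-separated output layout, and the name `process_file_list` are assumptions:

```python
def process_file_list(list_path, out_path):
    """Append one summary line per input file to out_path.

    list_path holds one input path per line. Each input file is assumed to
    have one header line followed by whitespace-separated rows whose third
    column is an integer. Output lines are: filename, row count, column-3 total.
    Returns the number of files processed.
    """
    with open(list_path) as f:
        filenames = [name.strip() for name in f if name.strip()]
    with open(out_path, "a") as out:  # "a" appends instead of overwriting
        for filename in filenames:
            rows = 0
            total = 0
            with open(filename) as data:
                next(data, None)  # skip the header row
                for line in data:
                    fields = line.split()
                    if len(fields) >= 3:
                        rows += 1
                        total += int(fields[2])
            out.write("%s\t%d\t%d\n" % (filename, rows, total))
    return len(filenames)
```

Because the output file is opened in append mode, running it twice adds a second set of lines rather than replacing the first.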
>
> --
> Ubuntu-us-co mailing list
> Ubuntu-us-co at lists.ubuntu.com
> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-us-co
>
> --
> Jim (Ubuntu geek extraordinaire)
> ----
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html