[CoLoCo] bash question

Sat Aug 20 14:02:45 UTC 2011

David,

It errors out on the .py script itself. Shouldn't the *.txt tell it to skip
any non .txt file? Guess I need to point it to a dir with just the files.

Thanks.
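
A note on the error itself: the SyntaxError quoted below names averages.py,
so Python seems to be failing to parse the script before it opens any .txt
file. The shell expands *.txt, and the script is not in that list. The byte
'\xc2' is the first byte of the UTF-8 encoding of characters such as the
non-breaking space (C2 A0), which can sneak in when code is pasted from a web
page or HTML mail. That this is what happened here is an assumption. Below is
a minimal sketch that lists where such bytes sit in a file. It is written for
Python 3 rather than the Python 2 used in this thread, and find_non_ascii is
a made-up helper name.

```python
import sys


def find_non_ascii(path):
    """Return a list of (line, column, byte) for every byte above 127 in the file."""
    hits = []
    # Binary mode: inspect the raw bytes without decoding them.
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            for column, byte in enumerate(raw, start=1):
                if byte > 127:
                    hits.append((line_number, column, byte))
    return hits


if __name__ == "__main__":
    # Each command-line argument is a file to scan.
    for path in sys.argv[1:]:
        for line_number, column, byte in find_non_ascii(path):
            print("%s: line %d, column %d: byte 0x%02x"
                  % (path, line_number, column, byte))
```

Saved under a hypothetical name such as find_non_ascii.py, it could be run as
"python3 find_non_ascii.py averages.py". Retyping the indentation on any
reported line with plain spaces should clear the error if the assumption holds.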

Kevin,

As soon as I figure out if I have ruby on my laptop I'll give that a try.
Thanks for the help and lesson. Much easier to understand what it's doing
that way.

Jim

On Sat, Aug 20, 2011 at 7:19 AM, David Overcash <funnylookinhat at gmail.com> wrote:

> Sounds like one of your files is corrupted...
>
> Right after "total = 0" add this:
>    print "%s" % filename
>
> That should print every file that works and then the final one that doesn't
> ( if I'm counting lines correctly... it's still a bit early...  ;)  )
>
> On Fri, Aug 19, 2011 at 11:14 PM, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
>
>> Thanks Neal,
>>
>> Gave that a try and got
>>
>> "SyntaxError: Non-ASCII character '\xc2' in file averages.py on line 14,
>> but no encoding declared;"
>>
>> I just copied your code to a file and called it "averages.py" and ran it
>> like you said:
>>
>> python averages.py *.txt > averages.out
>>
>> Seemed like it was thinking then gave the error. It created the output
>> file but it's empty.
>>
>> Thanks,
>> Jim
>>
>>
>> On Fri, Aug 19, 2011 at 10:20 PM, Neal McBurnett <neal at bcn.boulder.co.us> wrote:
>>
>>> The way you restated the problem is very different and much
>>> easier.  Here is a program to do that.  Just put all 1000 files in one
>>> directory, which I'll assume all end in ".txt", and say
>>>
>>>  python jim-averages.py *.txt > averages.out
>>>
>>> and it will put the results in "averages.out" (which does not end in
>>> .txt....).
>>> They are tab separated for easy loading into a spreadsheet.
>>>
>>> By the way, I showed how to run the one I sent before in that message.
>>> I.e., just list all the files you want it to process on the command
>>> line.
>>>
>>> >     Run:
>>> >      python count_sum.py /tmp/qf /tmp/qy
>>>
>>> If you don't list any files, it doesn't have anything to do....
>>>
>>> Neal McBurnett                 http://neal.mcburnett.org/
>>>
>>> --- the program jim-averages.py ---
>>> #! /usr/bin/env python
>>> """
>>> Print the total of column 3, and the average value of column 3.
>>> Skip the first line (a header)
>>> """
>>>
>>> import sys
>>>
>>> FILES = sys.argv[1:]
>>>
>>> print "file\ttotal\taverage"
>>>
>>> for filename in FILES:
>>>    total = 0
>>>
>>>    for n, line in enumerate(open(filename)):
>>>        if n == 0:
>>>            continue
>>>
>>>        total += int(line.split()[2])
>>>
>>>    print "%s\t%d\t%f" % (filename, total, total * 1.0 / n)
>>> ---
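
For reference, a sketch of the same idea in Python 3 syntax (the program
above is Python 2). It counts the data rows explicitly so that blank lines
and files holding only a header do not skew or break the average. The
function name column_stats and the choice to report 0.0 for a file with no
data rows are assumptions made for this sketch, not part of the original
program.

```python
import sys


def column_stats(lines, column=2):
    """Return (total, count) for an integer column, skipping the header line."""
    total = 0
    count = 0
    for n, line in enumerate(lines):
        if n == 0:
            continue  # first line is the header
        fields = line.split()
        if len(fields) <= column:
            continue  # blank or short line: nothing to add
        total += int(fields[column])
        count += 1
    return total, count


def main(filenames):
    # Tab-separated output, same columns as the original program.
    print("file\ttotal\taverage")
    for filename in filenames:
        with open(filename) as handle:
            total, count = column_stats(handle)
        # Assumption: report 0.0 when a file has no data rows.
        average = total / count if count else 0.0
        print("%s\t%d\t%f" % (filename, total, average))


if __name__ == "__main__":
    main(sys.argv[1:])
```

It would be run the same way, e.g. "python3 averages.py *.txt > averages.out".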
>>>
>>> On Fri, Aug 19, 2011 at 10:05:17PM -0600, Jim Hutchinson wrote:
>>> > Neal,
>>> >
>>> > I tried the script you attached. I ran it from a terminal by typing
>>> >
>>> > python count_sum.py
>>> >
>>> > It ran but gave no output and if it created a file I can't find it. I suspect I
>>> > have to have a file that it reads in first but not sure where to put that in
>>> > the script, the path to use, the location of the .py file, etc.
>>> >
>>> > Any suggestions?
>>> >
>>> > Thanks,
>>> > Jim
>>> >
>>> > On Fri, Aug 19, 2011 at 7:37 PM, Neal McBurnett <neal at bcn.boulder.co.us> wrote:
>>> >
>>> >     I've attached a program and two sample files that I think do the
>>> >     rest of the stuff you asked for, and is a bit more idiomatic.
>>> >
>>> >     One of the test files has a unicode character in it, and the other has
>>> >     a latin-1 character in it, but neither gives an error like what you
>>> >     saw.  I'm wondering if your input file has an internally-inconsistent
>>> >     encoding problem.
>>> >
>>> >     I actually included the test files (and another copy of the program)
>>> >     in a zip file so the characters get thru with their varied encodings.
>>> >
>>> >     Run:
>>> >      python count_sum.py /tmp/qf /tmp/qy
>>> >
>>> >     and it produces this:
>>> >
>>> >     Writing totals to /tmp/qf-out
>>> >     Writing totals to /tmp/qy-out
>>> >
>>> >     and for example, /tmp/qf-out contains:
>>> >
>>> >     6 blue
>>> >     5 red
>>> >
>>> >     If that's not what you wanted, say what you want.
>>> >
>>> >     Neal McBurnett                 http://neal.mcburnett.org/
>>> >
>>> >     On Fri, Aug 19, 2011 at 04:07:52PM -0600, Jim Hutchinson wrote:
>>> >     > Joey,
>>> >     >
>>> >     > Tried this in Python using files.txt in place of big2.log (shouldn't matter
>>> >     > what I call it, right?) and got this error
>>> >     >
>>> >     > Non-ASCII character '\xc2' in file test.sh on line 6, but no encoding declared;
>>> >     >
>>> >     > I copied and pasted your script as written and saved to test.sh and ran it. I
>>> >     > used full path in the files.txt file.
>>> >     >
>>> >     > Any ideas?
>>> >     >
>>> >     > Thanks,
>>> >     > Jim
>>> >     >
>>> >     > On Fri, Aug 19, 2011 at 3:08 PM, Joey Stanford <joey at canonical.com> wrote:
>>> >     >
>>> >     >     I think the easier way is going to be with awk ... but here's a python
>>> >     >     program that's roughly equivalent...just not looking for the 3rd field
>>> >     >
>>> >     >     #! /usr/bin/env python
>>> >     >
>>> >     >     data = open('big2.log')
>>> >     >     totals = {}
>>> >     >     for line in data:
>>> >     >        line = line.strip()
>>> >     >        if line:
>>> >     >            pageid = line.split()[1]
>>> >     >            pagecount = int(line.split()[0])
>>> >     >            if pageid in totals:
>>> >     >               totals[pageid] += pagecount
>>> >     >            else:
>>> >     >               totals[pageid] = pagecount
>>> >     >
>>> >     >     for key in totals:
>>> >     >            print totals[key], key
>>> >     >
>>> >     >
>>> >     >     On Fri, Aug 19, 2011 at 14:45, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
>>> >     >     > Wondering if any of you script gurus can help with a small problem. I have
>>> >     >     > several text files containing 3 columns. I want to count the number of
>>> >     >     > occurrences of the text in column 2 (or just count the lines) and sum column
>>> >     >     > 3 which is a number. I know how to do the latter with something like
>>> >     >     >
>>> >     >     > #!/bin/bash
>>> >     >     >
>>> >     >     > file="/home/test/file1.txt"
>>> >     >     > cat ${file} | \
>>> >     >     > while read name article count
>>> >     >     > do
>>> >     >     > sum=$(($sum + $count ))
>>> >     >     > echo "$sum"
>>> >     >     > done
>>> >     >     >
>>> >     >     > Although that prints each sum as it goes rather than just the final sum.
>>> >     >     > I'm not sure how to count text (basically counting the lines that contain
>>> >     >     > the numbers would work the same). Also, because each file has a header row
>>> >     >     > it's giving errors so I need to tell it to skip row 1.
>>> >     >     > Finally, I want to automate the input of each file so having it read the
>>> >     >     > list of text files from somewhere, process the file, output to a new file
>>> >     >     > amending each time, and then repeat with the next one until all files are
>>> >     >     > done.
>>> >     >     > Any ideas?
>>> >     >     > Thanks.
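
The counting half of this original question (occurrences of the column 2
text, plus the sum of column 3, with the header row skipped) could be
sketched roughly as below, again in Python 3 syntax. The column layout
"name article count" is taken from the bash loop above. The names summarize
and append_summary, the tab-separated output layout, and reading "amending"
as appending to one output file are all assumptions.

```python
from collections import Counter


def summarize(lines):
    """Return (counts, sums) keyed by the column 2 text, skipping the header."""
    counts = Counter()  # how many rows mention each column 2 value
    sums = Counter()    # running total of column 3 for each column 2 value
    for n, line in enumerate(lines):
        if n == 0:
            continue  # header row
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or incomplete row
        article = fields[1]
        counts[article] += 1
        sums[article] += int(fields[2])
    return counts, sums


def append_summary(in_path, out_path):
    """Append one line per column 2 value from in_path to out_path."""
    with open(in_path) as handle:
        counts, sums = summarize(handle)
    # Mode "a" appends, so repeated calls build up a single output file.
    with open(out_path, "a") as out:
        for article in sorted(counts):
            out.write("%s\t%s\t%d\t%d\n"
                      % (in_path, article, counts[article], sums[article]))
```

Calling append_summary once per input file, with the same out_path each
time, would collect the results for all files in one place.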
>>> >
>>> >     --
>>> >     Ubuntu-us-co mailing list
>>> >     Ubuntu-us-co at lists.ubuntu.com
>>> >     Modify settings or unsubscribe at:
>>> >     https://lists.ubuntu.com/mailman/listinfo/ubuntu-us-co
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Jim (Ubuntu geek extraordinaire)
>>> > ----
>>> > Please avoid sending me Word or PowerPoint attachments.
>>> > See http://www.gnu.org/philosophy/no-word-attachments.html
>>>
>>>
>>
>>
>>
>>
>>
>
>
>

-- 
Jim (Ubuntu geek extraordinaire)
----
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html