[CoLoCo] bash question
Jim Hutchinson
jim at ubuntu-rocks.org
Sat Aug 20 14:02:45 UTC 2011
David,
It errors out on the .py script itself. Shouldn't the *.txt tell it to skip
any non .txt file? Guess I need to point it to a dir with just the files.
Thanks.
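
Since the SyntaxError names averages.py itself, the offending byte is in the script, not in any of the data files. The shell expands *.txt before Python starts, so the script only ever receives the matching names. Python 2 rejects a source file that contains non-ASCII bytes unless it declares an encoding. The byte '\xc2' begins the UTF-8 encoding of a non-breaking space (C2 A0), among other characters, and if the copy began at the "#!" line then line 14 of the posted program is the first indented line. One plausible cause, which is an inference and not confirmed anywhere in this thread, is that the indentation was turned into non-breaking spaces when the code was copied out of the mail. Below is a minimal Python 3 sketch for locating such bytes; the function name find_non_ascii and the script name in the usage comment are made up for illustration.

```python
#!/usr/bin/env python3
"""Report every non-ASCII byte in a file, with its line and column."""

import sys


def find_non_ascii(data):
    """Return (line, column, byte) tuples for each byte above 0x7F in data."""
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        # Iterating over a bytes object in Python 3 yields integers.
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((lineno, col, byte))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 find_non_ascii.py averages.py
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            for lineno, col, byte in find_non_ascii(handle.read()):
                print("%s:%d:%d: byte 0x%02x" % (path, lineno, col, byte))
```

If the hits all fall in leading whitespace, retyping that whitespace with plain spaces should be enough.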
Kevin,
As soon as I figure out if I have ruby on my laptop I'll give that a try.
Thanks for the help and lesson. Much easier to understand what it's doing
that way.
Jim
On Sat, Aug 20, 2011 at 7:19 AM, David Overcash <funnylookinhat at gmail.com> wrote:
> Sounds like one of your files is corrupted...
>
> Right after "total=0" add this:
>     print "%s" % filename
>
> That should print every file that works and then the final one that doesn't
> ( if I'm counting lines correctly... it's still a bit early... ;) )
>
> On Fri, Aug 19, 2011 at 11:14 PM, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
>
>> Thanks Neal,
>>
>> Gave that a try and got
>>
>> "SyntaxError: Non-ASCII character '\xc2' in file averages.py on line 14,
>> but no encoding declared;"
>>
>> I just copied your code to a file and called it "averages.py" and ran it
>> like you said:
>>
>> python averages.py *.txt > averages.out
>>
>> Seemed like it was thinking then gave the error. It created the output
>> file but it's empty.
>>
>> Thanks,
>> Jim
>>
>>
>> On Fri, Aug 19, 2011 at 10:20 PM, Neal McBurnett <neal at bcn.boulder.co.us> wrote:
>>
>>> The way you restated the problem is very different and much
>>> easier. Here is a program to do that. Just put all 1000 files in one
>>> directory, which I'll assume all end in ".txt", and say
>>>
>>> python jim-averages.py *.txt > averages.out
>>>
>>> and it will put the results in "averages.out" (which does not end in
>>> .txt....).
>>> They are tab separated for easy loading into a spreadsheet.
>>>
>>> By the way, I showed how to run the one I sent before in that message.
>>> I.e., just list all the files you want it to process on the command
>>> line.
>>>
>>> > Run:
>>> > python count_sum.py /tmp/qf /tmp/qy
>>>
>>> If you don't list any files, it doesn't have anything to do....
>>>
>>> Neal McBurnett http://neal.mcburnett.org/
>>>
>>> --- the program jim-averages.py ---
>>> #! /usr/bin/env python
>>> """
>>> Print the total of column 3, and the average value of column 3.
>>> Skip the first line (a header)
>>> """
>>>
>>> import sys
>>>
>>> FILES = sys.argv[1:]
>>>
>>> print "file\ttotal\taverage"
>>>
>>> for filename in FILES:
>>>     total = 0
>>>
>>>     for n, line in enumerate(open(filename)):
>>>         if n == 0:
>>>             continue
>>>
>>>         total += int(line.split()[2])
>>>
>>>     print "%s\t%d\t%f" % (filename, total, total * 1.0 / n)
>>> ---
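
The program quoted above is Python 2. What follows is a hedged Python 3 sketch of the same idea, not a replacement endorsed by anyone in the thread. It assumes whitespace-separated columns with an integer in column 3, as the original does. The helper name column_stats is made up for illustration. Unlike the original, it counts data rows explicitly instead of reusing the last enumerate index, so a header-only file does not divide by zero and blank lines do not skew the average.

```python
#!/usr/bin/env python3
"""Print the total and average of column 3 for each file, skipping the header."""

import sys


def column_stats(lines, column=2):
    """Return (total, count) for the integer in the given 0-based column.

    The first line is treated as a header; blank lines are ignored.
    """
    total = 0
    count = 0
    for n, line in enumerate(lines):
        fields = line.split()
        if n == 0 or not fields:
            continue
        total += int(fields[column])
        count += 1
    return total, count


if __name__ == "__main__":
    # Tab-separated output, one row per input file, as in the original.
    print("file\ttotal\taverage")
    for filename in sys.argv[1:]:
        with open(filename) as handle:
            total, count = column_stats(handle)
        average = total / count if count else 0.0
        print("%s\t%d\t%f" % (filename, total, average))
```

It is run the same way as the original, for example: python3 averages.py *.txt > averages.out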
>>>
>>> On Fri, Aug 19, 2011 at 10:05:17PM -0600, Jim Hutchinson wrote:
>>> > Neal,
>>> >
>>> > I tried the script you attached. I ran it from a terminal by typing
>>> >
>>> > python count_sum.py
>>> >
>>> > It ran but gave no output and if it created a file I can't find it. I suspect I
>>> > have to have a file that it reads in first but not sure where to put that in
>>> > the script, the path to use, the location of the .py file, etc.
>>> >
>>> > Any suggestions?
>>> >
>>> > Thanks,
>>> > Jim
>>> >
>>> > On Fri, Aug 19, 2011 at 7:37 PM, Neal McBurnett <neal at bcn.boulder.co.us> wrote:
>>> >
>>> > I've attached a program and two sample files that I think does the
>>> > rest of the stuff you asked for, and is a bit more idiomatic.
>>> >
>>> > One of the test files has a unicode character in it, and the other has
>>> > a latin-1 character in it, but neither gives an error like what you
>>> > saw. I'm wondering if your input file has an internally-inconsistent
>>> > encoding problem.
>>> >
>>> > I actually included the test files (and another copy of the program)
>>> > in a zip file so the characters get thru with their varied encodings.
>>> >
>>> > Run:
>>> > python count_sum.py /tmp/qf /tmp/qy
>>> >
>>> > and it produces this:
>>> >
>>> > Writing totals to /tmp/qf-out
>>> > Writing totals to /tmp/qy-out
>>> >
>>> > and for example, /tmp/qf-out contains:
>>> >
>>> > 6 blue
>>> > 5 red
>>> >
>>> > If that's not what you wanted, say what you want.
>>> >
>>> > Neal McBurnett http://neal.mcburnett.org/
>>> >
>>> > On Fri, Aug 19, 2011 at 04:07:52PM -0600, Jim Hutchinson wrote:
>>> > > Joey,
>>> > >
>>> > > Tried this in Python using files.txt in place of big2.log (shouldn't matter
>>> > > what I call it, right?) and got this error
>>> > >
>>> > > Non-ASCII character '\xc2' in file test.sh on line 6, but no encoding declared;
>>> > >
>>> > > I copied and pasted your script as written and saved to test.sh and ran it. I
>>> > > used full path in the files.txt file.
>>> > >
>>> > > Any ideas?
>>> > >
>>> > > Thanks,
>>> > > Jim
>>> > >
>>> > > On Fri, Aug 19, 2011 at 3:08 PM, Joey Stanford <joey at canonical.com> wrote:
>>> > >
>>> > > I think the easier way is going to be with awk ... but here's a python
>>> > > program that's roughly equivalent...just not looking for the 3rd field
>>> > >
>>> > > #! /usr/bin/env python
>>> > >
>>> > > data = open('big2.log')
>>> > > totals = {}
>>> > > for line in data:
>>> > >     line = line.strip()
>>> > >     if line:
>>> > >         pageid = line.split()[1]
>>> > >         pagecount = int(line.split()[0])
>>> > >         if pageid in totals:
>>> > >             totals[pageid] += pagecount
>>> > >         else:
>>> > >             totals[pageid] = pagecount
>>> > >
>>> > > for key in totals:
>>> > >     print totals[key], key
>>> > >
>>> > >
>>> > > On Fri, Aug 19, 2011 at 14:45, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
>>> > > > Wondering if any of you script gurus can help with a small problem. I have
>>> > > > several text files containing 3 columns. I want to count the number of
>>> > > > occurrences of the text in column 2 (or just count the lines) and sum column
>>> > > > 3 which is a number. I know how to do the latter with something like
>>> > > >
>>> > > > #!/bin/bash
>>> > > >
>>> > > > file="/home/test/file1.txt"
>>> > > > cat ${file} | \
>>> > > > while read name article count
>>> > > > do
>>> > > > sum=$(($sum + $count ))
>>> > > > echo "$sum"
>>> > > > done
>>> > > >
>>> > > > Although that prints each sum as it goes rather than just the final sum.
>>> > > > I'm not sure how to count text (basically counting the lines that contain
>>> > > > the numbers would work the same). Also, because each file has a header row
>>> > > > it's giving errors so I need to tell it to skip row 1.
>>> > > > Finally, I want to automate the input of each file so having it read the
>>> > > > list of text files from somewhere, process the file, output to a new file
>>> > > > amending each time, and then repeat with the next one until all files are
>>> > > > done.
>>> > > > Any ideas?
>>> > > > Thanks.
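
The original request (count the rows for each value in column 2, sum column 3, skip the header, repeat for every file) can be sketched in Python 3 along the lines of the dictionary-based program quoted above. This is an illustration under assumptions, not the count_sum.py attachment mentioned in the thread, which is not reproduced here. It assumes whitespace-separated columns with an integer in column 3, and the function name count_and_sum is made up.

```python
#!/usr/bin/env python3
"""Count rows and sum column 3 for each distinct value of column 2."""

import sys
from collections import defaultdict


def count_and_sum(lines):
    """Return {column-2 value: [row count, column-3 sum]}, skipping the header."""
    totals = defaultdict(lambda: [0, 0])
    for n, line in enumerate(lines):
        fields = line.split()
        if n == 0 or len(fields) < 3:
            continue  # header row, blank line, or short row
        entry = totals[fields[1]]
        entry[0] += 1
        entry[1] += int(fields[2])
    return dict(totals)


if __name__ == "__main__":
    # One tab-separated row per (file, column-2 value) pair.
    for filename in sys.argv[1:]:
        with open(filename) as handle:
            for key, (rows, total) in sorted(count_and_sum(handle).items()):
                print("%s\t%s\t%d\t%d" % (filename, key, rows, total))
```

Listing the files on the command line and redirecting standard output to a single file covers the "process each file and append to one result" part without the script having to manage output files itself.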
>>> >
>>> > --
>>> > Ubuntu-us-co mailing list
>>> > Ubuntu-us-co at lists.ubuntu.com
>>> > Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-us-co
>>> >
>>> > --
>>> > Jim (Ubuntu geek extraordinaire)
>>> > ----
>>> > Please avoid sending me Word or PowerPoint attachments.
>>> > See http://www.gnu.org/philosophy/no-word-attachments.html
--
Jim (Ubuntu geek extraordinaire)
----
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html