[CoLoCo] bash question

Neal McBurnett neal at bcn.boulder.co.us
Fri Aug 19 22:21:53 UTC 2011


This is actually a good example of why a modern language is important
to have in your toolbelt.  In general it just isn't possible to decide
what to do with a non-ASCII character unless you know which character
encoding is being used.  Python gives you lots of tools to help with
that, which bash and awk don't.
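
As a small illustration of that point (a sketch, not something from the
thread): the byte 0xC2 named in the error message reads as different text
depending on the encoding you assume. The sample bytes and the helper name
decode_guesses are made up for the example.

```python
# Hypothetical sample: 0xC2 0xA0 is how UTF-8 encodes a non-breaking space.
SAMPLE = b"\xc2\xa0"


def decode_guesses(raw, encodings=("utf-8", "latin-1", "ascii")):
    """Decode the same bytes under several encodings and collect the results."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError as err:
            # Record why this encoding cannot represent the bytes.
            results[name] = "cannot decode: %s" % err.reason
    return results


if __name__ == "__main__":
    for name, text in decode_guesses(SAMPLE).items():
        print("%-8s %r" % (name, text))
```

Under UTF-8 the two bytes are one character (a non-breaking space), under
Latin-1 they are two (an a-circumflex followed by a non-breaking space), and
under ASCII they cannot be decoded at all.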

I'm guessing that on the line that says
    data = open('big2.log')

you want to declare the encoding of the file.  What kind of file is
it, from where?
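
One way to start answering that is to look at where the non-ASCII bytes
actually are. The error quoted below names test.sh (the script) rather than
the data file, and line 6 of the quoted script is its first indented line, so
a non-breaking space picked up during copy and paste is one plausible culprit.
That is only a guess. A minimal sketch for checking either file; the function
name find_non_ascii is made up:

```python
def find_non_ascii(path):
    """Return (line_number, byte_values) for each line holding bytes >= 0x80."""
    hits = []
    with open(path, "rb") as handle:  # binary mode: no decoding is attempted
        for number, line in enumerate(handle, start=1):
            odd = [b for b in bytearray(line) if b >= 0x80]
            if odd:
                hits.append((number, odd))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_non_ascii.py test.sh files.txt
    for path in sys.argv[1:]:
        for number, odd in find_non_ascii(path):
            hex_bytes = " ".join("%02x" % b for b in odd)
            print("%s:%d: %s" % (path, number, hex_bytes))
```

If the script itself shows "c2 a0" pairs, retyping the indentation with plain
spaces may be all that is needed.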

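Here is one guess, if that was an "a-circumflex" character, using the
most popular Western European encoding.  With Python 3:

 data = open('big2.log', encoding='latin-1')

With Python 2, which the script below appears to target (it uses the
print statement), the built-in open() has no encoding argument, so use
io.open instead:

 import io
 data = io.open('big2.log', encoding='latin-1')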

See the Python documentation's Unicode HOWTO: http://docs.python.org/howto/unicode.html
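
Here is a rough sketch of the totalling loop from the quoted script with the
encoding declared up front. It assumes the data file really is Latin-1 and
that each non-blank line starts with a count followed by a page id, as the
quoted script does. The name sum_counts is made up, and a header row would
still need to be skipped separately.

```python
import io
from collections import defaultdict


def sum_counts(path, encoding="latin-1"):
    """Total the first column (a count) for each value of the second column."""
    totals = defaultdict(int)
    # io.open accepts encoding= on Python 2.6+ as well as Python 3.
    with io.open(path, encoding=encoding) as data:
        for line in data:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or short lines
            totals[fields[1]] += int(fields[0])
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Usage: python sum_counts.py big2.log
    for path in sys.argv[1:]:
        for pageid, count in sorted(sum_counts(path).items()):
            print("%d %s" % (count, pageid))
```

If the guess about Latin-1 is wrong the totals will still come out, but the
page ids may print as the wrong characters, so it is worth confirming where
the file came from.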

Neal McBurnett                 http://neal.mcburnett.org/

On Fri, Aug 19, 2011 at 04:07:52PM -0600, Jim Hutchinson wrote:
> Joey,
> 
> Tried this in Python using files.txt in place of big2.log (shouldn't matter
> what I call it, right?) and got this error
> 
> Non-ASCII character '\xc2' in file test.sh on line 6, but no encoding declared;
> 
> I copied and pasted your script as written and saved to test.sh and ran it. I
> used full path in the files.txt file.
> 
> Any ideas?
> 
> Thanks,
> Jim
> 
> On Fri, Aug 19, 2011 at 3:08 PM, Joey Stanford <joey at canonical.com> wrote:
> 
>     I think the easier way is going to be with awk ... but here's a python
>     program that's roughly equivalent...just not looking for the 3rd field
> 
>     #! /usr/bin/env python
> 
>     data = open('big2.log')
>     totals = {}
>     for line in data:
>         line = line.strip()
>         if line:
>             fields = line.split()
>             pageid = fields[1]
>             pagecount = int(fields[0])
>             if pageid in totals:
>                 totals[pageid] += pagecount
>             else:
>                 totals[pageid] = pagecount
> 
>     for key in totals:
>         print totals[key], key
> 
> 
>     On Fri, Aug 19, 2011 at 14:45, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
>     > Wondering if any of you script gurus can help with a small problem. I have
>     > several text files containing 3 columns. I want to count the number of
>     > occurrences of the text in column 2 (or just count the lines) and sum column
>     > 3 which is a number. I know how to do the latter with something like
>     >
>     > #!/bin/bash
>     >
>     > file="/home/test/file1.txt"
>     > cat ${file} | \
>     > while read name article count
>     > do
>     > sum=$(($sum + $count ))
>     > echo "$sum"
>     > done
>     >
>     > Although that prints each sum as it goes rather than just the final sum.
>     > I'm not sure how to count text (basically counting the lines that contain
>     > the numbers would work the same). Also, because each file has a header row
>     > it's giving errors so I need to tell it to skip row 1.
>     > Finally, I want to automate the input of each file so having it read the
>     > list of text files from somewhere, process the file, output to a new file
>     > amending each time, and then repeat with the next one until all files are
>     > done.
>     > Any ideas?
>     > Thanks.
>     > --
>     > Jim (Ubuntu geek extraordinaire)
>     > ----
>     > Please avoid sending me Word or PowerPoint attachments.
>     > See http://www.gnu.org/philosophy/no-word-attachments.html
>     >
>     > --
>     > Ubuntu-us-co mailing list
>     > Ubuntu-us-co at lists.ubuntu.com
>     > Modify settings or unsubscribe at:
>     > https://lists.ubuntu.com/mailman/listinfo/ubuntu-us-co
>     >
>     >
> 



