[CoLoCo] bash question

Jim Hutchinson jim at ubuntu-rocks.org
Sat Aug 20 15:29:52 UTC 2011


David,

These files have no duplicates (that's a whole 'nother can of worms). These
are single user files and I need to do some basic descriptive stats on them
like average total number of edits, average number of articles edited, etc.
Once I have the line count (article count) and the averages for all 1000
then I can work with the data in a spreadsheet for the rest.
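
A minimal shell sketch of that per-file summary, assuming (as in the PHP script quoted below) that the files are tab-separated, with one article per line and an integer edit count in column 3. The function name `summarize` and the output layout are made up for illustration.

```shell
# summarize DIR
# Prints "file<TAB>articles<TAB>edits" for every .txt file in DIR,
# then one AVERAGE row across all files.
# Assumes column 3 holds whole numbers.
summarize() {
    dir=$1
    files=0
    total_articles=0
    total_edits=0
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # awk prints "<line count> <sum of column 3>" for this one file.
        stats=$(awk -F'\t' '{ s += $3 } END { print NR, s + 0 }' "$f")
        articles=${stats% *}
        edits=${stats#* }
        printf '%s\t%s\t%s\n' "$(basename "$f")" "$articles" "$edits"
        files=$((files + 1))
        total_articles=$((total_articles + articles))
        total_edits=$((total_edits + edits))
    done
    [ "$files" -gt 0 ] || return 1
    awk -v n="$files" -v a="$total_articles" -v e="$total_edits" \
        'BEGIN { printf "AVERAGE\t%.2f\t%.2f\n", a / n, e / n }'
}
```

Something like `summarize /path/to/files > outputfile.txt` would then give one row per file plus the averages, ready to open in a spreadsheet.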

Do I run PHP the same way as Python or Ruby? i.e. php script.php *.txt >
outputfile.txt

Or similar?

Btw, what happens to files if they have a space in the title? Some do.
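
On the spaces question, a small sketch of how the shell treats such names. When the shell expands `*.txt`, each matching name is passed as a single argument even if it contains a space; the trouble usually comes from unquoted variables. The helper `show_args` and the file names here are invented for the demonstration.

```shell
# Print each argument on its own line in brackets, to show word boundaries.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Throwaway directory with one plain name and one name containing a space.
demo_dir=$(mktemp -d)
: > "$demo_dir/AED.txt"
: > "$demo_dir/my file.txt"

(
    cd "$demo_dir" || exit 1
    show_args *.txt      # glob: "my file.txt" stays one argument
    name='my file.txt'
    show_args $name      # unquoted variable: split into [my] and [file.txt]
    show_args "$name"    # quoted variable: one argument again
)
rm -rf "$demo_dir"
```

So `*.txt` on the command line is fine as it stands; inside a script, quote any variable that holds a file name. Note that the PHP script quoted below opens a hard-coded directory and does not read its arguments, so for that script the `*.txt` part would simply be ignored.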

Thanks,
Jim

On Sat, Aug 20, 2011 at 8:51 AM, David Overcash <funnylookinhat at gmail.com> wrote:

> No - the text files should mostly be ok... the one I'm working with (
> AED.txt ) seems to parse ok through PHP.
>
> Hey you Python lovers!  Your language sucks!  And you've led Jim down a
> rabbit-hole of doom!  Totally kidding, but I take my points when I can... ;)
>
> I'm a bit confused - but I'm on my way out for the day so I figured I'd
> send this along just in case you could use it... this script is finding
> AED.txt to have no duplicate lines in the second column ( i.e. the count for
> each one is only 1 ) - is that expected?  If you have a file where there are
> definitely duplicate values across rows for column 2, then I'd be better
> able to know if this is working correctly...
>
> <?php
>
> $col2count = array();
> $col3sum = 0;
>
> // Replace /home/funnylookinhat/Temp_Dev/JIM with the path to your .txt files
> $path = '/home/funnylookinhat/Temp_Dev/JIM';
> if ($directory = opendir($path)) {
>   // Compare against false so a file named "0" does not end the loop early.
>   while (($file = readdir($directory)) !== false) {
>     // Only read files whose names end in .txt.
>     if (substr($file, -4) === '.txt') {
>       // readdir() returns bare names, so prepend the directory.
>       $file_array = file($path.'/'.$file);
>       foreach ($file_array as $file_line) {
>         $file_line_array = explode("\t", $file_line);
>         // Skip lines that do not have at least three tab-separated columns.
>         if (count($file_line_array) < 3) {
>           continue;
>         }
>         $key = trim($file_line_array[1]);
>         if (isset($col2count[$key])) {
>           $col2count[$key] = $col2count[$key] + 1;
>         } else {
>           $col2count[$key] = 1;
>         }
>         $col3sum += intval(trim($file_line_array[2]));
>         // Uncomment this if you want to see what each read line is.
>         // echo trim($file_line_array[0])."\t".trim($file_line_array[1])."\t".trim($file_line_array[2])."\n";
>       }
>     }
>   }
>   closedir($directory);
>   // Print the lines and their counts.
>   foreach ($col2count as $line => $count) {
>     echo $line.','.$count."\n";
>   }
>   // Print the sum - you'll probably want to comment this out and pipe the
>   // output into a .csv file
>   echo "COL3SUM: ".$col3sum."\n";
> } else {
>   echo "Could not open directory.\n";
> }
> ?>
>
> On Sat, Aug 20, 2011 at 8:22 AM, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
>
>> Thanks David,
>>
>> It still does the same error with just using one specific file. Is there
>> something in the format of the data in the file that it doesn't like? I
>> can't change that as I'd have to manually edit them all.
>>
>> Jim
>>
>>
>> On Sat, Aug 20, 2011 at 8:08 AM, David Overcash <funnylookinhat at gmail.com> wrote:
>>
>>> The error is coming from the line where it reads a file... and because it
>>> runs for a while, my guess is that it's looping through a few files and then
>>> hitting the error on a specific one.  It definitely should be reading only
>>> .txt files with those params.
>>>
>>> Go ahead and try running the program with just one text file to verify it
>>> works correctly.
>>>
>>> i.e.
>>> python averages.py AED.txt
>>>
>>>
>>> On Sat, Aug 20, 2011 at 8:02 AM, Jim Hutchinson <jim at ubuntu-rocks.org> wrote:
>>>
>>>> David,
>>>>
>>>> It errors out on the .py script itself. Shouldn't the *.txt tell it to
>>>> skip any non .txt file? Guess I need to point it to a dir with just the
>>>> files.
>>>>
>>>> Thanks.
>>>>
>>>> Kevin,
>>>>
>>>> As soon as I figure out whether I have Ruby on my laptop I'll give that a
>>>> try. Thanks for the help and the lesson. It's much easier to understand
>>>> what it's doing that way.
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>> --
>> Ubuntu-us-co mailing list
>> Ubuntu-us-co at lists.ubuntu.com
>> Modify settings or unsubscribe at:
>> https://lists.ubuntu.com/mailman/listinfo/ubuntu-us-co
>>
>>
>


-- 
Jim (Ubuntu geek extraordinaire)
----
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html