[CoLoCo] bash question
Kevin Fries
kfries6 at gmail.com
Sat Aug 20 09:29:57 UTC 2011
Jim,
I wrote a quick easy Ruby script to do exactly what you want it to do.
This was not written to be the most efficient, but instead the easiest
to read. And in the "Teach a man to fish" school of thought, I figured
I would walk you through the script so any newbies out there can see how
this is done as well as give you a working script.
First, like any Linux script, declare your language
---------------------------------------------------------------------------
#!/usr/bin/ruby
---------------------------------------------------------------------------
Next, if you want to skip the first line of each file, you need a flag
---------------------------------------------------------------------------
firstline = true
---------------------------------------------------------------------------
Now, you need to iterate through all your files. In Ruby, you simply
create a Dir object, and pass it a path. Now that you have a directory,
simply call the each method, and define a name to hold the filename...
in this case 'file'
---------------------------------------------------------------------------
Dir.new('./data').each do |file|
---------------------------------------------------------------------------
This could also have been done by declaring a variable, say 'd', to hold
the Dir object, and then iterating on that. This does not buy us anything
in this case, so I did not do that. But if I had, it would look like this
---------------------------------------------------------------------------
d = Dir.new('./data')
d.each do |file|
---------------------------------------------------------------------------
The next thing you want to do is skip any entries you don't want to
process. In this case I skip '.', '..', and any directories. If you
want your data in sub-directories, that would be easy to add. I purposely
skip directories so you can create a 'NoProcess' directory
to keep files you don't want processed
---------------------------------------------------------------------------
next if file == '.'
next if file == '..'
next if File::Stat.new("./data/#{file}").directory?
---------------------------------------------------------------------------
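As a side note, the listing and the three skip tests can be folded into one step. This is only a sketch of an alternative, not what the script uses: the helper name data_files is made up, and the demo builds a throwaway directory instead of touching your ./data.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of the script in this mail): return the names
# of the regular files directly inside +dir+. '.', '..' and sub-directories
# are dropped because File.file? is false for all of them.
def data_files(dir)
  Dir.entries(dir).select { |name| File.file?(File.join(dir, name)) }.sort
end

# Demo on a temporary directory so the sketch runs anywhere.
Dir.mktmpdir do |tmp|
  File.write(File.join(tmp, 'a.txt'), "header\n")
  FileUtils.mkdir(File.join(tmp, 'NoProcess'))
  puts data_files(tmp).inspect   # prints ["a.txt"]
end
```

The three separate 'next if' lines do the same job and are easier to follow line by line, which is why the walkthrough sticks with them.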
Pretty simple so far? I hope so. Now that we have a filename that we
want to work with, let's open the file, and assign the file handle to a
variable I will call 'fh'
---------------------------------------------------------------------------
File.open("./data/#{file}") do |fh|
---------------------------------------------------------------------------
You will also notice that I never close this file. By opening the
file with a "do |varname|" (called passing a block in Ruby), the
file is closed automatically when the block is done processing... cool, eh?
No more files left open by mistake. I love these new language structures.
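If you want to see the automatic close for yourself, here is a small sketch. The helper name closed_after_block? is made up for the demo, and it works on a temporary file rather than your data.

```ruby
require 'tempfile'

# Hypothetical helper: open +path+ with a block, then report whether the
# handle is closed once the block has finished.
def closed_after_block?(path)
  handle = nil
  File.open(path) do |fh|
    handle = fh   # keep a reference so we can inspect it after the block
    fh.gets       # read something while the file is still open
  end
  handle.closed?
end

Tempfile.create('demo') do |tmp|
  tmp.puts "hello"
  tmp.flush
  puts closed_after_block?(tmp.path)   # prints "true"
end
```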
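Let's initialize your count, and reset the flag so that the first line is
skipped in every file, not just in the first one
---------------------------------------------------------------------------
count = 0
firstline = true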
---------------------------------------------------------------------------
and read lines out of the file until there are no more to read
---------------------------------------------------------------------------
while line = fh.gets
---------------------------------------------------------------------------
Skip what you read if it is the first line, and set your flag to
indicate that this has already been done
---------------------------------------------------------------------------
if firstline
firstline = false
next
end
---------------------------------------------------------------------------
All right, you have a file, you have opened it, you have read a line
from it, and it is not the first line. Let's parse it into the three
column values
---------------------------------------------------------------------------
(rev_user_text,page_title,linecount) = line.split
---------------------------------------------------------------------------
And add the third column (called linecount) to our total. Since your
file is a text file, Ruby reads every field as a string. The to_i
simply forces the string -> int conversion
---------------------------------------------------------------------------
count += linecount.to_i
---------------------------------------------------------------------------
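To see what split and to_i do with a single line, here is a quick sketch. The sample lines are made up (your real files may look different), and third_column is just a demo name.

```ruby
# Hypothetical helper: pull the third whitespace-separated field out of a
# line and convert it to an integer, the same way the script does.
def third_column(line)
  _user, _title, linecount = line.split
  linecount.to_i
end

puts third_column("alice Main_Page 12\n")   # prints 12
puts third_column("bob Talk_Page abc\n")    # prints 0 (non-numeric text)
puts third_column("carol Only_Two\n")       # prints 0 (missing field: nil.to_i is 0)
```

One thing to watch: split with no arguments breaks on any whitespace, so a page title containing spaces would push the count out of the third position. If your files are tab-separated, line.split("\t") would be the safer call.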
OK, we are now done with our line, so let's end the loop body and go on
until we hit the end of the file
---------------------------------------------------------------------------
end
---------------------------------------------------------------------------
Once End-Of-File is reached, we can print our results
---------------------------------------------------------------------------
puts "#{file}: #{count}"
---------------------------------------------------------------------------
If you wanted, you could have opened a file at the beginning and written
the results to that file here. But by putting this on stdout, you can
use redirection to create a file, or run other command line tools like awk,
sed, wc, etc. on the output.
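For comparison, here is roughly what the write-to-a-file variant could look like. This is a sketch only: the report helper, the totals, and the totals.txt name are all made up for the demo.

```ruby
require 'tmpdir'

# Hypothetical helper: write "name: count" lines to +out+, which can be
# $stdout or any open file handle.
def report(results, out = $stdout)
  results.each { |name, count| out.puts "#{name}: #{count}" }
end

results = { 'a.txt' => 12, 'b.txt' => 7 }   # made-up totals
report(results)                             # goes to stdout

# Same report, written to a file in a temporary directory instead.
Dir.mktmpdir do |tmp|
  path = File.join(tmp, 'totals.txt')
  File.open(path, 'w') { |fh| report(results, fh) }
  puts File.read(path)
end
```

Printing to stdout keeps the script shorter, and the shell can still turn the output into a file for you.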
We are now at the end of processing that file, and we can loop back and
get any other files in that directory
---------------------------------------------------------------------------
end
---------------------------------------------------------------------------
Once all the files are processed, we are done
---------------------------------------------------------------------------
end
---------------------------------------------------------------------------
Pretty simple, here is the program all in one piece. You can simply put
it in a temp folder, place all the files you want to process in a
sub-folder called data and run it. Or, it should now be pretty easy to
alter this program to pull files from wherever you want. So, here is
the program in its entirety:
---------------------------------------------------------------------------
#!/usr/bin/ruby
firstline = true
Dir.new('./data').each do |file|
  # skip the directory entries and any sub-directories
  next if file == '.'
  next if file == '..'
  next if File::Stat.new("./data/#{file}").directory?
  File.open("./data/#{file}") do |fh|
    count = 0
    firstline = true   # reset for every file so each first line is skipped
    while line = fh.gets
      if firstline
        firstline = false
        next
      end
      (rev_user_text,page_title,linecount) = line.split
      count += linecount.to_i
    end
    puts "#{file}: #{count}"
  end
end
---------------------------------------------------------------------------
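Once the long version makes sense, the per-file work can be squeezed into a few lines. This is a sketch of one way to do it, not a replacement you have to use: column_total is a made-up name, and the sample file is invented so the demo runs on its own.

```ruby
require 'tmpdir'

# Hypothetical helper: sum the third whitespace-separated column of +path+,
# ignoring the first (header) line.
def column_total(path)
  total = 0
  File.foreach(path).with_index do |line, index|
    next if index.zero?           # skip the first line of this file
    total += line.split[2].to_i   # a missing field counts as 0
  end
  total
end

# Demo with an invented three-column file.
Dir.mktmpdir do |tmp|
  path = File.join(tmp, 'sample.txt')
  File.write(path, "rev_user_text page_title linecount\n" \
                   "alice Main_Page 12\n" \
                   "bob Sandbox 30\n")
  puts "sample.txt: #{column_total(path)}"   # prints "sample.txt: 42"
end
```

Using the line index instead of a flag means there is nothing to reset between files.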
Enjoy
Kevin