recursively count the words occurrence in the text files

Marius Gedminas marius at pov.lt
Thu Dec 30 18:49:45 UTC 2010


On Thu, Dec 30, 2010 at 10:34:54AM -0800, S Mathias wrote:
> I just can't google for it:
> 
> I'm searching for a "bash" "one liner" (awk, perl, or anything) for this: 
> 
> there are text files, in several directories: 
...
> *recursively count the words occurrence in the text files like: "word1 2"
> can anyone point to a howto/link? [re: i just can't google for it :\]

Sounds a bit like a homework exercise...

  $ grep -r -h -o -E '[a-z]+' . | sort | uniq -c

might do it.  I haven't tested it myself.  At the very least you'll need
to adjust the regexp to match your definition of "word", since
all-lowercase probably won't cut it.
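
As a rough sketch of one such adjustment (not something tested against
your files): the version below assumes a "word" is any run of letters,
digits, or apostrophes, and that "Word1" and "word1" should be counted
together.  The sample directory and file names are made up purely so the
pipeline has something to run on; tr and awk are extra tools I'm adding
to fold case and to print "word count" the way you asked.

```shell
# Hypothetical sample tree, created only so the pipeline can be tried safely.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
printf 'Word1 word2\nword1\n' > "$demo/a/one.txt"
printf 'word2 WORD1 word3\n' > "$demo/a/b/two.txt"

# grep: -r recurses, -h drops the "file:" prefix, -o prints each match on
#       its own line, -E enables extended regexps so "+" means "one or more".
# tr:   folds upper case to lower case so counts are case-insensitive.
# sort | uniq -c: groups identical words and counts them.
# sort -rn: most frequent first; awk swaps columns to give "word count".
counts=$(grep -r -h -o -E "[[:alnum:]']+" "$demo" \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c | sort -rn \
  | awk '{ print $2, $1 }')

printf '%s\n' "$counts"

# Clean up the sample tree.
rm -rf "$demo"
```

To try it on real data, replace "$demo" with the top-level directory and
drop the lines that create and remove the sample tree.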

The manual pages for grep, sort, and uniq are available here:

  http://manpages.ubuntu.com/manpages/maverick/en/man1/grep.1.html
  http://manpages.ubuntu.com/manpages/maverick/en/man1/sort.1.html
  http://manpages.ubuntu.com/manpages/maverick/en/man1/uniq.1.html

A good introduction to text processing with shell one-liners is
_Unix_for_poets_ by Kenneth Ward Church:

  http://sslmit.unibo.it/~baroni/compling04/UnixforPoets.pdf

Marius Gedminas
-- 
IBM motto: "If you can't read our assembly language, you must be
borderline dyslexic, and we don't want you to mess with it anyway"
		-- Linus Torvalds