Make a word list from a text

Wulfy wulfmann at
Sat Aug 2 07:02:05 UTC 2008

Mark A. Taff wrote:
>> Mark's program gave me a list of words, one to a line; I now need to
>> remove punctuation.  Brendan's program removed all the spaces but
>> otherwise left the rest of the text as it was,
> perl -e '$data = `cat ./pgadmin.log`; $data =~ s/[?\.\,\"\;\:\(\)\/\_\*\!]//g;
> @words = split(" ", $data); foreach $word (@words) { print "$word\n"; }' |
> sort | uniq
> This version will remove most punctuation, with the notable exception of
> apostrophes.  You start running into context problems: is that apostrophe
> marking a possessive (Mark's computer), an omitted character (ma'am, don't),
> or quoting ("blah," said Mark)?  The same applies to dashes and hyphenated
> words (self-defense).
> But this will get you close.
> HTH,
> Mark
Wonderful!  Thanks so much!  :@)
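
For the apostrophe and hyphen cases Mark mentions, one possible variation is
to pull out the words instead of deleting the punctuation.  This is only a
rough sketch, and it rests on a few assumptions that are not in Mark's
version: GNU grep (for the -o option), plain ASCII letters, and a "word"
being letters optionally joined by a single apostrophe or hyphen.  The
function name wordlist is made up for the example.

```shell
#!/bin/sh
# wordlist (hypothetical name): read text on stdin, print each distinct
# word once, one per line.
# Assumption: a word is a run of ASCII letters, optionally joined by single
# apostrophes or hyphens, so "don't" and "self-defense" stay whole while
# surrounding quotes, commas, and brackets are dropped.
wordlist() {
    # grep -o prints only the matching parts, one per line (GNU extension).
    # LC_ALL=C keeps the character ranges and the sort order byte-based,
    # so capitalised words sort before lower-case ones.
    LC_ALL=C grep -oE "[A-Za-z]+(['-][A-Za-z]+)*" | LC_ALL=C sort -u
}
```

It would be used as "wordlist < ./pgadmin.log".  Note that it keeps "Mark"
and "Mark's" as two separate entries, so the context problem is not solved,
only made visible in the list.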



Wulf Credo:
Respect the elders. Teach the young. Co-operate with the pack.
Play when you can. Hunt when you must. Rest in between.
Share your affections. Voice your opinion. Leave your Mark.
Copyright July 17, 1988 by Del Goetz

More information about the kubuntu-users mailing list