Make a word list from a text

Wulfy wulfmann at tiscali.co.uk
Sat Aug 2 07:02:05 UTC 2008


Mark A. Taff wrote:
>> Mark's program gave me a list of words, one to a line; I now need to
>> remove punctuation.  Brendan's program removed all the spaces but
>> otherwise left the rest of the text as it was,
>>     
>
>
> perl -e '$data = `cat ./pgadmin.log`; $data =~ s/[?.,";:()\/_*!]//g;
> @words = split(" ", $data); foreach $word (@words) { print "$word\n"; }' |
> sort | uniq
>
> This version will remove most punctuation, with the notable exception of
> apostrophes.  There you start running into context problems: is that
> apostrophe marking a possessive (Mark's computer), an omitted character
> (ma'am, don't), or a quotation ('blah,' said Mark)?  The same applies to
> dashes and hyphenated words (self-defense).
>
> But, this will get you close.
>
> HTH,
>
> Mark
>
>   
Wonderful!  Thanks so much!  :@)
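
For the apostrophe and hyphen cases Mark mentions, one possible variation
is to strip punctuation only from the two ends of each word, so that
"don't" and "self-defense" survive intact while surrounding quotes, commas
and brackets go.  This is only a rough sketch, not Mark's method: the
function name wordlist is made up for illustration, it uses tr and sed
rather than Perl, and it still cannot tell a possessive from a contraction.
Punctuation in the middle of a token (a/b, for instance) is left alone.

```shell
# Hypothetical helper: reads text on stdin, prints a sorted list of unique words.
wordlist() {
    # Turn every run of whitespace into a single newline: one token per line.
    tr -s '[:space:]' '\n' |
    # Strip non-alphanumeric characters from the start and end of each token,
    # then delete any lines that end up empty.
    sed -e 's/^[^[:alnum:]]*//' -e 's/[^[:alnum:]]*$//' -e '/^$/d' |
    # Sort and remove duplicates; LC_ALL=C makes the order byte-wise
    # (capitals before lower case) regardless of the system locale.
    LC_ALL=C sort -u
}
```

It would be used as: wordlist < ./pgadmin.log
Note that "Word" and "word" still count as two different entries, just as
they do in the Perl version.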

-- 
Blessings

Wulfmann

Wulf Credo:
Respect the elders. Teach the young. Co-operate with the pack.
Play when you can. Hunt when you must. Rest in between.
Share your affections. Voice your opinion. Leave your Mark.
Copyright July 17, 1988 by Del Goetz





More information about the kubuntu-users mailing list