Make a word list from a text
Wulfy
wulfmann at tiscali.co.uk
Sun Aug 3 07:42:27 UTC 2008
Donn wrote:
> On Saturday, 02 August 2008 05:52:46 Wulfy wrote:
>
>> I want to take a text file and extract all the words and sort them into
>> a unique list.
>>
> I gave it a go and this is the best I can do:
> cat myfile | sed "s/'//g" | tr -s '[:space:][:punct:]' "\n" | sort | uniq -c
>
> The sed bit is to remove single quotes so words like "didn't" don't
> become "didn" and "t". It then uses tr to replace spaces or punctuation with
> newlines and then out to sort and uniq.
>
> I find text parsing very hard to do. There seem to be corner-cases everywhere.
> What is a word really? How do you define its edges? Ah well, HTH.
> \d
>
>
Wow! Yet another way to do it! I said there were a bazillion
ways...... :@D
Thanks, Donn! :@)
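
For anyone wanting the plain unique list from the original question rather
than the per-word counts that "uniq -c" prints, here is a minimal sketch of
a variant of Donn's pipeline. It is only one possible arrangement, not the
definitive one: the function name "wordlist" is made up for illustration,
the lowercasing step is an added assumption (so "The" and "the" count as one
word), and it assumes a tr that pads the shorter second set, as GNU and BSD
tr do.

```shell
# wordlist: hypothetical helper, reads text on stdin and prints one
# unique, lowercased word per line.
wordlist() {
    # Drop apostrophes so "didn't" stays one word ("didnt").
    sed "s/'//g" |
    # Fold case (an assumption, not part of the original pipeline).
    tr '[:upper:]' '[:lower:]' |
    # Turn every run of whitespace or punctuation into a single newline.
    tr -s '[:space:][:punct:]' '\n' |
    # Sort and keep one copy of each word (no counts).
    sort -u |
    # Remove the empty line left behind by leading punctuation, if any.
    sed '/^$/d'
}
```

It could be used as "wordlist < myfile > words.txt". The corner cases Donn
mentions still apply: hyphenated words get split, and digits are treated as
words.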
--
Blessings
Wulfmann
Wulf Credo:
Respect the elders. Teach the young. Co-operate with the pack.
Play when you can. Hunt when you must. Rest in between.
Share your affections. Voice your opinion. Leave your Mark.
Copyright July 17, 1988 by Del Goetz
More information about the kubuntu-users
mailing list