How can I extract sentences from text documents

Wade Smart wade at wadesmart.com
Fri Dec 23 02:45:05 UTC 2005


12/22/2005 20:39 GMT-5

This is the diary of a woman whose years and years of writing were 
kept on a word processor and then later on a PC. She collected hundreds 
and hundreds of quotes that she came across, some of them pretty unique 
in their meaning.

Most are simple *.txt documents, and a few are MS Word files that I open with OO.

Take this email for example. She writes as simply as we speak back and 
forth, and then she would just drop in a quote: [QUOTE] The positive thing 
about writing is that you connect with yourself in the deepest way, and 
that's heaven. You get a chance to know who you are, to know what you 
think. You begin to have a relationship with your mind. [/QUOTE] And 
then she keeps talking from there.

I would like to pull them out and have them like this:

The positive thing about writing is that you connect with yourself in the deepest way, and that's heaven. You get a chance to know who you are, to know what you think. You begin to have a relationship with your mind. 

The positive thing about writing is that you connect with yourself in the deepest way, and that's heaven. You get a chance to know who you are, to know what you think. You begin to have a relationship with your mind. 


That way when they are saved to a normal text file, the owner can just 
do what they like with them afterwards.
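
A minimal sketch of that extraction, written in Python rather than the 
sed, awk, or Ruby suggested in this thread. It assumes the documents are 
plain text, that a quote opens with [QUOTE] and closes with either 
[/QUOTE] or a second [QUOTE], and that each quote should end up on a 
single line with a blank line between quotes. The function names are 
made up for illustration.

```python
import re

# Assumption: a quote opens with [QUOTE] and is closed by either [/QUOTE]
# or a second [QUOTE], as described in the original question.
QUOTE_RE = re.compile(r"\[QUOTE\](.*?)\[/?QUOTE\]", re.DOTALL)


def extract_quotes(text):
    """Return each quoted passage as one line, with line breaks collapsed."""
    quotes = []
    for body in QUOTE_RE.findall(text):
        line = " ".join(body.split())  # fold newlines and runs of spaces
        if line:  # skip empty [QUOTE][/QUOTE] pairs
            quotes.append(line)
    return quotes


def format_quotes(quotes):
    """Join quotes with a blank line between them, ending with a newline."""
    return "\n\n".join(quotes) + ("\n" if quotes else "")


def extract_from_files(paths):
    """Collect the quotes from several text files, in the order given."""
    collected = []
    for path in paths:
        # errors="replace" keeps going if an old file has odd bytes in it
        with open(path, encoding="utf-8", errors="replace") as handle:
            collected.extend(extract_quotes(handle.read()))
    return collected
```

Calling format_quotes(extract_from_files(list_of_paths)) and writing the 
result to a new file would give one quote per paragraph. The MS Word 
files would need to be saved as text first, and a document that mixes 
both closing styles around the same quote could pair the markers 
wrongly, so the output is worth spot-checking.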


I'll read up on awk, and someone also mentioned Ruby. Thanks for 
those suggestions.

Wade


Mike Bird wrote:

>On Thu, 2005-12-22 at 11:09, Wade Smart wrote:
>  
>
>>Ok, this may be totally impossible but, I have about 1800 documents that 
>>have sentences inside [QUOTE] and sometimes [QUOTE] [QUOTE] or [QUOTE] 
>>[/QUOTE]. I don't know how many lines each document has - maybe 8 to 
>>20k. Is there a way to copy all the sentences between the [QUOTE] 
>>[QUOTE] or [QUOTE] [/QUOTE] to a new file?  
>>    
>>
>
>sed can probably do that, if the documents are text format rather
>than some word-processing format and depending upon line breaks
>and depending upon how you want the quotes to appear in the new
>file.
>
>There may also be an approach of converting the square brackets
>to angles and then using an XML tool.
>
>Why don't you post a small sample input file that covers all the
>relevant cases, together with how the output file should appear?
>
>--Mike Bird
>
>
>  
>


More information about the ubuntu-users mailing list