How can I extract sentences from text documents
Wade Smart
wade at wadesmart.com
Fri Dec 23 02:45:05 UTC 2005
12222005 2039 GMT-5
This is the diary of a woman whose years and years of writing were kept
on a word processor and later on a PC. She had hundreds and hundreds of
quotes that she came across, some of them pretty unique in their meaning.
Most are simple *.txt documents and a few are MS Word files that I open
with OpenOffice (OO).
Take this email for example. She writes as simply as we speak back and
forth, and then she would just drop in a quote, [QUOTE] The positive thing
about writing is that you connect with yourself in the deepest way, and
that's heaven. You get a chance to know who you are, to know what you
think. You begin to have a relationship with your mind. [/QUOTE] and
then keep talking from there.
I would like to pull them out and have them like this:
The positive thing about writing is that you connect with yourself in the deepest way, and that's heaven. You get a chance to know who you are, to know what you think. You begin to have a relationship with your mind.
The positive thing about writing is that you connect with yourself in the deepest way, and that's heaven. You get a chance to know who you are, to know what you think. You begin to have a relationship with your mind.
That way when they are saved to a normal text file, the owner can just
do what they like with them afterwards.
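For what it's worth, here is a rough sketch of that extraction in Python rather than awk or Ruby. It assumes every quote is closed with [/QUOTE] (the [QUOTE] ... [QUOTE] variant would need extra handling), that the files are plain text, and that a quote may wrap across several lines. The function names are made up for illustration, and unreadable bytes are replaced rather than guessed at, since the original encoding is unknown.

```python
import re

# Matches [QUOTE] ... [/QUOTE]; DOTALL lets one quote span several lines.
QUOTE_RE = re.compile(r"\[QUOTE\](.*?)\[/QUOTE\]", re.DOTALL | re.IGNORECASE)


def extract_quotes(text):
    """Return each quoted passage as a single line of text."""
    quotes = []
    for match in QUOTE_RE.finditer(text):
        # Collapse hard line wraps and runs of spaces into single spaces.
        quote = " ".join(match.group(1).split())
        if quote:
            quotes.append(quote)
    return quotes


def extract_from_files(paths, out_path):
    """Write every quote found in the given files to out_path, one per line."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            # Assumption: plain-text input; undecodable bytes are replaced.
            with open(path, encoding="utf-8", errors="replace") as src:
                for quote in extract_quotes(src.read()):
                    out.write(quote + "\n")
```

Something like extract_from_files(glob.glob("diary/*.txt"), "quotes.txt") would then collect everything into one file; the directory and file names there are placeholders. The Word documents would have to be saved as text first.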
I'll read up on awk; someone also mentioned Ruby. Thanks for
those suggestions.
Wade
Mike Bird wrote:
>On Thu, 2005-12-22 at 11:09, Wade Smart wrote:
>
>
>>Ok, this may be totally impossible but, I have about 1800 documents that
>>have sentences inside [QUOTE] and sometimes [QUOTE] [QUOTE] or [QUOTE]
>>[/QUOTE]. I don't know how many lines each document has - maybe 8 to
>>20k. Is there a way to copy all the sentences between the [QUOTE]
>>[QUOTE] or [QUOTE] [/QUOTE] to a new file?
>>
>>
>
>sed can probably do that, if the documents are text format rather
>than some word-processing format and depending upon line breaks
>and depending upon how you want the quotes to appear in the new
>file.
>
>There may also be an approach of converting the square brackets
>to angles and then using an XML tool.
>
>Why don't you post a small sample input file that covers all the
>relevant cases, together with how the output file should appear?
>
>--Mike Bird
>
>
>
>