How can I extract sentences from text documents
Mike Bird
mgb-ubuntu at yosemite.net
Thu Dec 22 20:47:05 UTC 2005
On Thu, 2005-12-22 at 11:09, Wade Smart wrote:
> Ok, this may be totally impossible but, I have about 1800 documents that
> have sentences inside [QUOTE] and sometimes [QUOTE] [QUOTE] or [QUOTE]
> [/QUOTE]. I don't know how many lines each document has - maybe 8 to
> 20k. Is there a way to copy all the sentences between the [QUOTE]
> [QUOTE] or [QUOTE] [/QUOTE] to a new file?
sed can probably do that, provided the documents are plain text
rather than some word-processing format. The details depend upon
where the line breaks fall relative to the [QUOTE] tags and upon
how you want the quotes to appear in the new file.
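
For illustration only, here is a minimal sketch of the same idea in
Python rather than sed, since a regular expression that spans line
breaks is easier to write there. It assumes plain-text input, tags
spelled exactly [QUOTE] and [/QUOTE], and no nesting; the function
name extract_quotes is made up for this example.

```python
import re

# Assumption: quotes are delimited by [QUOTE] ... [/QUOTE] and are not nested.
# DOTALL lets a quote span several lines; the non-greedy .*? stops at the
# first closing tag, so a nested quote would be cut short.
QUOTE_RE = re.compile(r"\[QUOTE\](.*?)\[/QUOTE\]", re.IGNORECASE | re.DOTALL)


def extract_quotes(text):
    """Return the text of each [QUOTE]...[/QUOTE] span, whitespace collapsed."""
    return [" ".join(body.split()) for body in QUOTE_RE.findall(text)]


if __name__ == "__main__":
    sample = "intro [QUOTE]first\nline[/QUOTE] middle [QUOTE] second [/QUOTE]"
    for quote in extract_quotes(sample):
        print(quote)
```

Each extracted quote comes back as one line, so appending the results
from all 1800 documents to a single output file is a short loop around
this function. Whether one line per quote is the right output format
is exactly the open question above.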
Another approach may be to convert the square brackets to angle
brackets and then use an XML tool.
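
A rough sketch of the bracket-conversion idea, again in Python and
again under assumptions: the tags are spelled exactly [QUOTE] and
[/QUOTE], every opening tag has a matching close, and the text holds
no characters that are illegal in XML. The function name
quotes_via_xml and the wrapper element name doc are made up for this
example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def quotes_via_xml(text):
    """Return the text of each top-level quote, nested quote text included."""
    # Escape &, < and > already present so they cannot be mistaken for markup.
    safe = escape(text)
    # Turn the square-bracket tags into angle-bracket tags.
    safe = safe.replace("[QUOTE]", "<QUOTE>").replace("[/QUOTE]", "</QUOTE>")
    # Wrap everything in a single root element, as XML requires.
    root = ET.fromstring("<doc>" + safe + "</doc>")
    # findall("QUOTE") yields only direct children of the root element.
    return ["".join(q.itertext()).strip() for q in root.findall("QUOTE")]


if __name__ == "__main__":
    sample = "x [QUOTE]outer [QUOTE]inner[/QUOTE] tail[/QUOTE] y"
    print(quotes_via_xml(sample))
```

Unlike the regular-expression route, the parser copes with nested
quotes, but it raises xml.etree.ElementTree.ParseError as soon as a
tag is unbalanced, so a case such as [QUOTE] [QUOTE] with no closing
tag would need to be repaired first.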
Why don't you post a small sample input file that covers all the
relevant cases, together with how the output file should appear?
--Mike Bird
More information about the ubuntu-users mailing list