Text processing tools and/or languages
Chris Green
cl at isbd.net
Thu Jan 20 15:42:00 UTC 2022
I'm looking for tools (if there are any) for processing a text file
line by line sequentially.
As it goes through the file it needs to make decisions based on the
contents of the line(s) of text and change its state as it goes.
The decisions it makes depend on the state it's in.
Basically I'm processing some (fairly) fixed format messages from a
forum to remove some matched header and trailer lines, modify and
output a few other matched lines and simply output the body of the
message.
The (most) difficult bit is removing blank lines before something.
E.g. we have a message that starts:-
    A new topic has been created on the forum
    Message Subject : weed webinar 31 January

    Category : Waterways Continental Europe

    Posted by : Fred Bloggs
I want to delete everything up to and including the blank line after
'Message Subject', then keep (i.e. output) the 'Category' and
'Posted by' lines without the blank lines in between.
I can't delete all blank lines because I want to retain spacing
in the message body later. So I need to be able to do things
like deleting blank lines unless I am in the message body.
Are there specific tools for doing this sort of thing or should
I just write a program (probably in Python) that reads lines,
does actions as required and remembers its state as it goes?
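To make the idea concrete, a minimal sketch of such a program might look like the following. It is only an illustration under stated assumptions: the state names and the process() function are hypothetical, the header lines are assumed to start with exactly 'Message Subject' and 'Posted by', and 'Posted by' is assumed to be the last line before the message body. Trailer removal is left out because the trailer format isn't shown here.

```python
from enum import Enum, auto


class State(Enum):
    PREAMBLE = auto()  # everything before the 'Message Subject' line
    SUBJECT = auto()   # subject seen, waiting for the blank line after it
    META = auto()      # 'Category' / 'Posted by' lines, blank lines dropped
    BODY = auto()      # message body, blank lines kept


def process(lines):
    """Yield the lines to keep, changing state as the input is read."""
    state = State.PREAMBLE
    for raw in lines:
        line = raw.rstrip("\n")
        blank = not line.strip()
        if state is State.PREAMBLE:
            # Drop everything up to and including the subject line.
            if line.startswith("Message Subject"):
                state = State.SUBJECT
        elif state is State.SUBJECT:
            # Drop everything up to and including the next blank line.
            if blank:
                state = State.META
        elif state is State.META:
            if blank:
                continue  # no blank lines between the kept header lines
            yield line
            # Assumption: 'Posted by' is the last header line.
            if line.startswith("Posted by"):
                state = State.BODY
        else:
            # In the body every line, blank or not, is passed through.
            yield line
```

Used as a filter it would be driven by something like "for line in process(sys.stdin): print(line)". Each extra rule (modifying a matched line, spotting the trailer) would become another branch or another state.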
I got some of the way using sed but it's very difficult to 'delete
the line before XXXX' with sed. It *might* be that awk would be
better but I don't see it handling the state/sequential bit any
better than sed.
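For comparison, the 'delete the line before XXXX' case that is awkward in sed can be sketched in Python by holding blank lines back until the next non-blank line is known. This is a rough illustration only: drop_blanks_before() and its is_target argument are hypothetical names, and it assumes the lines to delete are always blank ones.

```python
def drop_blanks_before(lines, is_target):
    """Yield lines, omitting any run of blank lines that comes
    directly before a line for which is_target(line) is true."""
    pending = []  # blank lines seen but not yet written out
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            pending.append(line)  # decide later, once the next line is seen
            continue
        if not is_target(line):
            yield from pending    # ordinary line: the blanks are kept
        pending.clear()           # target line: the blanks are discarded
        yield line
    yield from pending            # trailing blank lines at end of input
```

Calling it with is_target set to something like "lambda s: s.startswith('Category')" would remove the blank lines in front of the 'Category' line while leaving spacing elsewhere alone.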
Any/all advice would be very welcome.
--
Chris Green
More information about the ubuntu-users
mailing list