Text processing tools and/or languages
Robert Heller
heller at deepsoft.com
Thu Jan 20 16:08:22 UTC 2022
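Traditionally, there is 'awk', which can do this sort of thing. It is actually
a *very simple* programming language for implementing *simple* procedures that
process text files line-by-line, and its variables keep their values from one
line to the next, so a script can track what state it is in.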
Originally, Perl was the next level up in text file processing.
Other options include sed (possibly several passes chained in a pipeline),
or scripts written in Tcl or Python.
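Since the question mentions Python, here is a minimal sketch of the state-machine approach in Python. It assumes the header looks exactly like the example quoted below (a 'Message Subject' line, then 'Category' and 'Posted by' lines separated by blank lines) and that the body starts at the first non-blank line that is not one of those fields. The function name `strip_header` and the state names are hypothetical, and trailer removal is left out because the trailer format is not shown.

```python
import re
from typing import Iterable, Iterator

# Hypothetical state names for the three parts of a message.
PREAMBLE, HEADER, BODY = "preamble", "header", "body"

# Assumed header layout, taken from the example message.
SUBJECT_RE = re.compile(r"^Message Subject\s*:")
KEEP_RE = re.compile(r"^(Category|Posted by)\s*:")


def strip_header(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines to keep, tracking which part of the message we are in."""
    state = PREAMBLE
    for line in lines:
        text = line.rstrip("\n")
        if state == PREAMBLE:
            # Drop everything up to and including the subject line; the
            # blank line after it is dropped by the HEADER branch.
            if SUBJECT_RE.match(text):
                state = HEADER
        elif state == HEADER:
            if not text.strip():
                continue  # blank lines inside the header are dropped
            if KEEP_RE.match(text):
                yield text  # keep 'Category' and 'Posted by'
            else:
                # First non-blank, non-header line: the body starts here.
                state = BODY
                yield text
        else:
            # BODY: pass everything through, blank lines included.
            yield text
```

To filter a message on standard input, something like `for line in strip_header(sys.stdin): print(line)` (after `import sys`) would do. Keeping the state in a single variable makes it straightforward to add another state, e.g. for the trailer, once its format is known.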
man awk
man perl
man sed
man tcl
At Thu, 20 Jan 2022 15:42:00 +0000 "Ubuntu user technical support, not for general discussions" <ubuntu-users at lists.ubuntu.com> wrote:
>
> I'm looking for tools (if there are any) for processing a text file
> line by line sequentially.
>
> As it goes through the file it needs to make decisions based on the
> contents of the line(s) of text and change its state as it goes.
> The decisions it makes depend on the state it's in.
>
> Basically I'm processing some (fairly) fixed format messages from a
> forum to remove some matched header and trailer lines, modify and
> output a few other matched lines and simply output the body of the
> message.
>
> The (most) difficult bit is removing blank lines before something.
>
> E.g. we have a message that starts:-
>
> A new topic has been created on the forum
>
> Message Subject : weed webinar 31 January
>
> Category : Waterways Continental Europe
>
> Posted by : Fred Bloggs
>
>
> I want to delete everything up to and including the blank line after 'Message Subject'
> then keep (i.e. output) the 'Category' line and the 'Posted by' lines without the blank
> lines in between.
>
> I can't delete all blank lines because I want to retain spacing
> in the message body later. So I need to be able to do things
> like deleting blank lines unless I am in the message body.
>
> Are there specific tools for doing this sort of thing or should
> I just write a program (probably in Python) that reads lines,
> does actions as required and remembers its state as it goes?
>
> I got some of the way using sed but it's very difficult to 'delete
> the line before XXXX' with sed. It *might* be that awk would be
> better but I don't see it handling the state/sequential bit any
> better than sed.
>
> Any/all advice would be very welcome.
>
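The step the question calls hardest, dropping blank lines *before* a given line, is awkward in sed because sed normally prints each line before it has seen the next one. Holding blank lines back until the next non-blank line arrives avoids that. A minimal Python sketch, where the function name `drop_blanks_before` and the example pattern are hypothetical:

```python
import re
from typing import Iterable, Iterator, List


def drop_blanks_before(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield lines, omitting any run of blank lines immediately preceding
    a line that matches the regular expression `pattern`."""
    target = re.compile(pattern)
    pending: List[str] = []  # blank lines held back until we see what follows
    for line in lines:
        if not line.strip():
            pending.append(line)
            continue
        if not target.search(line):
            # Not the target line: the held-back blanks were real spacing.
            yield from pending
        pending.clear()
        yield line
    # Trailing blank lines are not followed by a match, so keep them.
    yield from pending
```

The same hold-back idea covers the more general "delete the line before XXXX" case: keep the previous line in a one-line buffer and only output it once the next line has been checked.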
--
Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
heller at deepsoft.com -- Webhosting Services