How to format text for normal reading

user1 bqz69 at telia.com
Sat Nov 6 08:58:43 UTC 2010


I tried to do this:

for file in *.html; do html2text -o "${file%.*}.txt" "$file" ; done

I found it here:
http://commandline.org.uk/command-line/converting-html-to-text/

That works fine, but when I then cat all the single text files into one 
big text file I still need to format this big file, to make it readable.

So my problem is not really just an HTML problem, but how to make any 
badly formatted text file readable.
That is, to make each paragraph stand out, with full lines ending in a 
period, and with strange characters and character sequences removed.
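
One possible starting point for the paragraph part is a rough sketch
like the one below. It assumes paragraphs in the big file are separated
by blank lines and that GNU coreutils (cat, fmt) are installed. The
function name reflow_text, the file names in the comment, and the width
of 72 are placeholders, not something from the html2text page.

```shell
# Hypothetical helper: reads text on stdin, writes reflowed text to stdout.
reflow_text() {
    # cat -s squeezes runs of blank lines down to a single blank line;
    # fmt -w 72 joins short lines within each paragraph and re-wraps
    # the paragraph at 72 columns, leaving blank lines as separators.
    cat -s | fmt -w 72
}

# Example use (file names are placeholders):
#   reflow_text < all.txt > readable.txt
```

This only re-wraps lines; it cannot decide where a sentence should end,
so lines that lack a final period would still need manual attention.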

Here are 3 examples of characters I want removed:

� "  ’ 
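
If the unwanted characters are all non-ASCII bytes (an assumption; it
holds for � and ’ in a UTF-8 file), a blunt sketch is to drop every byte
that is not a tab, newline, carriage return or printable ASCII
character. The function name strip_non_ascii and the file names in the
comment are placeholders.

```shell
# Hypothetical helper: reads text on stdin, writes filtered text to stdout.
strip_non_ascii() {
    # -c complements the set and -d deletes, so everything EXCEPT
    # tab (\11), newline (\12), CR (\15) and space..tilde (\40-\176)
    # is removed. LC_ALL=C makes tr work on single bytes.
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Example use (file names are placeholders):
#   strip_non_ascii < all.txt > clean.txt
```

This also deletes legitimate accented letters, and it keeps the plain
ASCII double quote, which would need a separate rule if it really has
to go.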

 








