How to format text for normal reading

Hal Burgiss hal at burgiss.net
Sat Nov 6 15:04:04 UTC 2010


On Sat, Nov 6, 2010 at 10:44 AM, Aart Koelewijn <aart at mtack.xs4all.nl> wrote:

> > � "  ’
>
> It looks like you have a problem with character encoding. These can
> usually be tackled with the program "recode". The &quot; gives the
> impression there is still some HTML character encoding in place. To change
> this to UTF-8 you could use "recode HTML..UTF-8 file". You can do much
> more with recode; see "man recode" for all the possibilities.
>
>
Another tool for this is 'iconv'. In any case, this certainly looks like a
character encoding issue, which should be dealt with before any other text
processing. One catch, at least with iconv (I have not tried recode), is that
you pretty much have to know the encoding of the input file. A proper HTML
file should declare its encoding in the HEAD. If the file comes from a Windows
system, the best bet (in my experience) is windows-1252, in case you have to
start guessing. The output should be UTF-8.
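
A rough sketch of the iconv route, assuming the input really is
windows-1252. The file names input.txt and output.txt are placeholders, and
the sample line is made up just to have a byte that shows the problem (0x92,
which windows-1252 uses for the curly apostrophe):

```shell
# Create a small sample file containing byte 0x92 (octal 222), the
# windows-1252 code for the right single quotation mark.
# input.txt and output.txt are placeholder names, not from the original post.
printf 'It\222s a test\n' > input.txt

# Convert from the assumed input encoding to UTF-8.
# -f names the encoding we believe the input uses, -t the encoding we want.
iconv -f WINDOWS-1252 -t UTF-8 input.txt > output.txt

# Show the converted text.
cat output.txt
```

If the guess for -f is wrong, iconv will either stop with an error or
produce different garbage, so it is worth checking the output before
replacing the original file. "iconv -l" lists the encoding names your
iconv knows about.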

-- 
Hal


More information about the ubuntu-users mailing list