Ubuntu encodings

Joe dev at freedomcircle.net
Tue Aug 15 01:20:06 UTC 2006


Hi,

I've just migrated from FreeBSD to Ubuntu 6.06 (after about one month on
FreeBSD--previously was on WinXP [actually it still dual boots there]).
I had a problem with reloading a PostgreSQL 8.1 database from BSD (it
failed to load several tables with messages about invalid characters).
Looking at the database setup I found all the databases had been created
with UTF-8 encoding, although I do not recall any indication in the
setup or configuration that implied or stated that UTF-8 was going to be
used.  Since I wanted to move forward, I chose to recreate the database
with LATIN1 encoding.  This time the reload was successful and a PHP
application that accesses the database had no problem displaying its
pages.
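
A minimal Python sketch of why a reload like this can fail, assuming the dump holds ISO-8859-1 (LATIN1) bytes while the target database expects UTF-8. The sample string and the helper name transcode_latin1_to_utf8 are hypothetical and not taken from the actual database:

```python
def transcode_latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes assumed to be ISO-8859-1 (LATIN1) as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


# Hypothetical sample: "año" stored as LATIN1, where n-tilde is the single byte 0xF1.
sample = "año".encode("latin-1")

# A lone 0xF1 byte is not a valid UTF-8 sequence, so strict decoding fails.
try:
    sample.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# After transcoding, n-tilde becomes the two-byte UTF-8 sequence 0xC3 0xB1.
converted = transcode_latin1_to_utf8(sample)

print(valid_utf8)
print(converted)
```

If that assumption holds, the alternative to recreating the database as LATIN1 is to tell the loader which encoding the dump really uses, or to transcode the dump before loading it.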

However, when I tried to add a record, I got "ERROR: could not find
tsearch config by locale".  The table in question uses the PostgreSQL
tsearch2 module and the default locale (which I presume was reloaded
from BSD and XP) is "C" (or SQL_ASCII in PG terms).  I did a preliminary
check and it seems I need to add LATIN1 as a special encoding for
tsearch2 or make LATIN1 the default encoding.  It also appears that it
would be easier if I recreated the entire PG installation with either
LATIN1 or SQL_ASCII encoding.

I realize most of the above is not really Ubuntu-related and is more
appropriate for a PG list.  However, I wanted to provide this background
to ask the question "Why would I want to stay with UTF-8?".  I'll be
glad to read any document that someone points me to that explains why
Ubuntu chooses UTF-8 as the default, apparently for everything.  For
example, Firefox insists on using "Unicode (UTF-8)" even after I change
it to Western (ISO-8859-1) so that I can view Spanish n-tilde and other
characters from the database.  I presume this is because 'locale' says
LC_CTYPE is en_US.UTF-8, but this is in spite of the HTML page having a

<?xml version="1.0" encoding="ISO-8859-1"?>

right at the beginning.
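
A small Python sketch of what such a mismatch looks like on screen, assuming the page bytes really are ISO-8859-1 and the viewer decodes them as UTF-8 (or the reverse). The sample word is hypothetical, and this does not show why Firefox picks one encoding over the other:

```python
# Hypothetical page content, encoded the way the XML declaration claims.
page_bytes = "España".encode("iso-8859-1")

# Decoded with the declared encoding, the text is intact.
as_latin1 = page_bytes.decode("iso-8859-1")

# Decoded as UTF-8, the lone 0xF1 byte is invalid and becomes U+FFFD.
as_utf8 = page_bytes.decode("utf-8", errors="replace")

# The reverse mismatch: UTF-8 bytes read as ISO-8859-1 show two characters
# in place of the n-tilde.
reverse = "España".encode("utf-8").decode("iso-8859-1")

print(as_latin1)
print(as_utf8)
print(reverse)
```

Which of the two garbled forms appears is a hint as to which side (the stored bytes or the viewer) is using UTF-8.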

I'm sorry for being so long-winded on my first post.  I did some
research in the Ubuntu wiki beforehand and found the page on LocaleConf
but the comments were not very helpful.  For example, the author says "A
good rule is to choose utf-8 locales," but does not provide any reason
for that being good.  The section "For Anti-UTF-8 people" also seems to
assume that there are two kinds of people:  those for UTF-8 and those
against it, with no one in between having any doubts as to why one
would rather stay with ISO-8859-1 if one has little or no interaction
with other encodings.

I would appreciate any help or pointers to further reading.

Joe




