Bazaar/Windows encoding trouble.

Tue Mar 3 16:31:06 GMT 2009

Hi All,

I'm trying to get a workable environment that allows both Windows workstations 
and Linux workstations to share the same code base, stored in a Bazaar 
repository. I've just mostly fixed the CRLF / LF line endings debacle and am 
now trying to get both Windumbs and Linux to agree on a suitable encoding for 
the files. As usual the Linux side is no problem. The Windows side, also as
usual, proves to be a horrible experience and timesink.

Currently most files are physically stored as UTF-8 since we use non-ASCII
characters throughout the code. Editing these files with Windows crudware
usually causes problems since non-ASCII characters are corrupted (because
many tools use the abomination that is the "default codepage" to load 8-bit
files). I would like to make sure that these files are not damaged when
people edit them on Windows, with whatever software they use.
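
A minimal Python 3 sketch of the corruption described above. It assumes the
editing tool decodes the UTF-8 bytes with cp1252 as its default code page and
then saves the result as UTF-8 again; the sample string is made up for
illustration.

```python
# Text as stored in the repository: UTF-8 bytes.
original = "caf\u00e9 \u20ac5"          # "café €5"
stored = original.encode("utf-8")

# A tool that assumes the default code page (cp1252 here) misreads the bytes:
# every multi-byte UTF-8 sequence turns into two or three unrelated characters.
misread = stored.decode("cp1252")
print(ascii(misread))

# Saving the misread text back as UTF-8 bakes the damage into the file.
resaved = misread.encode("utf-8")
print(resaved == stored)   # False: the file content has changed
```

The round trip only stays lossless if the tool reads and writes with the
encoding the file was actually stored in.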

After lots of experimentation I found out that Windows of course does not
let you specify UTF-8 as the default code page (if you try, the %&*%(&%&
piece of crap does not boot anymore!?!? How's that for an error message!?).

I then tried to use iso-8859-15 encoding for at least the files that are
edited mostly by Windows sh*tware. This very standard encoding corresponds to
code page 28605 in Windows (NOT 1252 - that is iso-8859-1 plus Microsoft
extras; it puts the euro sign at 0x80 instead of the 0xA4 that iso-8859-15
uses, and has extra characters in places where iso-8859-1 has nothing. This
makes it unusable).
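
To make the difference between these encodings concrete, here is a small
Python 3 sketch that uses only the standard codecs. It shows where each
encoding puts the euro sign and what the byte 0xA4 means in each.

```python
euro = "\u20ac"  # the euro sign

print(euro.encode("iso-8859-15"))  # b'\xa4'
print(euro.encode("cp1252"))       # b'\x80'

# iso-8859-1 has no euro sign at all.
try:
    euro.encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print(latin1_has_euro)             # False

# The same byte therefore means different things depending on the encoding:
# 0xA4 is the euro sign in iso-8859-15, the generic currency sign in cp1252.
print(b"\xa4".decode("iso-8859-15") == euro)      # True
print(b"\xa4".decode("cp1252") == "\u00a4")       # True
```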

When I use this encoding (by simply and easily editing the bloody registry
to get Windows to use it - nice toolset there), most Windows programs properly
display iso-8859-15 encoded files, but Bazaar reports a warning when started:

PS C:\> bzr version
bzr: warning: unknown encoding cp28605. Continuing with ascii encoding.
Bazaar (bzr) 1.12
  Python interpreter: c:\tools\python25.dll 2.5.2
  Python standard library: c:\tools\lib\library.zip
  bzrlib: c:\tools\lib\library.zip\bzrlib
  Bazaar configuration: C:\Documents and Settings\jal\Application 

This is already a bit strange since Windows has a separate code page (which 
does not allow 28605 because otherwise things would be simple) for command 
line crud, changed with the chcp command. I think "bzr" uses the Windows 
encoding because it's a win32 console app.
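
The warning suggests that Python simply has no codec registered under the name
"cp28605". One possible workaround, sketched below in Python 3, is to register
a codec search function that maps that name to the standard iso-8859-15 codec.
This is not something Bazaar provides: the function name find_cp28605 is
hypothetical, and where such a registration would have to live for a packaged
bzr to pick it up is an untested assumption.

```python
import codecs

def find_cp28605(name):
    """Hypothetical codec search function: map cp28605 to iso-8859-15."""
    if name.lower() == "cp28605":
        return codecs.lookup("iso8859_15")
    return None  # let the other search functions handle everything else

codecs.register(find_cp28605)

# After registration the name resolves instead of raising LookupError.
print("\u20ac".encode("cp28605"))  # b'\xa4'
```

The alias only affects the Python process in which it is registered; it does
not change how Windows itself or other programs treat the code page.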

Is this warning a problem? Is there a way to prevent it? Has anyone any
experience in shared code/encoding between Unix and Windows, preferably in
an encoding that has >255 characters? Would Windows "Native" encoding (which
seems to be UTF-16) be usable? Any experiences there?

Any help or thoughts would be welcome...

Thanks,

(a very frustrated) Frits Jalvingh