Support for Unicode files

Wed May 23 11:22:58 BST 2007

John Arbash Meinel wrote:
> A bigger question, though. What to do if you are merging a file which 
> claims it is UTF-16 against a file which claims it is UTF-8? If we are 
> opening this can of worms, it seems like people are going to start 
> asking us to decode into full Unicode, do the merge, and then encode 
> back into one of them. (either one is possible).
>
> I still feel like we shouldn't do transcoding on the fly (including 
> line-endings). But at least I'm starting to entertain the thought.
>
> Oh, and having something that could be switched on via plugin would 
> probably satisfy me.
>
I think this is an area where we need to work from a clear set of 
policies about the precise limits of Bazaar's problem space. At one 
extreme, we can't simply treat files as byte streams like a filesystem 
can. The basic text vs binary test that we and 99% of other tools use is 
arguably effective in the Western world but so 1970s. :-) At the other, 
we'll never be all things to all people and magically - semantically - 
merge OpenOffice edits, say.
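
To make that limitation concrete, here is a minimal sketch of the kind of 
text vs binary test I mean, assuming the common heuristic of looking for a 
NUL byte near the start of the content. The function name and the 1024-byte 
window are illustrative assumptions, not Bazaar's actual code.

```python
def looks_binary(data, window=1024):
    """Hypothetical heuristic: call the content binary if a NUL byte
    appears within the first `window` bytes."""
    return b"\x00" in data[:window]


# The same line of plain ASCII text in two encodings.
utf8_text = "hello world\n".encode("utf-8")
utf16_text = "hello world\n".encode("utf-16")

print(looks_binary(utf8_text))   # False: no NUL bytes in UTF-8 ASCII text
print(looks_binary(utf16_text))  # True: UTF-16 pads ASCII with NUL bytes
```

Under this assumption a perfectly ordinary UTF-16 source file is classed as 
binary and never gets a line-based merge.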

While usage may not be widespread yet, UTF-16 is being used by 
developers and I'd like to see us support it either directly or 
indirectly one day. Could we provide public hooks that plug-in authors 
could tap into to allow "semantic merging" based on per-file properties? 
I'm fine with taking a simple approach in the core as long as we allow 
others to layer intelligence in order to address things like this.
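
As a rough illustration of the sort of hook I have in mind, here is a small 
sketch. Every name in it (register_merge_hook, merge_file, the "encoding" 
property) is hypothetical and nothing like it exists in Bazaar today. The 
core merge is reduced to a whole-file stand-in so the example stays short.

```python
# Hypothetical sketch: none of these names are part of Bazaar's API.
_merge_hooks = {}


def register_merge_hook(encoding, hook):
    """Register a merge callable for files whose 'encoding' property matches."""
    _merge_hooks[encoding] = hook


def core_merge(base, this, other):
    """Stand-in for the simple core behaviour: take whichever side changed,
    or return None when both sides changed differently (a conflict)."""
    if this == other or other == base:
        return this
    if this == base:
        return other
    return None


def merge_file(file_properties, base, this, other):
    """Dispatch to a registered hook if one matches, else use the core merge."""
    hook = _merge_hooks.get(file_properties.get("encoding"))
    if hook is not None:
        return hook(base, this, other)
    return core_merge(base, this, other)


def utf16_merge(base, this, other):
    """Plugin-style hook: decode to Unicode, merge, encode back to UTF-16."""
    decoded = [content.decode("utf-16") for content in (base, this, other)]
    merged = core_merge(*decoded)
    return None if merged is None else merged.encode("utf-16")


register_merge_hook("utf-16", utf16_merge)

base = "a\n".encode("utf-16")
other = "b\n".encode("utf-16")
result = merge_file({"encoding": "utf-16"}, base, base, other)
print(result.decode("utf-16"))
```

The point of the design is that the core never needs to know about 
encodings: files without a matching property go through the simple path, 
and a plug-in decides what "merge" means for the files it claims.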

Ian C.