Support for Unicode files
ian.clatworthy at internode.on.net
Wed May 23 11:22:58 BST 2007
John Arbash Meinel wrote:
> A bigger question, though. What to do if you are merging a file which
> claims it is UTF-16 against a file which claims it is UTF-8? If we are
> opening this can of worms, it seems like people are going to start
> asking us to decode into full Unicode, do the merge, and then encode
> back into one of them. (either one is possible).
> I still feel like we shouldn't do transcoding on the fly (including
> line-endings). But at least I'm starting to entertain the thought.
> Oh, and having something that could be switched on via a plugin would
> probably satisfy me.
I think this is an area where we need to work from a clear set of
policies about the precise limits of Bazaar's problem space. At one
extreme, we can't simply treat files as byte streams like a filesystem
can. The basic text vs binary test that we and 99% of other tools use is
arguably effective in the Western world but so 1970s. :-) At the other,
we'll never be all things to all people and magically - semantically -
merge OpenOffice edits, say.
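To make the text vs binary point concrete, here is a minimal sketch of the
kind of heuristic many tools use: look for a NUL byte near the start of the
file. The function name and sample size are assumptions for illustration,
not Bazaar's actual check. It shows why UTF-16 text gets misclassified.

```python
def looks_binary(data: bytes, sample_size: int = 1024) -> bool:
    """Heuristic (assumed, not Bazaar's code): a NUL byte in the
    first `sample_size` bytes means "binary"."""
    return b"\x00" in data[:sample_size]


# The same text in two encodings.
utf8_text = "hello\n".encode("utf-8")
utf16_text = "hello\n".encode("utf-16")  # BOM plus two bytes per character

print(looks_binary(utf8_text))   # False: no NUL bytes in ASCII-range UTF-8
print(looks_binary(utf16_text))  # True: ASCII characters carry a NUL byte in UTF-16
```

Perfectly ordinary UTF-16 source text is full of NUL bytes, so a test like
this treats it as binary and skips line-based merging.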
While usage may not be widespread yet, UTF-16 is being used by
developers and I'd like to see us support it either directly or
indirectly one day. Could we provide public hooks that plug-in authors
could tap into to allow "semantic merging" based on per-file properties?
I'm fine with taking a simple approach in the core as long as we allow
others to layer intelligence in order to address things like this.
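As a rough sketch of what I mean by hooks, assuming a registry that the
core consults before falling back to its own simple merge: every name here
(register_merge_hook, merge_file, the "*_encoding" property keys) is
hypothetical and not an existing Bazaar API, and the three-way merge is a
whole-file toy rather than a real line merge.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical hook signature: return merged bytes, or None to decline.
MergeHook = Callable[[bytes, bytes, bytes, Dict[str, str]], Optional[bytes]]

_merge_hooks: List[MergeHook] = []


def register_merge_hook(hook: MergeHook) -> MergeHook:
    """Let a plug-in add a merge hook (hypothetical API)."""
    _merge_hooks.append(hook)
    return hook


def three_way(base, this, other):
    """Toy whole-file three-way merge; works on bytes or str."""
    if this == other or other == base:
        return this
    if this == base:
        return other
    raise ValueError("conflicting changes")


def merge_file(base: bytes, this: bytes, other: bytes,
               props: Dict[str, str]) -> bytes:
    """Core entry point: try hooks first, then the simple byte merge."""
    for hook in _merge_hooks:
        result = hook(base, this, other, props)
        if result is not None:
            return result
    return three_way(base, this, other)


@register_merge_hook
def unicode_merge(base, this, other, props):
    """Plug-in hook: decode to Unicode, merge, encode back as THIS."""
    keys = ("base_encoding", "this_encoding", "other_encoding")
    if not all(k in props for k in keys):
        return None  # no per-file properties, so leave it to the core
    merged = three_way(base.decode(props["base_encoding"]),
                       this.decode(props["this_encoding"]),
                       other.decode(props["other_encoding"]))
    return merged.encode(props["this_encoding"])


# THIS re-encoded the file as UTF-16; OTHER added a line in UTF-8.
base = "a\n".encode("utf-8")
this = "a\n".encode("utf-16")
other = "a\nb\n".encode("utf-8")
props = {"base_encoding": "utf-8", "this_encoding": "utf-16",
         "other_encoding": "utf-8"}

merged = merge_file(base, this, other, props)
print(merged.decode("utf-16") == "a\nb\n")  # True
```

On raw bytes the same three versions would conflict, since all three byte
strings differ. The core stays simple, and the transcoding policy lives
entirely in the plug-in.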