Support for Unicode files

Dennis Benzinger Dennis.Benzinger at gmx.net
Wed May 23 16:16:28 BST 2007


On Wed, 23 May 2007 09:52:42 -0400,
Aaron Bentley <aaron.bentley at utoronto.ca> wrote:

> Dennis Benzinger wrote:
> [...]
> > MIME types for which no plugin is registered are treated as
> > unmergeable.
> 
> That doesn't sound okay.  We should attempt merging on all text files,
> but we can't predict what MIME types may be used.

When a file is added, Subversion tries to guess whether it is text; if
it is not, it sets the file's svn:mime-type property to
application/octet-stream. From then on, any file whose MIME type does
not start with text/ is considered binary and is not merged.
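
As a rough sketch of the rule described above (this is not Subversion's
actual code; the function name is hypothetical, and treating a missing
MIME type as text is an assumption):

```python
from typing import Optional


def is_text_mime_type(mime_type: Optional[str]) -> bool:
    """Return True if a file with this MIME type would be treated as text."""
    # Assumption: no recorded MIME type means the file was guessed to be text.
    if mime_type is None:
        return True
    # Anything that does not start with "text/" is considered binary.
    return mime_type.strip().lower().startswith("text/")
```

Under this rule, only files for which the function returns True would
be candidates for a textual merge.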

Attempting to merge all text files won't work, because not every file
format that consists of textual data can be sensibly merged. For
example, you can't (easily) merge an OpenOffice document or the ASCII
variant of PDF.

But these cases can be handled by plugins that use the proposed MIME
type (Content-Type) registry.
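
A minimal sketch of what such a registry could look like. All names
here (MergePluginRegistry, register, lookup, the three-way merger
signature) are hypothetical and not part of any existing Bazaar API:

```python
from typing import Callable, Dict, Optional

# Hypothetical merger signature: (base, this, other) -> merged text.
Merger = Callable[[str, str, str], str]


class MergePluginRegistry:
    """Maps MIME types to merge plugins (illustrative only)."""

    def __init__(self) -> None:
        self._mergers: Dict[str, Merger] = {}

    def register(self, mime_type: str, merger: Merger) -> None:
        # MIME types are case-insensitive, so normalise the key.
        self._mergers[mime_type.lower()] = merger

    def lookup(self, mime_type: str) -> Optional[Merger]:
        # Returns None when no plugin is registered for this type.
        return self._mergers.get(mime_type.lower())
```

What the caller does when lookup() returns None (refuse to merge, or
fall back to a plain text merge) is exactly the policy question
discussed above; the sketch deliberately leaves it open.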

> >> A bigger question, though. What to do if you are merging a file
> >> which claims it is UTF-16 against a file which claims it is UTF-8? 
> >> [...]
> > 
> > Refuse to merge.
> 
> We would typically handle that as a contents conflict.  foo.BASE,
> foo.THIS and foo.OTHER would be dumped in the working tree, and the
> user can sort it out for themself.

That's good.

> But I should point out that two files may have the same MIME type yet
> be in different encodings.
> [...]

Not if we use a complete Content-Type property, like the one RFC 2046
<http://www.rfc-editor.org/rfc/rfc2046.txt> defines, instead of only
the MIME type. Then we can use the charset parameter (section 4.1.2)
to specify the encoding.
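
To illustrate, here is a small sketch that splits a Content-Type value
into MIME type and charset using Python's standard email package. The
function names parse_content_type and same_encoding are hypothetical,
and declaring two files comparable only when both type and charset
match is an assumption about how the property might be used:

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into (mime_type, charset)."""
    msg = Message()
    msg["Content-Type"] = value
    # Both results are lowercased; charset is None if the parameter is absent.
    return msg.get_content_type(), msg.get_content_charset()


def same_encoding(content_type_a: str, content_type_b: str) -> bool:
    """True if both values name the same MIME type and the same charset."""
    return parse_content_type(content_type_a) == parse_content_type(content_type_b)
```

With this, "text/plain; charset=UTF-16" and "text/plain; charset=UTF-8"
share a MIME type but compare as different, so the mismatch can be
detected before a merge is attempted.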



Dennis Benzinger


