[MERGE REVIEW] Binary file handling
Aaron Bentley
aaron.bentley at utoronto.ca
Tue Apr 18 22:21:33 BST 2006
Jan Hudec wrote:
> On Tue, Apr 18, 2006 at 16:14:31 +1000, Martin Pool wrote:
>>Have you ever seen a UTF-16/UCS-2 source file in a tree? I know they
>>might occur on Windows but it seems unlikely even there. I suppose the
>>current diff code will (unknowingly) probably do the right thing with
>>them by seeing the end of lines.
It kinda will. Splitting on the \n byte treats \x00\n correctly, but it
also splits on \x01\n, etc., which are not newlines in UTF-16.
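To make that concrete, here is a small illustration (plain Python, not bzr's actual diff code) of what byte-level splitting does to big-endian UTF-16 data. The sample string is made up; U+010A is chosen only because its UTF-16-BE encoding happens to end in the byte 0x0A.

```python
# Illustration only: naive byte-level line splitting on UTF-16-BE data.
# U+010A encodes as b"\x01\x0a" in UTF-16-BE, so its low byte looks
# like a newline to anything that splits on the raw b"\n" byte.
text = "a\n\u010ab\n"
data = text.encode("utf-16-be")

byte_lines = data.split(b"\n")   # what a byte-oriented splitter sees
text_lines = text.split("\n")    # what the file actually contains

# The real "\n" (b"\x00\n") is split in the right place, but the
# character U+010A is cut in half, producing a spurious extra line.
print(len(text_lines) - 1, "real newlines")
print(len(byte_lines) - 1, "byte-level splits")
```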
>>Possibly we would eventually want bzr to know about both the line
>>endings and the character encoding to handle this properly, much as a
>>text editor has "utf-8 with cr", "ucs-2 with crlf", etc.
>
>
> Yes and no. We have to be careful to keep such files from driving bzr nuts
> the way they drove clearcase nuts.
...
> So the summary is that we could have properties to tell diff what to
> *display*, but the storage should always cope on its own.
Our storage system will handle any kind of file you feed it. My changes
are just about display and merge.
But the behaviour of weave merge on a UTF-16 file would be improved if
we correctly split on newlines, which would require us to detect the
file encoding.
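A minimal sketch of what "detect the encoding, then split on newlines" could look like, assuming detection is done purely from a byte-order mark and that anything without a BOM is UTF-8. Both are assumptions made for illustration, and the function name split_text_lines is hypothetical, not anything in bzr.

```python
import codecs
from typing import List


def split_text_lines(data: bytes) -> List[str]:
    """Decode data, then split on real newline characters.

    Assumption: the encoding is guessed from a BOM only; files
    without one are treated as UTF-8.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        text = data.decode("utf-8")

    # Split on "\n" only and keep the terminator on each line, so the
    # pieces can be joined back into the original text.
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])
    return lines
```

With this, a character such as U+010A stays inside its line instead of being cut at its 0x0A byte.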
>>>| Perhaps it would be worth adding a way to tell bzr "this is text/this
>>>| is binary" in the user-interface (and this means a meta-info in the
>>>| repository)?
>>>
>>>Martin felt that approach was too baroque, and that we should do it
>>>this way.
>>
>>Not so much "baroque" as: why make people set something if it can be
>>automatically detected,
I always conceived it as an override for the automatic detection.
>>and what should happen if it's set wrongly.
It would probably be more useful for forcing files as binary (e.g.
uuencoded files) rather than as a way of forcing files to be text.
>>Suppose the binary flag is not set, but the file is actually binary -
>>would you want to display the binary garbage to the terminal, or do a
>>line-wise merge? It seems to me that you would not; diff ought to check
>>whether the file contents are actually safe to display, regardless of
>>whether the user said it was binary or not.
If there were a significant number of text formats that used NUL,
perhaps we should. We can wait and see, I think.
>>Conversely if you (perhaps
>>incorrectly) marked it as binary you might still want to display the
>>diff.
I think if the user makes the effort to mark it as binary, it probably
is, even if it contains no NULs.
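Roughly what I have in mind, as a sketch rather than a proposal for the real API: the user's flag, when present, wins over the NUL heuristic. The names looks_binary, is_binary and user_flag are made up for this example.

```python
from typing import Optional


def looks_binary(data: bytes) -> bool:
    """Automatic detection: treat any file containing NUL as binary."""
    return b"\x00" in data


def is_binary(data: bytes, user_flag: Optional[bool] = None) -> bool:
    """Decide whether to treat data as binary.

    user_flag is a hypothetical per-file setting:
      None  -> nothing set, fall back to automatic detection
      True  -> user marked the file binary (e.g. a uuencoded file)
      False -> user marked the file text
    """
    if user_flag is not None:
        return user_flag
    return looks_binary(data)


uuencoded = b"begin 644 image.png\nM1234\nend\n"
print(is_binary(uuencoded))                   # heuristic sees no NUL
print(is_binary(uuencoded, user_flag=True))   # user override applies
```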
>>A single "binary" bit is probably not quite enough: you might have files
>>(e.g. vimrc) containing weird characters that are mergeable as text, or
>>plain text files that should never be automatically merged.
I think we're in agreement here.
>>It seems to me the first thing is to make the internal operations have
>>options to treat them as either binaries or text, and to connect those
>>to either heuristics or user preferences expressed at the time. (For
>>example 'bzr diff --text' to disable detection of binaries.) So this is
>>a good step.
Sounds reasonable.
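For illustration, here is how an operation with such an option might be shaped, using Python's difflib rather than bzr's own diff code. The function render_diff and its force_text parameter are hypothetical stand-ins for something like 'bzr diff --text', and decoding as UTF-8 is an assumption.

```python
import difflib


def render_diff(old: bytes, new: bytes, force_text: bool = False) -> str:
    """Return a unified diff, or a short notice for binary content.

    force_text disables the NUL-based binary check, in the spirit of
    a '--text' option.
    """
    if not force_text and (b"\x00" in old or b"\x00" in new):
        return "Binary files differ\n"

    # Assumption: content is shown as UTF-8; undecodable bytes are
    # replaced instead of raising an error.
    old_lines = old.decode("utf-8", errors="replace").split("\n")
    new_lines = new.decode("utf-8", errors="replace").split("\n")
    diff = difflib.unified_diff(
        old_lines, new_lines, fromfile="old", tofile="new", lineterm=""
    )
    return "".join(line + "\n" for line in diff)


print(render_diff(b"a\x00\n", b"b\x00\n"))
print(render_diff(b"hello\n", b"goodbye\n"))
```

The heuristic and the user's choice both feed the same switch, so the diff code itself does not need to know where the decision came from.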
Aaron