[MERGE REVIEW] Binary file handling
Aaron Bentley
aaron.bentley at utoronto.ca
Tue Apr 18 16:40:45 BST 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin Pool wrote:
> On 16 Apr 2006, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
>> Text merges won't be performed on binary files
> We may at some point want to allow you to force it by remerging the file.
Okay, but diff3 cannot operate on binary files.
>> It is tempting to test for ESC as well, since that would prevent bzr
>> diff from messing up the user's terminal. Text files containing ESC are
>> relatively rare, but they do exist, e.g. shell scripts that set terminal
>> colors.
>
>
> If people have them it's probably OK to actually display them?
Diffs intended for human consumption could, err, "escape" the ESC
character (as '\x5E['), to prevent messing up the terminal, but this
requires us to distinguish between diff purposes. OTOH, there is
already some push for that.
>> === added file 'b/bzrlib/textfile.py'
> Could do with a module docstring, e.g. "Utilities for
> distinguishing binary from text files".
Ok.
>> + check_text_lines(oldlines)
>> + check_text_lines(newlines)
>>
>> ud = difflib.unified_diff(oldlines, newlines,
>> fromfile=old_filename+'\t',
>
>
> It might be good to have a parameter allowing you to see the diff even
> if it looks binary - or will difflib fail?
I believe it will pass. Difflib seems to handle non-text sequences
adequately.
>> +class BinaryFile(BzrNewError):
>> + """File is binary but should be text."""
>
>
> Perhaps it should include a string for the filename?
In many cases, there's no precise filename, i.e. when doing a diff,
there's no physical path for the old text. In some cases it would
require us to calculate and pass in a filename that would rarely be
used. And there are no known cases in which this error is unhandled, so
a filename would never be displayed..
>> + except BinaryFile:
>> + print >> output_to, "Binary files differ"
>
>
> Should this include the filenames? I suppose it will come after the
> heading and so be clear from that?
It should be clear from that. Strict imitation of diff(1) would require
us to include filenames, so I'll do that.
>> +from textfile import check_text_lines
>
>
> Should be bzrlib.textfile.
Doh!
> In several of these tests you check whether a \0 after the first kb will
> be undetected. Is this really a good test -- is it something that
> should be a *requirement* of the program in future? It seems to me that
> it should not be tested; rather you should just make sure that a
> file with no nuls at all will be OK.
I thought that it would be best to test the routine as thoroughly as
possible, and update the tests if we changed that behaviour. I didn't
see it as imposing a requirement on future routines. But I can change this.
So in sum, I'm leaving BinaryFile as is, and taking your other suggestions.
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFERQh90F+nu1YWqI0RAnMUAJ4/zdYs5sLHPxLqTOu/mw8DnsuBOQCfXmnG
lp3A4hIyWPxXaKi5qxsT0dE=
=VumU
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list