[MERGE REVIEW] Binary file handling

Aaron Bentley aaron.bentley at utoronto.ca
Tue Apr 18 16:40:45 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> On 16 Apr 2006, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
>> Text merges won't be performed on binary files

> We may at some point want to allow you to force it by remerging the file.

Okay, but diff3 cannot operate on binary files.

>> It is tempting to test for ESC as well, since that would prevent bzr
>> diff from messing up the user's terminal.  Text files containing ESC are
>> relatively rare, but they do exist, e.g. shell scripts that set terminal
>> colors.
> 
> 
> If people have them it's probably OK to actually display them?

Diffs intended for human consumption could, err, "escape" the ESC
character (as '\x5E['), to prevent messing up the terminal, but this
requires us to distinguish between diff purposes.  OTOH, there is
already some push for that.

>> === added file 'b/bzrlib/textfile.py'
> Could do with a module docstring, e.g. "Utilities for
> distinguishing binary from text files".

Ok.

>> +    check_text_lines(oldlines)
>> +    check_text_lines(newlines)
>>
>>      ud = difflib.unified_diff(oldlines, newlines,
>>                                fromfile=old_filename+'\t',
> 
> 
> It might be good to have a parameter allowing you to see the diff even
> if it looks binary - or will difflib fail?

I believe it will pass.  Difflib seems to handle non-text sequences
adequately.

>> +class BinaryFile(BzrNewError):
>> +    """File is binary but should be text."""
> 
> 
> Perhaps it should include a string for the filename?

In many cases, there's no precise filename, i.e. when doing a diff,
there's no physical path for the old text.  In some cases it would
require us to calculate and pass in a filename that would rarely be
used.  And there are no known cases in which this error is unhandled, so
a filename would never be displayed..

>> +        except BinaryFile:
>> +            print >> output_to, "Binary files differ"
> 
> 
> Should this include the filenames?  I suppose it will come after the
> heading and so be clear from that?

It should be clear from that.  Strict imitation of diff(1) would require
us to include filenames, so I'll do that.

>> +from textfile import check_text_lines
> 
> 
> Should be bzrlib.textfile.

Doh!

> In several of these tests you check whether a \0 after the first kb will
> be undetected.  Is this really a good test -- is it something that
> should be a *requirement* of the program in future?  It seems to me that
> it should not be tested; rather you should just make sure that a
> file with no nuls at all will be OK.

I thought that it would be best to test the routine as thoroughly as
possible, and update the tests if we changed that behaviour.  I didn't
see it as imposing a requirement on future routines.  But I can change this.

So in sum, I'm leaving BinaryFile as is, and taking your other suggestions.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFERQh90F+nu1YWqI0RAnMUAJ4/zdYs5sLHPxLqTOu/mw8DnsuBOQCfXmnG
lp3A4hIyWPxXaKi5qxsT0dE=
=VumU
-----END PGP SIGNATURE-----




More information about the bazaar mailing list