[MERGE][328007] Fixed problem with `log -p` and non-ascii content of files: show_diff should write the diff to exact [stdout] stream.

Alexander Belchenko bialix at ukr.net
Thu Mar 12 06:33:58 GMT 2009


James Westby пишет:
> On Wed, 2009-03-11 at 11:03 +0200, Alexander Belchenko wrote:
>> This patch provides fix for bug #328007: log -p on non-ascii files crashed with unicode error.
>> I wrote this patch by Ian request (actually I wrote 2 variants of patch).
>> Now I've added simple smoke test. This test failed with unicode error without my patch
>> and succeed with it.
> 
> Thanks for working on this, I think it's an important bug to fix.
> 
>> +        # 'exact' stream used to show diff, it should print content 'as is'
>> +        # and should not try to decode/encode it to unicode to avoid bug #328007
>> +        self.to_exact_file = getattr(to_file, 'stream', to_file)
> 
> This seems to change the contract of the method, does it work if I pass
> in a StringIO to receive the diff?

Excuse my ignorance: what's "contract of the method"?

This patch don't change behavior of existing code. But my first patch does change,
because require only bytestream, while this patch accepts both bytestream and
encoded stream.

Neither StringIO object, nor sys.stdout nor any other file object don't have stream attribute.
So asking on your question: "does it work if I pass in a StringIO to receive the diff?"
Answer: "Yes, it works fine".

> It seems to me like your first patch, while a little more verbose, 
> didn't rely on an undocumented feature.

Well, one of the most quoted sentence I heard when start learning Python was:
"Use the source Luke". And the main thing I've learned in high school was:
sometimes things are not documented properly.

see codecs.py in your python libs installation:

#
# The StreamWriter and StreamReader class provide generic working
# interfaces which can be used to implement new encoding submodules
# very easily. See encodings/utf_8.py for an example on how this is
# done.
#

class StreamWriter(Codec):

    def __init__(self, stream, errors='strict'):

        """ Creates a StreamWriter instance.

            stream must be a file-like object open for writing
            (binary) data.

            The StreamWriter may use different error handling
            schemes by providing the errors keyword argument. These
            parameters are predefined:

             'strict' - raise a ValueError (or a subclass)
             'ignore' - ignore the character and continue with the next
             'replace'- replace with a suitable replacement character
             'xmlcharrefreplace' - Replace with the appropriate XML
                                   character reference.
             'backslashreplace'  - Replace with backslashed escape
                                   sequences (only for encoding).

            The set of allowed parameter values can be extended via
            register_error.
        """
        self.stream = stream
        self.errors = errors


As you see self.stream is a public attribute. If Python devs have wanted to hide it
from usage, I believe there should be

	self._stream = stream

Anyway.
This patch is much better, because it don't change existing behavior for plain log.
First (verbose) patch does change and I don't think I like it.
This change is not visible enough on Linux, but will be visible on Windows.

I can change getattr to following code (if we want to be really paranoid):

	if isinstance(to_file, codecs.StreamWriter):
		self.to_exact_file = to_file.stream
	else:
		self.to_exact_file = to_file





More information about the bazaar mailing list