UnicodeDecode error while printing line (#539258)

Tue Mar 16 16:01:26 GMT 2010

Alexander Belchenko wrote:
> Parth Malwankar wrote:
>> On Tue, Mar 16, 2010 at 1:57 PM, Alexander Belchenko <bialix at ukr.net> wrote:
>>> Alexander Belchenko wrote:
>>>>
>>>> To write lines of the versioned files you need to use
>>>> encoding_type = 'exact'
>>>>
>>>> But for writing [possible] unicode text to the terminal you need to use
>>>> 'replace' in most situations.
>>>>
>>>> So without looking at the code in your plugin I'd say you need 'exact'.
>>>>
>>>> But if you also need unicode output, then you need both 'exact' and
>>>> 'replace'. This can be done, although it's slightly non-trivial.
>>>>
>>>> To Martin Pool: I think the UI class should provide both an encoded
>>>> and a raw output stream. I'm not sure whether that's already done, so
>>>> excuse me if it already exists.
>>> OK, IIUC this is now done via
>>> UIFactory.make_output_stream(encoding_type='exact')
>>>
>>
>> Thanks for the pointer Alexander.
>>
>> I am assuming I can override encoding_type in my derived class
>> cmd_grep the same way as I override takes_args, takes_options, etc.
>> For some reason that's not working; I get the same error. I tried both
>> 'exact' and 'replace'. I also tried overriding _setup_outf and hardcoding
>> 'exact', but that didn't work either.
> 
> I don't understand what you mean by "that's not working".
> 
> Look at class cmd_cat in bzrlib/builtins.py and you'll see:
> 
>     encoding_type = 'exact'
> 
> So it *actually* works. Can you show the traceback with 'exact'?
> 
> In your case, according to the traceback, the error occurs at this line:
>
>    File "/home/parthm/.bazaar/plugins/grep/grep.py", line 240, in _file_grep
>      outf.write(pfmt % (line,))
> 
> What is pfmt there?
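To make the 'exact' versus 'replace' distinction above concrete, here is a minimal Python 3 sketch. It assumes 'exact' means "write the versioned file's bytes through untouched" and 'replace' means "encode unicode text for the terminal, substituting characters it cannot represent". The helpers write_exact and write_replace are hypothetical illustrations, not bzrlib API.

```python
import io


def write_exact(stream: io.BufferedIOBase, data: bytes) -> None:
    # 'exact' (assumed semantics): versioned-file content is written
    # byte for byte, with no decoding or re-encoding.
    stream.write(data)


def write_replace(stream: io.BufferedIOBase, text: str, encoding: str) -> None:
    # 'replace' (assumed semantics): unicode text is encoded for the
    # terminal; characters the encoding cannot represent become '?'
    # instead of raising UnicodeEncodeError.
    stream.write(text.encode(encoding, "replace"))


out = io.BytesIO()
write_exact(out, "пишет\n".encode("utf-8"))
write_replace(out, "пишет\n", "ascii")
print(out.getvalue())
```

The plugin's difficulty is that it needs both at once: raw file lines ('exact') mixed with a unicode path ('replace').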

I've looked at your code.

fmt = path + ":%s" + eol_marker

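path is a unicode string there, right? So fmt is also a unicode string, and
that's why you get the error. This is wrong: you are using a unicode fmt to
format a non-unicode (byte) string, so Python implicitly decodes that string
as ASCII, which fails as soon as the line contains non-ASCII bytes.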
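The failure can be reproduced in Python 3, where the coercion must be spelled out. In Python 2, formatting a byte string into a unicode format string silently ran the equivalent of line.decode('ascii'); this sketch makes that step explicit. The path value and the helper py2_style_format are hypothetical.

```python
def py2_style_format(fmt: str, line: bytes) -> str:
    # Mimics Python 2: a byte string interpolated into a unicode format
    # string was implicitly decoded with the ASCII codec.
    return fmt % (line.decode("ascii"),)


path = "docs/readme.txt"              # hypothetical unicode path
eol_marker = "\n"
fmt = path + ":%s" + eol_marker        # unicode, because path is unicode
line = "привет".encode("utf-8")        # raw bytes from a versioned file

try:
    py2_style_format(fmt, line)
except UnicodeDecodeError as exc:
    # Non-ASCII bytes cannot be decoded as ASCII.
    print("UnicodeDecodeError:", exc.reason)
```

Pure-ASCII lines format fine, which is why the bug only shows up on some files.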

So the problem is much harder: you should either encode path to the
terminal encoding, or decode the file line to unicode. I'd say you have a
problem.
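The two options above can be sketched in Python 3 as follows. Both function names are hypothetical, and the encoding is passed in as a parameter rather than detected, since the right value depends on the terminal and on the file.

```python
def format_as_bytes(path: str, line: bytes, encoding: str, eol: bytes = b"\n") -> bytes:
    # Option 1: encode the unicode path to the output encoding and keep
    # the file line as raw bytes, so the whole result is bytes.
    return path.encode(encoding, "replace") + b":" + line + eol


def format_as_text(path: str, line: bytes, encoding: str, eol: str = "\n") -> str:
    # Option 2: decode the file line to unicode (undecodable bytes become
    # U+FFFD), so the whole result is unicode and is encoded once on output.
    return path + ":" + line.decode(encoding, "replace") + eol
```

The first keeps file content byte-exact but can mix two encodings in one output line; the second produces consistent text but requires guessing the file's encoding and may lose bytes.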

> 
>> At the moment I just decode the lines before outf.write, i.e.:
>>
>>     line = line.decode(_user_encoding, 'replace')
>>     outf.write(pfmt % (line,))
>>
>> That seems to work without errors, but I am not sure if it's
>> the right way to use outf.
> 
> I think this is the wrong way, especially because on Windows, if you
> print text to the terminal, you should use terminal_encoding, not
> user_encoding.
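Why the user encoding and the terminal encoding must not be confused on Windows can be shown with a small Python 3 sketch. It assumes a Russian-locale Windows, where the ANSI code page (user encoding) is typically cp1251 and the console uses the OEM code page cp866; the two values are hardcoded here instead of being queried, and to_terminal is a hypothetical helper.

```python
def to_terminal(line: bytes, file_encoding: str, terminal_encoding: str) -> bytes:
    # Decode with the encoding the bytes were actually written in, then
    # re-encode for the console; unrepresentable characters become '?'.
    text = line.decode(file_encoding, "replace")
    return text.encode(terminal_encoding, "replace")


text = "пишет"
ansi = text.encode("cp1251")   # assumed user (ANSI) encoding

# cp1251 bytes shown on a cp866 console come out as mojibake:
print(ansi.decode("cp866"))
# Transcoding to the terminal encoding first displays the intended text:
print(to_terminal(ansi, "cp1251", "cp866").decode("cp866"))
```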