[MERGE] Readable and properly encoded diff headers

Alexander Belchenko bialix at ukr.net
Sat Aug 19 07:13:24 BST 2006

John Arbash Meinel wrote:
> Alexander Belchenko wrote:
>> I haven't read the whole thread, so excuse me if I say something already discussed.
>> John Arbash Meinel wrote:
>> 4) GNU diff (v.2.8.7) uses the ANSI encoding to show filenames in its
>> output, and they look like hieroglyphs on the terminal:
>> E:\Bazaar\__test\diff>diff -u Тест Тест2
>> --- ╥хёЄ        2006-08-18 09:38:24.252753600 +0300
>> +++ ╥хёЄ2       2006-08-18 09:38:31.423064000 +0300
>> @@ -1 +1 @@
>> -content of ╥хёЄ
>> +content of ╥хёЄ2
>> So I think a precedent already exists, and bzr also needs to work in
>> the same manner.
> Well, from the above it would seem that 'diff' is getting it wrong.
> Since you *aren't* able to read the headers from this.
> I guess it is arguable either way.


But for me the command

bzr diff > outfile

means that I want to store the diff in order to send it via e-mail. (BTW, I
would be happy with the '--output' flag that Matthieu Moy was implementing
for diff.)

But it's not the same as

bzr log | less

That is reading a human-readable log in the current terminal encoding.

But for bzr both variants are the same: the output is redirected instead
of going to the terminal. How do you propose to differentiate them?
Probably only by implementing different behaviour on redirected output for
different commands.

>>> And probably if you redirect the output to a file, that should also be
>>> in user_encoding, rather than terminal encoding. But Alexander is our #1
>>> windows user, and for him it was better to still use terminal encoding.
>> No. I think that's a wrong assumption. See my arguments above.
> Well, you were the one that pushed for using 'sys.stderr' encoding if
> 'sys.stdout' was unavailable.

Yes. I proposed this because in a terminal I cannot use

bzr log | less

unless the log is written in the terminal encoding, so that it can be read
through the 'less' pager (or the 'more' pager).

If you think I was wrong here, I can withdraw my proposal to use
sys.stderr.encoding when sys.stdout.encoding is not available.

I think I can, and have to, explicitly switch my console to *always* use
ANSI (cp1251); then for me user_encoding == terminal_encoding. It's
doable on Windows, it simply requires extra effort. But this option is
turned off by default, and every novice will ask 'why'.

> It makes sense to me to use user_encoding when sys.stdout.encoding is
> not available, because that means the content is being redirected. And
> thus we want to use the encoding for files, not the encoding for terminals.

This does not work for 'bzr log | less', as I mentioned above.
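
To make the two proposals concrete, here is a minimal sketch of the
fallback order being discussed. It is not bzr's actual code: the function
name pick_output_encoding and the prefer_stderr switch are hypothetical,
and it assumes the behaviour where a redirected stream reports its
encoding as None.

```python
import locale
import sys


def pick_output_encoding(stdout=None, stderr=None, prefer_stderr=True):
    """Choose an encoding for human-readable output (illustrative only)."""
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr

    # 1. stdout is a terminal that reports its encoding: use it.
    encoding = getattr(stdout, "encoding", None)
    if encoding:
        return encoding

    # 2. stdout is redirected ('| less' or '> file'). With prefer_stderr
    #    we assume stderr is still attached to the same terminal.
    if prefer_stderr:
        encoding = getattr(stderr, "encoding", None)
        if encoding:
            return encoding

    # 3. Otherwise fall back to the user (file content) encoding.
    return locale.getpreferredencoding(False) or "ascii"
```

With prefer_stderr=True, 'bzr log | less' would get the terminal encoding;
with prefer_stderr=False, redirected output would get the user encoding,
which is the other proposal. Neither setting can tell '| less' from
'> outfile'.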

> And this would extend to 'bzr diff' as well. It should use terminal
> encoding when available, but fall back to content encoding if redirected.
>>> Maybe just because he wanted to be able to do:
>>> bzr diff > file; cat file
>> win32 uses 'type' instead of 'cat'.
>> Typically I use
>> bzr diff | less
>> in the terminal, but for Russian texts I need to recode the output with iconv:
>> bzr diff | iconv -fcp1251 -tcp866 | less

That's why I need to know exactly what the encoding of bzr output is.
I can even live with utf-8 encoding of filenames. I just need to know
exactly what the encoding is.

Recently I started writing a plugin that automatically recodes the output
of the diff command to obtain readable output on different terminals. For
example, in some situations I cannot use utf-8 for some Russian texts, and
each time I need to recode the output, because the file content is cp1251,
the Windows console is cp866, and the Linux console is koi8-r.

Yes, I know: that's why Unicode was introduced. But sometimes, for various
reasons, I simply cannot use Unicode, yet I still need things to work
cross-platform.

I would prefer to introduce and implement in core an '--encoding' option
for all commands that generate human-readable output.
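
As a rough illustration of that idea only: the sketch below uses a
standalone argparse parser rather than bzr's own option machinery, and
the program name show-log, the functions build_parser and main, and the
utf-8 default are all assumptions made for the example.

```python
import argparse
import codecs


def build_parser():
    parser = argparse.ArgumentParser(prog="show-log")
    parser.add_argument(
        "--encoding",
        default=None,
        help="encoding for human-readable output (e.g. cp866, koi8-r)",
    )
    return parser


def main(argv, out, default_encoding="utf-8"):
    """Write one sample line to the binary stream 'out'."""
    args = build_parser().parse_args(argv)
    encoding = args.encoding or default_encoding
    codecs.lookup(encoding)  # raises LookupError for an unknown encoding
    text = "content of Тест\n"
    out.write(text.encode(encoding, errors="replace"))
    return encoding
```

An explicit option like this overrides any guessing based on the terminal,
so the same command gives predictable bytes whether it is piped to a pager
or redirected to a file.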

>> So I prefer to run diff from my favorite FTE editor, which shows me the
>> diff in the right encoding (i.e. cp1251). And all GUI front-ends for bzr
>> on Windows will also use the ANSI encoding.
> Actually, GUI front ends are more likely to use Unicode encoding.
> Windows has a usable Unicode api, and we shouldn't limit ourselves to
> what a terminal can display. Otherwise I'd never be able to read
> 'جوجو.txt' in Thunderbird, or in any other app.

