[MERGE] Readable and properly encoded diff headers
Alexander Belchenko
bialix at ukr.net
Sat Aug 19 07:13:24 BST 2006
John Arbash Meinel wrote:
> Alexander Belchenko wrote:
>> I haven't read the whole thread, so excuse me if I say something already discussed.
>>
>> John Arbash Meinel wrote:
>> 4) GNU diff (v.2.8.7) uses the ANSI encoding to show filenames in its
>> output, and they look like hieroglyphs on the terminal:
>>
>> E:\Bazaar\__test\diff>diff -u Тест Тест2
>> --- ╥хёЄ 2006-08-18 09:38:24.252753600 +0300
>> +++ ╥хёЄ2 2006-08-18 09:38:31.423064000 +0300
>> @@ -1 +1 @@
>> -content of ╥хёЄ
>> +content of ╥хёЄ2
>>
>> So I think a precedent already exists, and bzr also needs to work in
>> the same manner.
>
> Well, from the above it would seem that 'diff' is getting it wrong.
> Since you *aren't* able to read the headers from this.
>
> I guess it is arguable either way.
Probably.
But for me the command
bzr diff > outfile
means that I want to store the diff in order to send it via e-mail. (BTW, I
would be happy with the '--output' flag that Matthieu Moy was implementing for diff.)
But that is not the same as
bzr log | less
which means reading a human-readable log in the current terminal encoding.
But for bzr both variants are the same -- output is being redirected to
a pipe. How do you propose to differentiate them? Probably only by implementing
different behaviour on redirected output for different commands.
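
A minimal Python sketch of one way a program might tell these cases apart, by inspecting the output handle itself. This is not bzr code: the helper name classify_stream is hypothetical, and the sketch assumes os.fstat reports pipes and regular files the way POSIX does, which may not hold on every Windows setup.

```python
import os
import stat


def classify_stream(stream):
    """Return 'terminal', 'pipe', 'file' or 'unknown' for an open stream.

    Hypothetical helper, for illustration only.
    """
    try:
        fd = stream.fileno()
    except (AttributeError, OSError, ValueError):
        return 'unknown'   # e.g. an in-memory stream with no descriptor
    if os.isatty(fd):
        return 'terminal'  # interactive console
    mode = os.fstat(fd).st_mode
    if stat.S_ISFIFO(mode):
        return 'pipe'      # e.g. bzr log | less
    if stat.S_ISREG(mode):
        return 'file'      # e.g. bzr diff > outfile
    return 'unknown'
```

Even where this works, it only says where the bytes go, not which encoding the reader at the other end of a pipe expects, so per-command behaviour or an explicit option may still be needed.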
>>> And probably if you redirect the output to a file, that should also be
>>> in user_encoding, rather than terminal encoding. But Alexander is our #1
>>> windows user, and for him it was better to still use terminal encoding.
>> No. I think that is a wrong assumption. See my arguments above.
>
> Well, you were the one that pushed for using 'sys.stderr' encoding if
> 'sys.stdout' was unavailable.
Yes. I proposed this because in a terminal I cannot do
bzr log | less
and read the log through the 'less' pager (or the 'more' pager) unless the
log is output in the terminal encoding.
If you think that I was wrong here, I can withdraw my proposal to use
sys.stderr.encoding when sys.stdout.encoding is not available.
I think I can, and have to, explicitly switch my console to *always* use
ANSI (cp1251), and then for me user_encoding == terminal_encoding. It's
doable on Windows; it simply requires extra effort. But by default this
option is turned off, and every novice will ask 'why'.
> It makes sense to me to use user_encoding when sys.stdout.encoding is
> not available, because that means the content is being redirected. And
> thus we want to use the encoding for files, not the encoding for terminals.
This does not work for 'bzr log | less', as I mentioned above.
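
The two fallback policies under discussion can be sketched side by side in Python. This is an illustration under assumptions, not the bzr implementation: choose_output_encoding and its use_stderr flag are hypothetical names, and locale.getpreferredencoding stands in for bzr's user_encoding. It also assumes a stream whose encoding attribute is None when output is redirected, as in the Python 2 of the time; Python 3 normally sets sys.stdout.encoding even then.

```python
import locale
import sys


def choose_output_encoding(stdout=None, stderr=None, use_stderr=True):
    """Pick an encoding for text written to stdout.

    Hypothetical helper. With use_stderr=True, borrow the encoding of
    stderr (usually still the terminal) when stdout has none. With
    use_stderr=False, go straight to the user (locale) encoding.
    """
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr
    encoding = getattr(stdout, 'encoding', None)
    if not encoding and use_stderr:
        encoding = getattr(stderr, 'encoding', None)
    if not encoding:
        # Assumption: the locale encoding plays the role of user_encoding.
        encoding = locale.getpreferredencoding(False) or 'ascii'
    return encoding
```

With use_stderr=True, 'bzr log | less' would come out in the terminal encoding (cp866 in a default Russian Windows console), while with use_stderr=False the same command would come out in the user encoding (cp1251 there), which is the case that needs iconv.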
> And this would extend to 'bzr diff' as well. It should use terminal
> encoding when available, but fall back to content encoding if redirected.
>
>>> Maybe just because he wanted to be able to do:
>>>
>>> bzr diff > file; cat file
>> win32 uses 'type' instead of 'cat'.
>>
>> Typically I use
>>
>> bzr diff | less
>>
>> in a terminal, but for Russian texts I need to recode the output with iconv:
>>
>> bzr diff | iconv -fcp1251 -tcp866 | less
That's why I need to know exactly what the encoding of bzr output is.
I can even live with utf-8 encoding of filenames. I just need to know
exactly what the encoding is.
Recently I started to write a plugin that automatically recodes the output of
the diff command to obtain readable output on different terminals. For example,
in some situations I cannot use utf-8 for some Russian texts, and each time I
need to recode the output, because the file content is cp1251, the Windows
console is cp866, but the Linux console is koi8-r.
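
A small Python sketch of the recoding step, equivalent in spirit to 'iconv -fcp1251 -tcp866'. The function name recode is hypothetical; the codec names are the standard Python ones. It also shows why ANSI output looks garbled in a cp866 console.

```python
def recode(data, source_encoding, target_encoding):
    """Re-encode bytes from one legacy code page to another."""
    text = data.decode(source_encoding)
    # Characters the target code page cannot represent become '?'.
    return text.encode(target_encoding, errors='replace')


name = 'Тест'
ansi_bytes = name.encode('cp1251')   # what an ANSI program writes

# A cp866 console interprets those bytes with its own code page.
garbled = ansi_bytes.decode('cp866')   # '╥хёЄ'

# After recoding, the same console shows the original text.
readable = recode(ansi_bytes, 'cp1251', 'cp866').decode('cp866')   # 'Тест'
```

The recoding is only reliable when the source encoding is known for certain, which is the point made above.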
Yes, I know: that's why Unicode was introduced. But sometimes, for various
reasons, I simply cannot use Unicode, yet I still need to work across platforms.
I would prefer to introduce and implement in core an '--encoding' option for
all commands that generate human-readable output.
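
What such an option might look like can be sketched with Python's standard argparse module. This is only an illustration of the idea: bzr has its own option handling, and the names build_parser, write_text and the 'show-log' command are hypothetical.

```python
import argparse
import sys


def build_parser():
    """Parser for a hypothetical command with an --encoding option."""
    parser = argparse.ArgumentParser(prog='show-log')
    parser.add_argument(
        '--encoding', default=None,
        help='encoding for human-readable output, e.g. cp866 or koi8-r')
    return parser


def write_text(text, encoding, out=None):
    """Write text to a binary stream in the requested encoding."""
    out = sys.stdout.buffer if out is None else out
    # Replace unrepresentable characters rather than fail mid-output.
    out.write(text.encode(encoding, errors='replace'))


if __name__ == '__main__':
    args = build_parser().parse_args()
    # Assumption: fall back to utf-8 when no encoding is requested.
    write_text('Тест\n', args.encoding or 'utf-8')
```

With an explicit option like this, 'show-log --encoding=cp866 | less' would need no iconv step, because the user states the encoding instead of the program guessing it.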
>> So I prefer to run diff from my favorite FTE editor, which shows me the diff
>> in the right encoding (i.e. cp1251). And all GUI front-ends for bzr on
>> Windows will also use the ANSI encoding.
>
> Actually, GUI front ends are more likely to use Unicode encoding.
> Windows has a usable Unicode api, and we shouldn't limit ourselves to
> what a terminal can display. Otherwise I'd never be able to read
> 'جوجو.txt' in Thunderbird, or in any other app.
--
Alexander