[MERGE] Readable and properly encoded diff headers

Fri Aug 18 15:39:54 BST 2006

Alexander Belchenko wrote:
> I haven't read the whole thread, so excuse me if I repeat something already discussed.
> 
> John Arbash Meinel wrote:
>> And we can't just use 'user_encoding' because that is not always
>> terminal encoding. I'm sorry this is so complicated, but that is the way
>> encoding is. Yell at Microsoft for having a different
>> locale.getpreferredencoding() and terminal encoding. I believe the first
>> is the preferred encoding for file contents, not what will be printed
>> out on the screen. So we still need 'preferredencoding' for some things,
>> like make_commit_message(), since that should be in preferredencoding.
> 
> I think that on Windows bzr *should* show filenames in the diff header in
> *user_encoding*, because:
> 
> 1) 99% of Windows users use this encoding to edit their text files
> 2) this encoding is used for the ANSI representation of filenames in
> Windows; i.e. when you execute os.listdir('.') you get non-ASCII
> filenames in the ANSI encoding, not in OEM
> 3) because the terminal encoding is OEM and the text content is ANSI, on
> a Windows terminal I always get a completely unreadable diff for
> Russian texts
> 4) GNU diff (v.2.8.7) uses the ANSI encoding to show filenames in its
> output, and they look like hieroglyphs on the terminal:
> 
> E:\Bazaar\__test\diff>diff -u Тест Тест2
> --- ╥хёЄ        2006-08-18 09:38:24.252753600 +0300
> +++ ╥хёЄ2       2006-08-18 09:38:31.423064000 +0300
> @@ -1 +1 @@
> -content of ╥хёЄ
> +content of ╥хёЄ2
> 
> So I think a precedent already exists, and bzr also needs to work in
> the same manner.

Well, from the above it would seem that 'diff' is getting it wrong,
since you *aren't* able to read the headers in its output.

I guess it is arguable either way.
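
To make the mismatch concrete, here is a minimal sketch of what happens
in the quoted output. It assumes ANSI is cp1251 and OEM is cp866 (the
pair used in the iconv command further down); the function name is
hypothetical, and it is written for a current Python rather than the
one bzr runs on.

```python
# Sketch only: reproduce the garbled header from the quoted diff output.
# Assumption: ANSI = cp1251 and OEM = cp866 (Russian-locale Windows).
ANSI_ENCODING = "cp1251"
OEM_ENCODING = "cp866"


def as_seen_on_terminal(text, written_as=ANSI_ENCODING, shown_as=OEM_ENCODING):
    """Return what a terminal decoding with `shown_as` would display
    when the program wrote `text` encoded with `written_as`."""
    return text.encode(written_as).decode(shown_as)


# ANSI bytes shown on an OEM terminal: the unreadable header.
garbled = as_seen_on_terminal("Тест")

# OEM bytes shown on an OEM terminal: the filename survives.
readable = as_seen_on_terminal("Тест", written_as=OEM_ENCODING)
```

Here `garbled` is the same four characters that appear in the '---'
line above, while `readable` is the original filename.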

> 
>> And probably if you redirect the output to a file, that should also be
>> in user_encoding, rather than terminal encoding. But Alexander is our #1
>> windows user, and for him it was better to still use terminal encoding.
> 
> No. I think that's a wrong assumption. See my arguments above.

Well, you were the one that pushed for using 'sys.stderr' encoding if
'sys.stdout' was unavailable.

It makes sense to me to use user_encoding when sys.stdout.encoding is
not available, because that means the content is being redirected. And
thus we want to use the encoding for files, not the encoding for terminals.

And this would extend to 'bzr diff' as well. It should use terminal
encoding when available, but fall back to content encoding if redirected.
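
Roughly, the rule I have in mind could be sketched like this. The
function and the FakeStream helper are hypothetical names for
illustration, not existing bzr code. One assumption to flag: newer
Pythons set an encoding on sys.stdout even when it is redirected, so
the sketch also treats "not a tty" as "redirected".

```python
import locale
import sys


def choose_output_encoding(stream, user_encoding):
    """Use the stream's own encoding when it is a terminal that reports
    one; otherwise fall back to the encoding used for file content."""
    terminal_encoding = getattr(stream, "encoding", None)
    isatty = getattr(stream, "isatty", None)
    is_terminal = bool(isatty and isatty())
    if terminal_encoding and is_terminal:
        return terminal_encoding
    # Redirected to a file or a pipe: treat the output as file content.
    return user_encoding


class FakeStream:
    """Hypothetical stand-in for sys.stdout, used only for illustration."""

    def __init__(self, encoding, is_terminal):
        self.encoding = encoding
        self._is_terminal = is_terminal

    def isatty(self):
        return self._is_terminal


# An OEM terminal on a machine whose content encoding is cp1251.
on_terminal = choose_output_encoding(FakeStream("cp866", True), "cp1251")

# The same machine with output redirected to a file.
redirected = choose_output_encoding(FakeStream(None, False), "cp1251")

# Against the real stdout, with the locale's preferred encoding as fallback.
actual = choose_output_encoding(sys.stdout, locale.getpreferredencoding(False))
```

With these inputs the terminal case picks cp866 and the redirected case
picks cp1251, which is the behaviour described above.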

> 
>> Maybe just because he wanted to be able to do:
>>
>> bzr diff > file; cat file
> 
> win32 uses 'type' instead of 'cat'.
> 
> Typically I use
> 
> bzr diff | less
> 
> in the terminal, but for Russian texts I need to recode the output with iconv:
> 
> bzr diff | iconv -fcp1251 -tcp866 | less
> 
> So I prefer to run diff from my favorite editor, FTE, which shows me the
> diff in the right encoding (i.e. cp1251). And all GUI front ends for bzr
> on Windows will also use the ANSI encoding.

Actually, GUI front ends are more likely to use Unicode encoding.
Windows has a usable Unicode API, and we shouldn't limit ourselves to
what a terminal can display. Otherwise I'd never be able to read
'جوجو.txt' in Thunderbird, or in any other app.

> 
> -- 
> Alexander

John
=:->
