[MERGE] Readable and properly encoded diff headers

Thu Aug 24 20:49:51 BST 2006

Adeodato Simó wrote:
> * Alexander Belchenko [Fri, 18 Aug 2006 10:03:46 +0300]:
> 
> Hey, (Rob, question for which you got CC'ed at the very end)
> 
>> John Arbash Meinel wrote:
>>> And we can't just use 'user_encoding' because that is not always
>>> terminal encoding. I'm sorry this is so complicated, but that is the way
>>> encoding is. Yell at Microsoft for having a different
>>> locale.getpreferredencoding() and terminal encoding. I believe the first
>>> is the preferred encoding for file contents, not what will be printed
>>> out on the screen. So we still need 'preferredencoding' for some things,
>>> like make_commit_message(), since that should be in preferredencoding.
> 
>> I think for Windows bzr *should* show filenames in diff header with
>> *user_encoding*, because:
> 
>> 1) 99% of Windows users use this encoding to edit their text files
>> 2) this encoding is used for the ANSI representation of filenames in
>> Windows; i.e. when you execute os.listdir('.') you get non-ASCII
>> filenames in the ANSI encoding, not in OEM
>> 3) because the terminal encoding is OEM and the text content is ANSI,
>> on a Windows terminal I always get a completely unreadable diff for
>> Russian texts
>> 4) GNU diff (v.2.8.7) uses the ANSI encoding to show filenames in its
>> output.
> 
> Alexander, John, these are my conclusions after giving a careful read to
> this subthread. I don't have easy access to any Windows system,
> though, so I haven't really been able to test osutils.get_terminal_encoding()
> behavior.
> 
>   (1) using the terminal encoding to translate the file names is fine
>       for the simple case of running `bzr diff` in the terminal (that
>       is, when there is _no_ redirection).
> 
>   (2) when there is redirection to a file, user_encoding should be used,
>       period.
> 
>   (3) the case of redirection to a pipe is trickier, since you can do
>       both `bzr diff | less` and `bzr diff | type >file`. Because of
>       this, I think that using user_encoding is better in this case.

Well, the bigger problem is that there is no way to tell the difference
between 'bzr diff | foo' and 'bzr diff > foo'. At least from bzr's
perspective.

> 
> John, by my reading of osutils.get_terminal_encoding(), it seems it'd
> cover (1) just fine, but I'm not sure what happens for (2) and (3).
> Seems like when redirecting to a file, sys.stdout.encoding will be None,
> right?, so sys.stdin.encoding will be tried, which will be... the
> terminal encoding? (I'll leave out (3) for now, until we can get (2)
> cleared up.)
> 
> I really don't understand why get_terminal_encoding() looks at stdin at
> all, since it's always output that we're talking about.

Because 'bzr log | less' should use the terminal encoding, and bzr can't
tell that case apart from 'bzr log > foo.txt'.

We could parameterize 'terminal_encoding()' so that commands that print
user information (log), as opposed to file content (diff), would
optionally look at stdin.
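
A rough sketch of what that parameterization might look like. The
function name, its arguments and the 'consult_stdin' flag are all made
up for illustration; this is not the existing osutils code. It assumes
the stream encodings are passed in as plain values, with None meaning
"unknown" (which is what sys.stdout.encoding reports under redirection
on the Python of this era):

```python
import locale


def choose_output_encoding(stdout_encoding, stdin_encoding, consult_stdin,
                           user_encoding=None):
    """Pick an encoding for output (hypothetical helper, not bzrlib API).

    stdout_encoding / stdin_encoding: the stream encodings, or None when
    the stream is redirected and reports no encoding.
    consult_stdin: True for "user info" commands such as log, False for
    "file content" commands such as diff.
    """
    if user_encoding is None:
        # Assumption: fall back to the locale's preferred encoding.
        user_encoding = locale.getpreferredencoding()
    if stdout_encoding:
        # stdout is a terminal that reports its own encoding.
        return stdout_encoding
    if consult_stdin and stdin_encoding:
        # stdout is redirected, but stdin may still be the terminal.
        return stdin_encoding
    return user_encoding
```

With this, 'log' would pass consult_stdin=True and keep today's
behaviour for 'bzr log | less', while 'diff' would pass False and fall
straight through to user_encoding whenever stdout is redirected.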

It's all just a big hairy mess on Windows. Not to mention that
'bzr diff > foo.patch' will generate an invalid patch under Windows,
because stdout defaults to text mode, so unless every file uses CRLF
line endings the diff will be incorrect (it will have either all CRLF,
or possibly a mix of CRLF and CRCRLF line endings).
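
To make the CRCRLF case concrete, here is a small simulation. It does
not touch a real Windows stdout; 'windows_text_mode_write' is a made-up
stand-in that only mimics the LF to CRLF rewriting that text mode
performs on output:

```python
def windows_text_mode_write(data):
    """Simulate text-mode output on Windows: every LF becomes CRLF.

    Hypothetical stand-in for illustration; it does not touch stdout.
    """
    return data.replace(b"\n", b"\r\n")


# A diff line taken from a file with LF endings, and one from a file
# with CRLF endings.
lf_line = b"+unix line\n"
crlf_line = b"+dos line\r\n"

lf_result = windows_text_mode_write(lf_line)      # b"+unix line\r\n"
crlf_result = windows_text_mode_write(crlf_line)  # b"+dos line\r\r\n"
```

The LF line comes out as CRLF, so it no longer matches the file on
disk, and the CRLF line comes out as CRCRLF.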

At one point I proposed using binary always (and wrote a plugin to do
it), but Alexander didn't like it, because 'bzr log > foo.txt' would
create a LF file, and the default text reader on Windows is Notepad,
which doesn't handle LF files.

We bike-shedded a lot for slash support, until I finally just forced the
code to use all forward slashes, and said we would try to translate at a
higher level later.

I really feel that we should stop messing around with terminal encoding
versus filesystem encoding, etc etc. Get something that works >50% of
the time (hopefully 90%). And then instead spend our effort developing a
GUI for bzr (Olive has some potential here). So that nobody has to deal
with the pain that is the Windows command line.

At that point we can switch so that 'bzr command' uses terminal
encoding, and 'bzr command |' or 'bzr command >' uses user_encoding. And
all of them use binary mode for output. Trying to be magic is just
messing us up. Better to solve it correctly (with a GUI) rather than
hack after hack.
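
Roughly, that policy could look like the sketch below. Both helper
names are invented for illustration and are not part of bzrlib. The
only assumptions are that the output stream has an isatty() method and
that msvcrt.setmode() is available on Windows:

```python
import os
import sys


def encoding_for_stream(stream, terminal_encoding, user_encoding):
    """Hypothetical helper: terminal encoding for a real terminal,
    user_encoding for anything else (a pipe or a file)."""
    isatty = getattr(stream, "isatty", None)
    if isatty is not None and isatty():
        return terminal_encoding
    return user_encoding


def make_stdout_binary():
    """Hypothetical helper: on Windows, take stdout out of text mode so
    LF is not rewritten to CRLF. Does nothing on other platforms."""
    if sys.platform == "win32":
        import msvcrt
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
```

Note that this treats 'bzr command |' and 'bzr command >' identically,
so it never needs to tell them apart.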

John
=:->
