[MERGE] Readable and properly encoded diff headers
Adeodato Simó
dato at net.com.org.es
Thu Aug 24 18:46:24 BST 2006
* Alexander Belchenko [Fri, 18 Aug 2006 10:03:46 +0300]:
Hey (Rob: there's a question for you at the very end, which is why you got CC'ed)
> John Arbash Meinel writes:
> >And we can't just use 'user_encoding' because that is not always
> >terminal encoding. I'm sorry this is so complicated, but that is the way
> >encoding is. Yell at Microsoft for having a different
> >locale.getpreferredencoding() and terminal encoding. I believe the first
> >is the preferred encoding for file contents, not what will be printed
> >out on the screen. So we still need 'preferredencoding' for some things,
> >like make_commit_message(), since that should be in preferredencoding.
> I think that on Windows bzr *should* show filenames in the diff header
> in *user_encoding*, because:
> 1) 99% of Windows users use this encoding to edit their text files
> 2) this encoding is used for the ANSI representation of filenames on
> Windows; i.e. when you execute os.listdir('.') you get non-ASCII
> filenames in the ANSI encoding, not in OEM
> 3) because the terminal encoding is OEM and the text content is ANSI,
> on the Windows terminal I always get completely unreadable diffs for
> Russian texts
> 4) GNU diff (v.2.8.7) uses the ANSI encoding to show filenames in its
> output.
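To make point 3 concrete, here is a small Python 3 illustration (a sketch, not bzr code). It assumes a Russian Windows setup, where the ANSI code page is cp1251 and the console's OEM code page is cp866; the file name "файл.txt" is made up for the example. The same bytes, read back with the other code page, turn into garbage:

```python
# A Russian file name as Windows hands it back from os.listdir()
# in the ANSI code page (cp1251 on a Russian system).
name = "файл.txt"
ansi_bytes = name.encode("cp1251")

# A console using the OEM code page (cp866 on a Russian system)
# interprets those same bytes as different characters.
shown_on_console = ansi_bytes.decode("cp866")

# Only the ASCII part (".txt") survives the mismatch.
print(repr(shown_on_console))
```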
Alexander, John, these are my conclusions after carefully reading this
subthread. I don't have easy access to any Windows system, though, so I
haven't really been able to test the behavior of
osutils.get_terminal_encoding().
(1) using the terminal encoding to translate the file names is fine
for the simple case of running `bzr diff` in the terminal (that
is, when there is _no_ redirection).
(2) when there is redirection to a file, user_encoding should be used,
period.
(3) the case of redirection to a pipe is trickier, since you can do
both `bzr diff | less` and `bzr diff | type >file`. Because of
this, I think that using user_encoding is better in this case.
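The three cases above boil down to "terminal encoding only when writing straight to a terminal, user_encoding otherwise". A minimal sketch of that policy, assuming stream.isatty() is a good enough test for "no redirection"; pick_diff_header_encoding() is a hypothetical name, not an existing bzrlib function:

```python
import io
import sys


def pick_diff_header_encoding(stream, terminal_encoding, user_encoding):
    """Hypothetical helper: choose the encoding for diff header file names.

    (1) stream is a terminal          -> terminal_encoding
    (2) redirected to a file and
    (3) redirected to a pipe          -> user_encoding
    Files and pipes both report isatty() == False, so they need no
    separate branch: both end up with user_encoding.
    """
    isatty = getattr(stream, "isatty", None)
    if isatty is not None and isatty():
        return terminal_encoding
    return user_encoding


if __name__ == "__main__":
    # Depends on how this script is run: console vs `> file` vs `| less`.
    print(pick_diff_header_encoding(sys.stdout, "cp866", "cp1251"))
    # An in-memory stream is never a tty, so it acts like a redirection.
    print(pick_diff_header_encoding(io.StringIO(), "cp866", "cp1251"))
```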
John, from my reading of osutils.get_terminal_encoding(), it seems it
would cover (1) just fine, but I'm not sure what happens for (2) and (3).
It seems that when redirecting to a file, sys.stdout.encoding will be
None, right? So sys.stdin.encoding will be tried, which will be... the
terminal encoding? (I'll leave out (3) for now, until we can get (2)
cleared up.)
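The worry about (2) can be shown with a sketch that paraphrases the fallback order as I read it (stdout, then stdin, then the user's encoding). This is not copied from bzrlib.osutils, the real function may differ, and terminal_encoding_as_described() and FakeStream are made-up names. It also assumes, as above, that a redirected stdout reports an encoding of None:

```python
class FakeStream:
    """Stand-in for sys.stdout/sys.stdin with a settable .encoding."""

    def __init__(self, encoding):
        self.encoding = encoding


def terminal_encoding_as_described(stdout, stdin, user_encoding):
    """Hypothetical paraphrase of the fallback order discussed here."""
    # Try stdout first, then stdin; a missing or None encoding is skipped.
    for stream in (stdout, stdin):
        enc = getattr(stream, "encoding", None)
        if enc:
            return enc
    # Neither stream reported an encoding.
    return user_encoding


# `bzr diff > out.txt` in a cp866 console: stdout is a file (None), but
# stdin is still the console, so the stdin fallback yields the OEM code
# page instead of user_encoding.
print(terminal_encoding_as_described(FakeStream(None), FakeStream("cp866"), "cp1251"))
```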
I really don't understand why get_terminal_encoding() looks at stdin at
all, since what we're talking about is always output.
So, as I understand it, one of the following should happen to make my
patch correct:
* somebody shows that get_terminal_encoding() in Windows does return
the ANSI encoding when there is redirection involved (either to a
file, or to a pipe)
* if not, get_terminal_encoding() gets changed not to look at stdin
(and whenever necessary, get_terminal_input_encoding() can be
created)
* if not, get_terminal_output_encoding() is created, that does not
look at stdin.encoding
* if not, I revert from get_terminal_encoding() to user_encoding
in my fix_diff_header_encoding branch.
BTW, is there anything else left to address in this branch before
somebody merges it?
Rob, is this suitable for 0.10?
Cheers,
--
Adeodato Simó dato at net.com.org.es
Debian Developer adeodato at debian.org
Listening to: Albert Plà - Nuestro jardín