[MERGE] Readable and properly encoded diff headers

Adeodato Simó dato at net.com.org.es
Thu Aug 24 18:46:24 BST 2006

* Alexander Belchenko [Fri, 18 Aug 2006 10:03:46 +0300]:

Hey, (Rob, question for which you got CC'ed at the very end)

> John Arbash Meinel wrote:
> >And we can't just use 'user_encoding' because that is not always
> >terminal encoding. I'm sorry this is so complicated, but that is the way
> >encoding is. Yell at Microsoft for having a different
> >locale.getpreferredencoding() and terminal encoding. I believe the first
> >is the preferred encoding for file contents, not what will be printed
> >out on the screen. So we still need 'preferredencoding' for some things,
> >like make_commit_message(), since that should be in preferredencoding.

> I think for Windows bzr *should* show filenames in diff header with
> *user_encoding*, because:

> 1) 99% of Windows users use this encoding to edit their text files
> 2) this encoding is used for the ANSI representation of filenames in
> Windows; i.e. when you execute os.listdir('.') you get non-ASCII
> filenames in ANSI encoding, not in OEM
> 3) because the terminal encoding is OEM and text content is ANSI, on a
> Windows terminal I always get a completely unreadable diff for
> Russian texts
> 4) GNU diff (v.2.8.7) uses ANSI encoding to show filenames in its
> output.

Alexander, John, these are my conclusions after giving a careful read to
this subthread. I don't have easy access to any Windows system,
though, so I haven't really been able to test
osutils.get_terminal_encoding().

  (1) using the terminal encoding to translate the file names is fine
      for the simple case of running `bzr diff` in the terminal (that
      is, when there is _no_ redirection).

  (2) when there is redirection to a file, user_encoding should be used.

  (3) the case of redirection to a pipe is trickier, since you can do
      both `bzr diff | less` and `bzr diff | type >file`. Because of
      this, I think that using user_encoding is better in this case.
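
A minimal sketch of the three cases above, assuming the caller already
knows both encodings. choose_diff_header_encoding() and its parameters
are hypothetical names for illustration, not existing bzrlib API:

```python
import locale


def choose_diff_header_encoding(stream, terminal_encoding, user_encoding=None):
    """Pick the encoding for file names in diff headers (hypothetical helper).

    stream            -- the file object the diff is written to
    terminal_encoding -- what the console displays (OEM on Windows)
    user_encoding     -- what text files use (ANSI on Windows)
    """
    if user_encoding is None:
        user_encoding = locale.getpreferredencoding()
    isatty = getattr(stream, "isatty", None)
    if isatty is not None and isatty():
        # Case (1): no redirection, output goes straight to the terminal.
        return terminal_encoding
    # Cases (2) and (3): output is redirected to a file or a pipe.
    return user_encoding
```

On a Russian Windows system this would mean cp866 (OEM) for plain
`bzr diff` and cp1251 (ANSI) for `bzr diff >file`.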

John, by my reading of osutils.get_terminal_encoding(), it seems it'd
cover (1) just fine, but I'm not sure what happens for (2) and (3).
Seems like when redirecting to a file, sys.stdout.encoding will be None,
right?, so sys.stdin.encoding will be tried, which will be... the
terminal encoding? (I'll leave out (3) for now, until we can get (2)
cleared up.)
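
To make the concern concrete, here is a rough model of the fallback
order as I read it. This is not the actual bzrlib code: the function
name is made up, and the last two fallbacks (user_encoding, then ascii)
are my assumption:

```python
def terminal_encoding_as_described(stdout_encoding, stdin_encoding,
                                   user_encoding):
    """Model of the fallback order: stdout first, then stdin.

    The trailing user_encoding and 'ascii' fallbacks are assumptions.
    """
    for candidate in (stdout_encoding, stdin_encoding, user_encoding):
        if candidate:
            return candidate
    return "ascii"


# stdout redirected to a file (encoding is None), stdin still the console:
redirected = terminal_encoding_as_described(None, "cp866", "cp1251")
```

If the function really behaves like this, `redirected` is the console
(OEM) encoding, so file names written to a file would not match the
ANSI-encoded text content around them.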

I really don't understand why get_terminal_encoding() looks at stdin at
all, since it's always output that we're talking about.

So, as I understand it, one of the following should happen to make my
patch correct:

  * somebody shows that get_terminal_encoding() in Windows does return
    the ANSI encoding when there is redirection involved (either to a
    file, or to a pipe)

  * if not, get_terminal_encoding() gets changed not to look at stdin
    (and whenever necessary, get_terminal_input_encoding() can be created)

  * if not, get_terminal_output_encoding() is created, that does not
    look at stdin.encoding

  * if not, I revert from get_terminal_encoding() to user_encoding
    in my fix_diff_header_encoding branch.
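
For the third option, a sketch of what an output-only helper could look
like. get_terminal_output_encoding() does not exist yet, so everything
here is an assumption, including the choice of
locale.getpreferredencoding() as the fallback. It also assumes that a
redirected stream reports no encoding, which is the case being
discussed here; some Python versions always set sys.stdout.encoding:

```python
import locale
import sys


def get_terminal_output_encoding(stream=None):
    """Hypothetical helper: encoding of the output stream only."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        # Redirected output: fall back to the user (ANSI) encoding
        # and never consult sys.stdin.
        encoding = locale.getpreferredencoding()
    return encoding
```

With this, redirecting to a file or a pipe would give user_encoding
regardless of what stdin is attached to.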

BTW, is there anything else left to address in this branch before
somebody merges it?

Rob, is this suitable for 0.10?


Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
                               Listening to: Albert Plà - Nuestro jardín
