[MERGE] Readable and properly encoded diff headers

Adeodato Simó dato at net.com.org.es
Thu Aug 24 18:46:24 BST 2006


* Alexander Belchenko [Fri, 18 Aug 2006 10:03:46 +0300]:

Hey, (Rob, question for which you got CC'ed at the very end)

> John Arbash Meinel wrote:
> >And we can't just use 'user_encoding' because that is not always
> >terminal encoding. I'm sorry this is so complicated, but that is the way
> >encoding is. Yell at Microsoft for having a different
> >locale.getpreferredencoding() and terminal encoding. I believe the first
> >is the preferred encoding for file contents, not what will be printed
> >out on the screen. So we still need 'preferredencoding' for some things,
> >like make_commit_message(), since that should be in preferredencoding.

> I think for Windows bzr *should* show filenames in the diff header with
> *user_encoding*, because:

> 1) 99% of Windows users use this encoding to edit their text files
> 2) this encoding is used for the ANSI representation of filenames in
> Windows; i.e. when you execute os.listdir('.') you get non-ASCII
> filenames in ANSI encoding, not in OEM
> 3) because the terminal encoding is OEM and the text content is ANSI,
> on the Windows terminal I always get completely unreadable diffs for
> Russian texts
> 4) GNU diff (v.2.8.7) uses ANSI encoding to show filenames in its
> output.
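To illustrate point 3, here is a small Python sketch. It assumes a Russian Windows setup where the ANSI code page is cp1251 and the OEM (console) code page is cp866; the file name is an arbitrary example. The same bytes decode to different text depending on which code page the reader applies.

```python
# Assumption: Russian Windows, ANSI code page cp1251, OEM console code page cp866.
ANSI = "cp1251"
OEM = "cp866"

filename = "привет.txt"      # arbitrary non-ASCII file name, for illustration
raw = filename.encode(ANSI)  # bytes as written out using the ANSI code page

# Decoding with the matching code page round-trips correctly.
assert raw.decode(ANSI) == filename

# Interpreting the very same bytes as OEM (what the console does) garbles them.
garbled = raw.decode(OEM)
assert garbled != filename
```

Both code pages are single-byte, so nothing raises an error; the text is simply wrong, which is why the diff looks unreadable rather than failing outright.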

Alexander, John, these are my conclusions after carefully reading this
subthread. I don't have easy access to any Windows system, though, so I
haven't really been able to test the behavior of
osutils.get_terminal_encoding().

  (1) using the terminal encoding to translate the file names is fine
      for the simple case of running `bzr diff` in the terminal (that
      is, when there is _no_ redirection).

  (2) when there is redirection to a file, user_encoding should be used,
      period.

  (3) the case of redirection to a pipe is trickier, since you can do
      both `bzr diff | less` and `bzr diff | type >file`. Because of
      this, I think that using user_encoding is better in this case.
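The three cases above reduce to a single test: is the output an interactive terminal or not? A minimal Python sketch of that policy follows. `choose_diff_header_encoding` and `FakeConsole` are hypothetical names for illustration, not part of bzrlib, and the cp866/cp1251 values assume a Russian Windows setup.

```python
import io


def choose_diff_header_encoding(stream, terminal_encoding, user_encoding):
    """Hypothetical helper: pick the encoding for file names in diff headers.

    (1) output goes to an interactive terminal -> terminal encoding
    (2) output is redirected to a file         -> user_encoding
    (3) output is redirected to a pipe         -> user_encoding
    """
    if stream.isatty():
        return terminal_encoding
    # Files and pipes both report isatty() == False, so (2) and (3)
    # fall through to user_encoding.
    return user_encoding


class FakeConsole(io.StringIO):
    """Stand-in for an interactive console, for illustration only."""

    def isatty(self):
        return True


# Assumed values for a Russian Windows box: OEM console cp866, ANSI cp1251.
print(choose_diff_header_encoding(FakeConsole(), "cp866", "cp1251"))  # cp866
print(choose_diff_header_encoding(io.StringIO(), "cp866", "cp1251"))  # cp1251
```

The sketch deliberately does not try to tell a pipe from a file, since under (3) both end up with user_encoding.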

John, by my reading of osutils.get_terminal_encoding(), it seems it'd
cover (1) just fine, but I'm not sure what happens for (2) and (3).
It seems that when redirecting to a file, sys.stdout.encoding will be
None (right?), so sys.stdin.encoding will be tried, which will be... the
terminal encoding? (I'll leave out (3) for now, until we can get (2)
cleared up.)

I really don't understand why get_terminal_encoding() looks at stdin at
all, since what we're talking about is always output.
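To make the stdin question concrete, here is a Python sketch of the fallback order described above (stdout.encoding, then stdin.encoding, then user_encoding) next to a hypothetical output-only variant along the lines of the get_terminal_output_encoding() proposed below. Neither function is the actual bzrlib code; `Stream` is a stand-in object, and the sketch assumes, as suspected above, that a redirected stdout reports an encoding of None.

```python
def terminal_encoding_as_described(stdout, stdin, user_encoding):
    """Sketch of the described fallback: stdout, then stdin, then user_encoding."""
    return (getattr(stdout, "encoding", None)
            or getattr(stdin, "encoding", None)
            or user_encoding)


def terminal_output_encoding(stdout, user_encoding):
    """Hypothetical output-only variant that never consults stdin."""
    return getattr(stdout, "encoding", None) or user_encoding


class Stream(object):
    """Minimal stand-in exposing only an ``encoding`` attribute."""

    def __init__(self, encoding):
        self.encoding = encoding


# Case (2) as assumed above: stdout redirected to a file reports no encoding,
# while stdin is still attached to an OEM console.
redirected_stdout = Stream(None)
console_stdin = Stream("cp866")

print(terminal_encoding_as_described(redirected_stdout, console_stdin, "cp1251"))  # cp866
print(terminal_output_encoding(redirected_stdout, "cp1251"))  # cp1251
```

Under these assumptions, the stdin fallback picks the OEM code page for a file on disk, while the output-only variant falls back to user_encoding as case (2) requires.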

So, as I understand it, one of the following should happen to make my
patch correct:

  * somebody shows that get_terminal_encoding() in Windows does return
    the ANSI encoding when there is redirection involved (either to a
    file, or to a pipe)

  * if not, get_terminal_encoding() gets changed not to look at stdin
    (and whenever necessary, get_terminal_input_encoding() can be
    created)

  * if not, get_terminal_output_encoding() is created, that does not
    look at stdin.encoding

  * if not, I will revert from get_terminal_encoding() to user_encoding
    in my fix_diff_header_encoding branch.

BTW, is there anything else left to address in this branch before
somebody merges it?

Rob, is this suitable for 0.10?

Cheers,

-- 
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
 
                               Listening to: Albert Plà - Nuestro jardín





More information about the bazaar mailing list