[RFC] [Bug 382697] diff headers should contain non-ascii filenames in user_encoding, not in utf-8

Alexander Belchenko bialix at ukr.net
Tue Jun 2 12:22:42 BST 2009


I started discussing this topic with jam on IRC before UDS, and I 
want to solve this problem. Guidance needed.

The attachments contain logs of some console scenarios for different 
commands. In all cases I used the filename "Тест.txt", which is "Test.txt" 
in Russian.

I think all commands except `bzr send` should encode non-ascii filenames 
in user_encoding, not in utf-8. Maybe `bzr diff` should have an additional 
command-line option `--diff-header-encoding` to force a different 
encoding of filenames if needed.
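
To illustrate the idea, here is a minimal sketch, not bzr's actual code. The 
helper names `encode_header_path` and `format_file_header` and the 
`header_encoding` parameter are hypothetical, and the `=== modified file '...'` 
layout is only assumed to resemble what bzr prints. The optional argument plays 
the role of the proposed `--diff-header-encoding` option.

```python
import locale


def encode_header_path(path, header_encoding=None):
    """Encode a unicode path for a diff header (hypothetical helper).

    If no encoding is given, fall back to the user's preferred locale
    encoding, which stands in for bzr's user_encoding in this sketch.
    """
    if header_encoding is None:
        header_encoding = locale.getpreferredencoding(False) or "ascii"
    # 'replace' keeps the header printable even when the target encoding
    # cannot represent some character of the filename.
    return path.encode(header_encoding, "replace")


def format_file_header(kind, path, header_encoding=None):
    """Build a '=== <kind> file' line as bytes (layout is an assumption)."""
    encoded = encode_header_path(path, header_encoding)
    return b"=== " + kind.encode("ascii") + b" file '" + encoded + b"'"
```

With `header_encoding="cp1251"` the Russian name is emitted in the Windows 
Cyrillic code page; with `"utf-8"` the current behaviour is preserved.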

Fortunately some support for this is already present in the bzr core 
(written by John Meinel in 2006-2007).

So my question is: what is the best way to solve this problem, and how 
should I write appropriate tests for it?
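
As a starting point for the discussion about tests, here is a rough sketch 
using plain `unittest` rather than bzr's own test framework. The function 
`encode_header_path` is a hypothetical stand-in for whatever code ends up 
encoding the filename; the tests pass the encoding explicitly so they do not 
depend on the locale of the machine running them.

```python
import unittest


def encode_header_path(path, encoding):
    """Hypothetical stand-in for the code under test."""
    return path.encode(encoding, "replace")


class TestDiffHeaderEncoding(unittest.TestCase):
    """Check header filenames against explicitly chosen encodings."""

    name = "Тест.txt"

    def test_cp1251(self):
        # A typical Windows Cyrillic user_encoding.
        self.assertEqual(b"\xd2\xe5\xf1\xf2.txt",
                         encode_header_path(self.name, "cp1251"))

    def test_utf8(self):
        # The encoding that `bzr send` would keep using.
        self.assertEqual(self.name.encode("utf-8"),
                         encode_header_path(self.name, "utf-8"))

    def test_unrepresentable_is_replaced(self):
        # An encoding that cannot represent the name must not raise.
        self.assertEqual(b"????.txt",
                         encode_header_path(self.name, "ascii"))
```

Saved as a module, this can be run with `python -m unittest`.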


-------- Original Message --------
Subject: [Bug 382697] [NEW] diff headers should contain non-ascii 
filenames in user_encoding, not in utf-8
Date: Tue, 02 Jun 2009 11:09:19 -0000
From: Alexander Belchenko <bialix at ukr.net>
Reply-To: Bug 382697 <382697 at bugs.launchpad.net>
To: bialix at ukr.net
References: <20090602110919.31004.99197.malonedeb at gangotri.canonical.com>

Public bug reported:

Currently bzr can produce a diff as the result of 5 different operations:

bzr diff
bzr commit --show-diff
bzr log -p
bzr merge --preview
bzr send

In most of these commands a non-ascii filename is always shown as a utf-8 
string. This is bad for Windows users, because their locale is never utf-8 
(at least by default), and it is not always possible to switch the console 
locale to utf-8 (chcp 65001 won't work on all Windows versions).
Furthermore, bzr itself does not understand the cp65001 codepage because 
Python does not recognize it as utf-8, see http://bugs.python.org/issue6058.
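
One possible workaround for the cp65001 problem, sketched under the assumption 
that the console encoding name is available as a string: map the name to utf-8 
before asking Python for a codec. The function name `normalize_encoding_name` 
is hypothetical and not part of bzr.

```python
import codecs


def normalize_encoding_name(name):
    """Return a codec name Python can use (hypothetical helper).

    Windows code page 65001 is utf-8, so it is mapped explicitly instead
    of relying on the interpreter to know the 'cp65001' alias.
    """
    if name.lower() == "cp65001":
        return "utf-8"
    # codecs.lookup raises LookupError for names Python does not know.
    return codecs.lookup(name).name
```

Unknown names still raise `LookupError`, so the caller can decide whether to 
fall back to ascii or report the problem.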

I think all commands except `send` should always print filenames in
user_encoding, or AT LEAST show them in user_encoding in the === line
(as bzr ci --show-diff currently does), because all these commands are
intended to produce output for humans.

Also, I should note that GNU diff (from http://gnuwin32.sf.net) always
prints filenames in user_encoding.

See attached files with output of various commands.

It would be nice to fix this before 2.0. Some guidance is needed from core
devs, especially about writing tests for these changes.

** Affects: bzr
      Importance: Undecided
          Status: New
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: commit.txt
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090602/66b4d058/attachment.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cp65001.txt
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090602/66b4d058/attachment-0001.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: diff.txt
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090602/66b4d058/attachment-0002.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log-v-p.txt
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090602/66b4d058/attachment-0003.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: merge-preview.txt
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090602/66b4d058/attachment-0004.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: send.txt
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090602/66b4d058/attachment-0005.txt 

