bzr 0.8: still not compatible with non-English filenames and log messages?

Tue May 2 13:48:22 BST 2006

On 22 Apr 2006, Alexander Belchenko <bialix at ukr.net> wrote:
> Martin Pool wrote:
> >
> >On 22/04/2006, at 8:14 AM, Alexander Belchenko wrote:
> >
> >>Erik Bågfors wrote:
> >>>I'm not having that problem in a UTF-8 environment.  Here is the output
> >>>from a log I just tried, with both Swedish and Esperanto :)
> >>
> >>See my original bug report:
> >>https://launchpad.net/products/bzr/+bug/5041
> >>
> >>And other Unicode-related bugs:
> >>
> >>https://launchpad.net/products/bzr/+bug/3823
> >>https://launchpad.net/products/bzr/+bug/3980
> >
> >I'd like to see them fixed for 0.8; if you can determine which patch
> >fixes them I promise I will review it and put it in if it looks safe.
> 
> The attached patch solves my problem with log and commit messages (I use
> this patch with bzr 0.7 and am almost happy). But this patch fixes only
> the smallest part of the problem. There is still a problem with diff on
> non-ASCII filenames, and the problem mentioned with mutter, and more
> besides. John's work was more solid and I would really like to hear his
> comments.

This looks like a reasonable change to me.  If this is a useful step, let's
put it in for 0.8 and try to get some more Unicode improvements after
that.

> === modified file 'a/bzrlib/builtins.py'
> --- a/bzrlib/builtins.py	
> +++ b/bzrlib/builtins.py	
> @@ -1200,11 +1200,15 @@
>          if rev1 > rev2:
>              (rev2, rev1) = (rev1, rev2)
>  
> -        mutter('encoding log as %r', bzrlib.user_encoding)
> +        if hasattr(sys.stdout, "encoding"):
> +            output_encoding = sys.stdout.encoding or bzrlib.user_encoding
> +        else:
> +            output_encoding = bzrlib.user_encoding
> +        mutter('encoding log as %r', output_encoding)
>  
>          # use 'replace' so that we don't abort if trying to write out
>          # in e.g. the default C locale.
> -        outf = codecs.getwriter(bzrlib.user_encoding)(sys.stdout, errors='replace')
> +        outf = codecs.getwriter(output_encoding)(sys.stdout, errors='replace')
>  
>          if (log_format == None):
>              default = bzrlib.config.BranchConfig(b).log_format()
> 
> === modified file 'a/bzrlib/msgeditor.py'
> --- a/bzrlib/msgeditor.py	
> +++ b/bzrlib/msgeditor.py	
> @@ -19,10 +19,12 @@
>  
>  """Commit message editor support."""
>  
> +import codecs
>  import os
>  import errno
>  from subprocess import call
>  
> +import bzrlib
>  import bzrlib.config as config
>  from bzrlib.errors import BzrError
>  
> @@ -92,7 +94,8 @@
>          if infotext is not None and infotext != "":
>              hasinfo = True
>              msgfile = file(msgfilename, "w")
> -            msgfile.write("\n\n%s\n\n%s" % (ignoreline, infotext))
> +            msgfile.write("\n\n%s\n\n%s" % (ignoreline,
> +                        infotext.encode(bzrlib.user_encoding, "replace")))
>              msgfile.close()
>          else:
>              hasinfo = False
> @@ -103,7 +106,7 @@
>          started = False
>          msg = []
>          lastline, nlines = 0, 0
> -        for line in file(msgfilename, "r"):
> +        for line in codecs.open(msgfilename, 'rt', bzrlib.user_encoding):
>              stripped_line = line.strip()
>              # strip empty line before the log message starts
>              if not started:
> 
> === modified file 'a/bzrlib/status.py'
> --- a/bzrlib/status.py	
> +++ b/bzrlib/status.py	
> @@ -14,8 +14,10 @@
>  # along with this program; if not, write to the Free Software
>  # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>  
> +import codecs
>  import sys
>  
> +import bzrlib
>  from bzrlib.delta import compare_trees
>  from bzrlib.diff import _raise_if_nonexistent
>  from bzrlib.errors import NoSuchRevision
> @@ -110,7 +112,9 @@
>          If two revisions show status between first and second.
>      """
>      if to_file == None:
> -        to_file = sys.stdout
> +        output_encoding = getattr(sys.stdout, "encoding", None) \
> +                          or bzrlib.user_encoding
> +        to_file = codecs.getwriter(output_encoding)(sys.stdout, "replace")
>      
>      wt.lock_read()
>      try:
> 
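
For anyone skimming the patch, the core idea in the builtins.py and
status.py hunks can be sketched on its own: prefer the stream's own
encoding, fall back to a default, and wrap the stream in a writer that
replaces unencodable characters instead of raising. This is only a minimal
sketch written for current Python 3, not the bzrlib code itself. The names
pick_output_encoding, make_writer and FakeStream are hypothetical, and the
"ascii" fallback stands in for bzrlib.user_encoding.

```python
import codecs
import io


def pick_output_encoding(stream, fallback):
    """Return the stream's encoding if it has one, else the fallback."""
    # getattr covers streams with no 'encoding' attribute at all;
    # 'or' covers streams whose encoding is None or empty.
    return getattr(stream, "encoding", None) or fallback


def make_writer(byte_stream, encoding):
    """Wrap a byte stream; unencodable characters are written as '?'."""
    return codecs.getwriter(encoding)(byte_stream, errors="replace")


class FakeStream(object):
    """Hypothetical stand-in for a stdout that reports no encoding."""
    encoding = None


buf = io.BytesIO()
# "ascii" here is an assumed stand-in for bzrlib.user_encoding.
enc = pick_output_encoding(FakeStream(), "ascii")
out = make_writer(buf, enc)
out.write(u"Erik B\u00e5gfors\n")   # does not raise UnicodeEncodeError
print(buf.getvalue())
```

With the assumed "ascii" fallback the non-ASCII character is degraded to
'?' rather than aborting the command, which is the trade-off the
'replace' comment in the patch describes.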

-- 
Martin