[RFC] more encodings tests [was: bzr handles unicode]

Sat Jan 7 13:55:31 GMT 2006

Alexander Belchenko wrote:
> One more test: I wrote this one before I pulled your latest changes.
> This test fails in main bzr.dev but passes in your encoding
> branch r1527. If you think this test is worth including in your
> branch, then please review it.
> 
> -- 
> Alexander
> 

I most likely will pull it into my branch. I think more tests are
generally useful.

By the way, can you give me some Unicode characters in your language?
Right now all I have is the Swedish from Erik, and some Arabic. I would
like to test a couple more characters.

> 
> ------------------------------------------------------------------------
> 
> === added file 'bzrlib/tests/blackbox/test_status.py'
> --- /dev/null	
> +++ bzrlib/tests/blackbox/test_status.py	
> @@ -0,0 +1,89 @@
> +# Copyright (C) 2005 by Canonical Ltd
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> +
> +
> +"""\
> +Black-box tests for encoding of bzr status.
> +
> +The status command usually prints its output (possibly unicode) to sys.stdout.
> +When status output is redirected to a file or a pipe, or the encoding of
> +sys.stdout does not match the encoding needed to show non-ascii filenames,
> +status fails with a UnicodeEncodeError:
> +bzr: ERROR: exceptions.UnicodeEncodeError: 'ascii' codec can't encode characters: ordinal not in range(128)
> +
> +When sys.stdout.encoding is None or ascii,
> +bzr should use bzrlib.user_encoding to print output.
> +

I'm not sure bzr should use user_encoding when sys.stdout.encoding is
'ascii'. It definitely should when it is None.
What do other people think?
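
To make the question concrete, here is a rough sketch of the two
policies. The function names and the ascii_means_unset flag are made up
for illustration and are not part of bzrlib; the only thing taken as
given is that encode(..., 'replace') substitutes '?' for characters the
codec cannot represent.

```python
import codecs


def pick_output_encoding(stdout_encoding, user_encoding,
                         ascii_means_unset=False):
    """Return the codec name to use when printing to the terminal.

    stdout_encoding   -- value of sys.stdout.encoding (None when piped)
    user_encoding     -- the fallback, e.g. bzrlib.user_encoding
    ascii_means_unset -- hypothetical switch for the open question:
                         treat an 'ascii' stdout like a missing encoding
    """
    if stdout_encoding is None:
        # Nobody disputes this case: fall back to the user's encoding.
        return user_encoding
    # codecs.lookup() normalises aliases such as 'US-ASCII'.
    if ascii_means_unset and codecs.lookup(stdout_encoding).name == 'ascii':
        return user_encoding
    return stdout_encoding


def encode_for_output(text, encoding):
    """Encode unicode text, replacing unencodable characters with '?'."""
    return text.encode(encoding, 'replace')
```

With ascii_means_unset left at False, an 'ascii' stdout stays ascii and
u'hell\u00d8' comes out as 'hell?'. With it set to True, the same stdout
would follow user_encoding instead.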

> +When sys.stdout.encoding doesn't match the filename encoding,
> +bzr should use the `replace` error handling scheme for unicode.encode().
> +"""
> +
> +from cStringIO import StringIO
> +import os
> +import sys
> +
> +import bzrlib
> +from bzrlib.branch import Branch
> +from bzrlib.tests import TestCaseInTempDir, TestSkipped
> +from bzrlib.trace import mutter
> +
> +
> +class TestStatusEncodings(TestCaseInTempDir):
> +    
> +    def setUp(self):
> +        TestCaseInTempDir.setUp(self)
> +        self.user_encoding = bzrlib.user_encoding
> +        self.stdout = sys.stdout
> +
> +    def tearDown(self):
> +        bzrlib.user_encoding = self.user_encoding
> +        sys.stdout = self.stdout
> +        TestCaseInTempDir.tearDown(self)
> +
> +    def make_uncommitted_tree(self):
> +        """Build a branch with uncommitted unicode named changes in the cwd."""
> +        b = Branch.initialize(u'.')
> +        working_tree = b.working_tree()
> +        filename = u'hell\u00d8'
> +        try:
> +            self.build_tree_contents([(filename, 'contents of hello')])
> +        except UnicodeEncodeError:
> +            raise TestSkipped("can't build unicode working tree in "
> +                "filesystem encoding %s" % sys.getfilesystemencoding())
> +        working_tree.add(filename)
> +        return working_tree
> +
> +    def test_stdout_ascii(self):
> +        sys.stdout = StringIO()
> +        bzrlib.user_encoding = 'ascii'
> +        working_tree = self.make_uncommitted_tree()
> +        stdout, stderr = self.run_bzr_captured(["--no-plugins", "status"])
> +
> +        self.assertEquals(stdout, """\
> +added:
> +  hell?
> +""")
> +
> +    def test_stdout_latin1(self):
> +        sys.stdout = StringIO()
> +        bzrlib.user_encoding = 'latin-1'
> +        working_tree = self.make_uncommitted_tree()
> +        stdout, stderr = self.run_bzr_captured(["--no-plugins", "status"])
> +
> +        self.assertEquals(stdout, u"""\
> +added:
> +  hell\u00d8
> +""".encode('latin-1'))
> 
> === modified file 'bzrlib/tests/blackbox/__init__.py'
> --- bzrlib/tests/blackbox/__init__.py	
> +++ bzrlib/tests/blackbox/__init__.py	
> @@ -39,6 +39,7 @@
>                       'bzrlib.tests.blackbox.test_revision_info',
>                       'bzrlib.tests.blackbox.test_too_much',
>                       'bzrlib.tests.blackbox.test_versioning',
> +                     'bzrlib.tests.blackbox.test_status',
>                       ]
>      suite = TestSuite()
>      loader = TestLoader()
> 

One small comment at the end. We are trying to keep the test names
sorted alphabetically, so test_status would go just before
test_too_much. That makes it easy to check whether a test is missing,
and means fewer conflicts when people add tests in the same location.
(At least in my head; I haven't heard an official word on this from
Martin.)
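
As a quick illustration of the ordering, here is a small sketch. The
add_sorted helper is hypothetical (nothing like it exists in bzrlib),
and the list holds only the entries visible in the hunk above.

```python
import bisect


def add_sorted(names, new_name):
    """Insert new_name into the already-sorted list names, in place.

    Duplicates are skipped, so adding the same module twice is harmless.
    """
    index = bisect.bisect_left(names, new_name)
    if index == len(names) or names[index] != new_name:
        names.insert(index, new_name)
    return names


modules = [
    'bzrlib.tests.blackbox.test_revision_info',
    'bzrlib.tests.blackbox.test_too_much',
    'bzrlib.tests.blackbox.test_versioning',
]
add_sorted(modules, 'bzrlib.tests.blackbox.test_status')
```

After the call, test_status sits between test_revision_info and
test_too_much, and the list is still sorted.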

John
=:->