[MERGE] UTF-8 encoding in binary diffs

John Arbash Meinel john at arbash-meinel.com
Thu Jul 12 16:29:00 BST 2007

Vincent Ladeuil wrote:
>>>>>> "robert" == Robert Collins <robertc at robertcollins.net> writes:
>     robert> On Thu, 2007-07-12 at 16:50 +1000, Martin Pool wrote:
>     >> 
>     >> In other cases this might have value though - for example when we want
>     >> to allow for platforms with normalization.  But even then it's
>     >> probably better handled by having tests that don't need to exercise
>     >> normalization use names that won't be affected. 
>     robert> I think what I'm getting at is that we can cheaply increase our test
>     robert> coverage of non-ascii names by making all tests use
>     robert> normalisation-requiring names whenever possible.
> If by that you imply that the tests will fail on HFS+ filesystems
> mounted via nfs, I think I will be strongly -1 on such an idea.
>         Vincent

My understanding was that Robert would have us try a few names until we got one
we knew would be represented correctly.

Even Linux can have its encoding set to iso-8859-1 so some names will not be
representable there.

Aaron had an interesting point about using something like os.stat(u'\u1234') to
see if it could be used. However, that still raises an OSError, since the file
doesn't exist.

I'm fairly confident that Python is just going through
'sys.getfilesystemencoding()', so we can just grab that and try
path.encode(fs_enc) on a few candidate names. Note we should actually use
osutils._fs_enc (only in a saner manner than accessing a private var), since it
handles the case where sys.get...() returns None.
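
A minimal sketch of that check, under some assumptions: the helper name
first_encodable_name and the candidate list are hypothetical, not anything in
bzrlib, and falling back to 'ascii' when no encoding is reported is a guess at
what a saner accessor for osutils._fs_enc might do. It only tests whether a
name can be encoded; it says nothing about normalization on the filesystem.

```python
import sys


def first_encodable_name(candidates, fs_enc=None):
    """Return the first candidate that encodes in fs_enc, or None.

    Hypothetical helper: nothing here is part of bzrlib.
    """
    if fs_enc is None:
        # Assumption: treat a missing filesystem encoding as plain ASCII.
        fs_enc = sys.getfilesystemencoding() or 'ascii'
    for name in candidates:
        try:
            name.encode(fs_enc)
        except UnicodeEncodeError:
            # Not representable in this encoding; try the next candidate.
            continue
        return name
    return None


# Candidates ordered from most to least demanding.
CANDIDATES = [u'\u1234', u'\xe5', u'a']
```

A test could then call first_encodable_name(CANDIDATES) and be skipped when
only the plain ASCII name comes back, rather than fail on a platform that
cannot represent the more interesting names.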

I have mixed feelings overall, though.

I like having more unicode testing. And changing most tests to use Unicode
names does stress more code overall.

I'm not sure how that fits with "each test should test 1 and only 1 thing, so
that failures are clear."


