[MERGE] Test for bug #272444 (symlinks to Unicode file names)

John Arbash Meinel john at arbash-meinel.com
Fri Oct 17 23:00:23 BST 2008


Daniel Clemente wrote:
> 
>   I adapted the test to use Unicode names instead of forcing UTF-8, as you suggested.
>   Now the name is test_unicode_symlink instead of test_utf8_symlink.
> 
> Andrew Bennetts <andrew.bennetts at canonical.com> writes:
>> That should be "a utf-8", not "an utf-8"; the letter "u" (and the word
>> "unicode") actually start with a consonant sound.
> 
>   Buff! Vowels in English are hard.
>   I changed that too.
> 
>   I attach the new patch. Thanks,
> 
> Daniel
> 
> 

=== modified file 'bzrlib/tests/branch_implementations/test_sprout.py'
--- bzrlib/tests/branch_implementations/test_sprout.py	2007-11-01 09:52:45 +0000
+++ bzrlib/tests/branch_implementations/test_sprout.py	2008-10-17 09:15:25 +0000
@@ -1,4 +1,5 @@
 # Copyright (C) 2007 Canonical Ltd
+# -*- coding: utf-8 -*-

^- I may be coming late to this, but we shouldn't need a line like this.
IIRC, we don't allow non-ASCII characters in our source files (at least
as far as we can manage it). We can nearly always get away with just
escaping the character rather than writing it literally. I suppose
you did write "'adiós'" in the text, though I wonder whether we want
that even in a comment.
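
A minimal sketch of what escaping looks like, in Python 3 syntax (bzrlib itself ran on Python 2 at the time, where u'...' literals behave the same way); the variable name is only illustrative:

```python
# The literal below is pure ASCII in the source file, so no
# "# -*- coding: utf-8 -*-" line is needed to write it.
name = u'adi\xf3s'   # \xf3 is LATIN SMALL LETTER O WITH ACUTE

print(len(name))              # 5 code points
print(name.encode('utf-8'))   # b'adi\xc3\xb3s': o-acute takes two UTF-8 bytes
```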

Also, 'ó' has a decomposed form (an 'o' followed by a combining acute
accent), so it is going to get us in trouble on Mac, which will expand
u'\xf3' to u'o\u0301'.
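
The expansion is easy to reproduce with Python's standard unicodedata module. This sketch uses NFD as a stand-in for what HFS+ stores (HFS+ uses a close variant of NFD, so treat it as an approximation):

```python
import unicodedata

composed = u'\xf3'                                   # o-acute as one code point
decomposed = unicodedata.normalize('NFD', composed)  # roughly what HFS+ stores

print([hex(ord(c)) for c in decomposed])  # ['0x6f', '0x301']: 'o' + combining acute
print(composed == decomposed)             # False: a naive comparison fails
print(unicodedata.normalize('NFC', decomposed) == composed)  # True
```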

It is better to use something like Omega (Ω), u'\u03a9', since it
doesn't have another form.
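
A small hypothetical helper (has_single_form is not a bzrlib function) for vetting candidate test names, assuming "no other form" means the name is identical under NFC and NFD:

```python
import unicodedata


def has_single_form(name):
    """Return True if NFC and NFD agree, i.e. the name has no alternative
    composed/decomposed spelling that a filesystem could substitute."""
    return (unicodedata.normalize('NFC', name)
            == unicodedata.normalize('NFD', name))


print(has_single_form(u'\u03a9'))     # True: Greek capital Omega
print(has_single_form(u'adi\xf3s'))   # False: o-acute decomposes
```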

And, just to mention *why* it isn't supported: symlink targets are just
8-bit strings (any bytes except NUL), with no defined encoding. I believe
the effective encoding is simply whatever the filesystem encoding is,
which means that we probably need to decode the string when we read it
from disk, track it in memory as a Unicode string, store it in our
inventories as UTF-8, and then encode it again when we create the
symlink.
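
A sketch of that round trip, assuming Python 3 on a POSIX system; the three function names are hypothetical, not bzrlib API. os.fsencode and os.fsdecode apply the filesystem encoding reported by sys.getfilesystemencoding():

```python
import os


def read_link_target(path):
    """Read the raw target bytes and decode them to Unicode using the
    filesystem encoding."""
    raw = os.readlink(os.fsencode(path))   # bytes in, bytes out
    return os.fsdecode(raw)


def target_for_inventory(target):
    """Store the Unicode target as UTF-8, independent of the local locale."""
    return target.encode('utf-8')


def create_link_from_inventory(stored, path):
    """Decode the stored UTF-8, then re-encode it for the local filesystem
    when creating the symlink."""
    target = stored.decode('utf-8')
    os.symlink(os.fsencode(target), os.fsencode(path))
```

Bytes that are not valid in the filesystem encoding come back from os.fsdecode as surrogate escapes, which the UTF-8 step will refuse to encode; a real implementation would need an explicit policy for that case.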

There certainly are still people whose filesystem encoding claims to be
ISO-8859-1 (Latin-1), especially on older systems. (If you set, say,
LANG=en_US rather than en_US.UTF-8, you get that sort of behavior.)
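
To illustrate why the filesystem encoding matters (a sketch; the helper encodable is hypothetical): the same on-disk bytes decode to different text under UTF-8 and Latin-1, and an Omega target cannot be represented in Latin-1 at all:

```python
raw = b'\xc3\xb3'   # UTF-8 bytes for o-acute (U+00F3), as found on disk

as_utf8 = raw.decode('utf-8')      # u'\xf3': one character
as_latin1 = raw.decode('latin-1')  # u'\xc3\xb3': A-tilde + superscript three
print(len(as_utf8), len(as_latin1))   # 1 2


def encodable(text, encoding):
    """Return True if text can be written in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(encodable(u'\u03a9', 'utf-8'), encodable(u'\u03a9', 'latin-1'))  # True False
```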

...

+            raise KnownFailure('there is no support for symlinks to non-ASCII targets (bug 272444)')
+

^- This line is longer than 80 characters; just split it into something
like:
+            raise KnownFailure('there is no support for'
+                ' symlinks to non-ASCII targets (bug #272444)')
+
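
The split relies on Python joining adjacent string literals at compile time, so the message is unchanged; the leading space on the second piece is what keeps "for" and "symlinks" apart. A quick check, using a plain string in place of the KnownFailure call:

```python
# Adjacent string literals inside parentheses are concatenated at
# compile time into a single string.
message = ('there is no support for'
           ' symlinks to non-ASCII targets (bug #272444)')

print('for symlinks' in message)          # True: the leading space separates the words
print(message.endswith('(bug #272444)'))  # True
```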

(I *personally* prefer to see bug numbers with # in front of them, but
there is no specific style guideline for that, though we are consistent
about it in the NEWS file.)


=== modified file 'bzrlib/tests/workingtree_implementations/test_parents.py'

^- I *do* think that we need a 'workingtree_implementations' test for
this, but I don't see why it is going in 'test_parents'. Could you
explain why you thought it should be there?

Especially considering that the tree is already *at* [revision], I don't
see how "set_parent_ids([revision])" could be anything other than a
no-op. (At a minimum, that would mean your test fails for the wrong
reason.)


All this said, other people are free to override what I say, as I
haven't been particularly active in this thread.

John
=:_>