[MERGE] UTF-8 encoding in binary diffs

John Arbash Meinel john at arbash-meinel.com
Fri Jul 6 20:22:57 BST 2007


John Arbash Meinel has voted +0.
Status is now: Waiting
Comment:
The contents of the files should *not* be UTF-8 encoded. The *filenames* 
should be UTF-8 encoded. I think it is just a doc-string bug.

As near as I can tell, your test filenames are 'binary' and 'elephant', 
neither of which is non-ASCII. So I don't see why the test would ever 
fail.

So you should be using Unicode names like:
'omega':u'\u03a9', 'alpha':u'\u03b1'

(I would avoid things like u'\xe5' (a with ring) because of problems 
with combining characters on Mac).
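
A rough sketch of why the choice of characters matters (an illustration 
only, not part of the patch). It assumes the Mac problem is the 
filesystem storing names in a decomposed (NFD-like) form; the 'a_ring' 
key and the survives_nfd helper are made up for the example:

```python
import unicodedata

# Candidate test filenames. 'omega' and 'alpha' are the suggested names;
# 'a_ring' is a hypothetical key for the character to avoid.
names = {
    'omega': u'\u03a9',
    'alpha': u'\u03b1',
    'a_ring': u'\xe5',
}


def survives_nfd(name):
    """Return True if NFD normalization leaves the name unchanged."""
    return unicodedata.normalize('NFD', name) == name


for label, name in sorted(names.items()):
    decomposed = unicodedata.normalize('NFD', name)
    # Show the UTF-8 bytes of the name as written and after decomposition.
    print(label, name.encode('utf-8'), decomposed.encode('utf-8'),
          survives_nfd(name))
```

The Greek letters have no canonical decomposition, so the UTF-8 bytes 
of the filename are the same whether or not the platform normalizes it. 
u'\xe5' splits into 'a' plus a combining ring, so the name read back 
from disk may not match the name the test wrote.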

+        self.assertContainsRe(diff, "\\+\\+\\+ b/elephant")
^- This seems a lot clearer to me as:
+        self.assertContainsRe(diff, r"\+\+\+ b/elephant")

(In fact, I would always use r'' for assertContainsRe patterns, even if 
'\' never shows up in them, unless you really need a string escape such 
as '\n' in the pattern).
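
To illustrate that the two spellings are the same pattern, here is a 
small sketch using the plain re module rather than assertContainsRe. 
The sample_diff text is a made-up stand-in for real diff output:

```python
import re

escaped = "\\+\\+\\+ b/elephant"   # backslashes doubled by hand
raw = r"\+\+\+ b/elephant"         # raw string, same characters

# Hypothetical diff header, standing in for the real test output.
sample_diff = "--- a/elephant\n+++ b/elephant\n"

same_pattern = (escaped == raw)
matches = re.search(raw, sample_diff) is not None

print(same_pattern, matches)
```

Both literals produce the identical string, so the raw form changes 
nothing about what the assertion checks; it only removes the doubled 
backslashes.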

So, I'm happy with the change, but I honestly don't see how the test is 
testing what you think it is.


For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3Cd06a5cd30707060557u7b55fbe7s1b3bb839d45185f9%40mail.gmail.com%3E


