[MERGE] UTF-8 encoding in binary diffs

Thu Jul 19 03:29:36 BST 2007

On 7/17/07, Jonathan Lange <jml at mumak.net> wrote:
> > UPDATE: Not mergeable in its current form because it breaks when the
> > test case can't create unicode filenames - jml, could you please fix
> > that as described on the list?
>
> Fixed.

Thanks!

> I added the Feature to test_diff, as it seems that most Features are
> defined in the tests modules in which they are used. To me, it seems
> better to have a central bzrlib.tests.features module.

That would be good; could you do that, please?  Either a follow-on
patch or an update to this one is fine.

Since this caused some discussion and since the utf-8 encoding is only
a stopgap, could you please add a comment explaining why it's utf-8,
why the body is not encoded, and why it should really be a format
specified by the caller?  Just summarize this thread.

+class _UnicodeFilename(Feature):
+    """Does the filesystem support Unicode filenames?"""
+
+    def _probe(self):
+        filename = u'\u03b1'
+        os.mkdir('tree')
+        try:
+            fd = file('tree/' + filename, 'wb')
+        except UnicodeEncodeError:
+            return False
+        fd.close()
+        os.remove('tree/' + filename)
+        os.rmdir('tree')
+        return True
+
+UnicodeFilename = _UnicodeFilename()

I'm concerned that this might be called when not in a temp dir and
would pollute the cwd.  (Code that's not in a temp dir test case
shouldn't care about this feature but I wouldn't bet on it.)

Is it really necessary to actually write a file?  Why not try to read
one that doesn't exist?  If you get an IOError or OSError rather than
a UnicodeEncodeError then you're probably ok.

If you were going to keep this code, the cleanup should be in a finally
block.  close should _always_ be in a finally block, that's why it's
called close. :-)   As written, this leaks the 'tree' directory on
non-unicode systems, because the early return skips the cleanup.  The
probe will then fail the second time through, because os.mkdir raises
when 'tree' already exists.
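For illustration, a sketch of the same probe with the cleanup in a
finally block.  The function name is hypothetical, and using
tempfile.mkdtemp instead of 'tree' in the cwd is my own assumption, not
something the patch does:

```python
import os
import shutil
import tempfile


def probe_unicode_filename(filename=u'\u03b1'):
    """Try to create `filename` inside a private scratch directory."""
    # Assumption: a fresh temp dir avoids polluting the cwd and avoids
    # colliding with a leftover 'tree' directory from an earlier run.
    base = tempfile.mkdtemp()
    try:
        path = os.path.join(base, filename)
        try:
            fd = open(path, 'wb')
        except UnicodeEncodeError:
            return False
        # Nothing happens between open and close, so close right away.
        fd.close()
        return True
    finally:
        # Runs on both return paths, so nothing is leaked.
        shutil.rmtree(base)
```

Calling it twice in a row should give the same answer both times and
leave the cwd untouched.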

I think this indicates that you should have tests for the feature. :-)
Maybe at least make sure that the probe method passes, and (maybe?)
fails if you monkeypatch sys.getfilesystemencoding to return ascii.
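A rough sketch of what such tests could look like, taking the encoding
function to be sys.getfilesystemencoding.  Big caveat: a probe that
opens a file may not notice the monkeypatch at all, since the
interpreter may encode filenames internally, so this sketch assumes a
hypothetical probe (can_encode_filename, my invention) that consults
sys.getfilesystemencoding directly:

```python
import sys
import unittest


def can_encode_filename(filename=u'\u03b1'):
    """Hypothetical probe: can the reported fs encoding encode `filename`?"""
    encoding = sys.getfilesystemencoding() or 'ascii'
    try:
        filename.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


class TestCanEncodeFilename(unittest.TestCase):

    def _with_fs_encoding(self, encoding):
        # Monkeypatch for the duration of one call, always restoring.
        original = sys.getfilesystemencoding
        sys.getfilesystemencoding = lambda: encoding
        try:
            return can_encode_filename()
        finally:
            sys.getfilesystemencoding = original

    def test_passes_with_utf8(self):
        self.assertTrue(self._with_fs_encoding('utf-8'))

    def test_fails_with_ascii(self):
        self.assertFalse(self._with_fs_encoding('ascii'))
```

Note the restore is in a finally block, so a failing assertion can't
leave the patched function behind for later tests.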

-- 
Martin