[MERGE] UTF-8 encoding in binary diffs
Martin Pool
mbp at sourcefrog.net
Thu Jul 19 03:29:36 BST 2007
On 7/17/07, Jonathan Lange <jml at mumak.net> wrote:
> > UPDATE: Not mergeable in its current form because it breaks when the
> > test case can't create unicode filenames - jml, could you please fix
> > that as described on the list?
>
> Fixed.
Thanks!
> I added the Feature to test_diff, as it seems that most Features are
> defined in the tests modules in which they are used. To me, it seems
> better to have a central bzrlib.tests.features module.
That would be good, could you do please? Either a followon or an
update is fine.
Since this caused some discussion and since the utf-8 encoding is only
a stopgap, could you please add a comment explaining why it's utf-8,
why the body is not encoded, and why it should really be a format
specified by the caller? Just summarize this thread.
+class _UnicodeFilename(Feature):
+ """Does the filesystem support Unicode filenames?"""
+
+ def _probe(self):
+ filename = u'\u03b1'
+ os.mkdir('tree')
+ try:
+ fd = file('tree/' + filename, 'wb')
+ except UnicodeEncodeError:
+ return False
+ fd.close()
+ os.remove('tree/' + filename)
+ os.rmdir('tree')
+ return True
+
+UnicodeFilename = _UnicodeFilename()
I'm concerned that this might be called when not in a temp dir and
would pollute the cwd. (Code that's not in a temp dir test case
shouldn't care about this feature but I wouldn't bet on it.)
Is it really necessary to actually write a file? Why not try to read
one that doesn't exist - if you get an ioerror or oserror rather than
unicodeencodeerror then you're probably ok.
If you were going to keep this code the cleanup should be in a finally
block. close should _always_ be in a finally block, that's why it's
called close. :-) This leaks directories on non-unicode systems!
And it will always fail the second time through because 'tree' will
exist.
I think this indicates that you should have tests for the feature. :-)
Maybe at least make sure that the probe method passes, and (maybe?)
fails if you monkeypatch getfsencoding to ascii.
--
Martin
More information about the bazaar
mailing list