[MERGE] UTF-8 encoding in binary diffs

Jonathan Lange jml at mumak.net
Thu Jul 26 08:59:00 BST 2007


On 7/19/07, Martin Pool <mbp at sourcefrog.net> wrote:
> On 7/17/07, Jonathan Lange <jml at mumak.net> wrote:
> > > UPDATE: Not mergeable in its current form because it breaks when the
> > > test case can't create unicode filenames - jml, could you please fix
> > > that as described on the list?
> >
> > Fixed.
>
> Thanks!
>
> > I added the Feature to test_diff, as it seems that most Features are
> > defined in the tests modules in which they are used. To me, it seems
> > better to have a central bzrlib.tests.features module.
>
> That would be good, could you do please?  Either a followon or an
> update is fine.
>

Sure.

> Since this caused some discussion and since the utf-8 encoding is only
> a stopgap, could you please add a comment explaining why it's utf-8,
> why the body is not encoded, and why it should really be a format
> specified by the caller?  Just summarize this thread.
>

I'd be happy to, but I'm not sure where these comments should go:
_show_diff_trees? InventoryFile._diff? the test?

> +class _UnicodeFilename(Feature):
> +    """Does the filesystem support Unicode filenames?"""
> +
> +    def _probe(self):
> +        filename = u'\u03b1'
> +        os.mkdir('tree')
> +        try:
> +            fd = file('tree/' + filename, 'wb')
> +        except UnicodeEncodeError:
> +            return False
> +        fd.close()
> +        os.remove('tree/' + filename)
> +        os.rmdir('tree')
> +        return True
> +
> +UnicodeFilename = _UnicodeFilename()
>
> I'm concerned that this might be called when not in a temp dir and
> would pollute the cwd.  (Code that's not in a temp dir test case
> shouldn't care about this feature but I wouldn't bet on it.)
>
> Is it really necessary to actually write a file?  Why not try to read
> one that doesn't exist - if you get an ioerror or oserror rather than
> unicodeencodeerror then you're probably ok.
>

I've experimented: you're right. Changed to do a read and catch the
expected error.

> If you were going to keep this code the cleanup should be in a finally
> block.  close should _always_ be in a finally block, that's why it's
> called close. :-)

I disagree. fd is only going to be set when no exception is raised.

I'll make this change if you insist on it.

> This leaks directories on non-unicode systems!
> And it will always fail the second time through because 'tree' will
> exist.
>

D'oh. I added that as part of an experiment and neglected to remove
it. The directory stuff is gone.

> I think this indicates that you should have tests for the feature. :-)
>  Maybe at least make sure that the probe method passes, and (maybe?)
> fails if you monkeypatch getfsencoding to ascii.

I've added a test that checks that it passes. Monkeypatching
getfilesystemencoding didn't seem to be effective.

Thanks for the review. Interim diff attached.

jml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: show-diff-trees-110092-3.diff
Type: text/x-diff
Size: 25023 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070726/5bac841c/attachment-0001.bin 


More information about the bazaar mailing list