[MERGE] UTF-8 encoding in binary diffs

Thu Jul 12 08:05:40 BST 2007

On 7/12/07, Robert Collins <robertc at robertcollins.net> wrote:
> On Thu, 2007-07-12 at 16:50 +1000, Martin Pool wrote:
> >
> > In other cases this might have value though - for example when we want
> > to allow for platforms with normalization.  But even then it's
> > probably better handled by having tests that don't need to exercise
> > normalization use names that won't be affected.
>
> I think what I'm getting at is that we can cheaply increase our test
> coverage of non-ascii names by making all tests use
> normalisation-requiring names whenever possible.

OK, so I think we have two different types of test:

1- tests like jml's test which are inherently testing unicode
behaviour, and effectively cannot be tested without support for that
feature -- I think here it'd be better to just mark them as skipped
when that support is missing

2- tests that can exercise unicode features in passing, if the
environment supports them, but that can usefully be run against only
ascii names if necessary, or with non-normalizable names if necessary

I think what you're saying is that we might catch more unicode bugs if
we try everything with unicode names in case 2.  And it probably
would.  In particular if the original test for diff output had used
such names, then jml wouldn't need to add a new test at all.

On the other hand, this makes the test code a bit more complex - e.g.
tests can't just compare the actual output to a fixed expected string,
but rather need to compose the expected output using the dynamically
chosen filenames.
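
To make that cost concrete, here is a rough sketch of what composing the
expected output could look like.  It uses the standard library's difflib
rather than our own diff code, and make_diff / expected_header are
made-up names for illustration only:

```python
import difflib


def make_diff(name, old_lines, new_lines):
    """Produce a unified diff for one file (stand-in for the code under test)."""
    return "".join(difflib.unified_diff(
        old_lines, new_lines,
        fromfile="a/" + name, tofile="b/" + name))


def expected_header(name):
    """Build the expected diff header from the filename the test chose."""
    return "--- a/%s\n+++ b/%s\n" % (name, name)


# The test no longer compares against one fixed string; the expected
# text is derived from whichever name the environment allowed.
name = "\u00e5.txt"  # hypothetical non-ascii name
assert make_diff(name, ["x\n"], ["y\n"]).startswith(expected_header(name))
```

So it's only a small helper per test, but every expected string has to
go through it.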

I think they're not exclusive - we can let tests depend on a unicode
filename support feature, and also add a method that returns some
appropriate filenames.
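
Something along these lines, as a sketch only.  It is written against
plain unittest, not our test framework, and unicode_filenames_supported
and pick_test_filenames are hypothetical names, as are the particular
filenames chosen:

```python
import os
import tempfile
import unittest


def unicode_filenames_supported():
    """Probe whether a non-ascii filename can be created here."""
    name = "\u00e5"  # hypothetical probe name
    try:
        with tempfile.TemporaryDirectory() as d:
            with open(os.path.join(d, name), "w"):
                pass
            return len(os.listdir(d)) == 1
    except (UnicodeError, OSError):
        return False


def pick_test_filenames(unicode_ok):
    """Return unicode names when supported, plain ascii names otherwise."""
    if unicode_ok:
        return ["\u00e5", "\u20ac.txt"]
    return ["a", "b.txt"]


# Case 1: inherently about unicode, so skip when the feature is missing.
@unittest.skipUnless(unicode_filenames_supported(),
                     "needs unicode filename support")
class TestUnicodeOnly(unittest.TestCase):
    def test_create_unicode_file(self):
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "\u00e5")
            open(path, "w").close()
            self.assertTrue(os.path.exists(path))


# Case 2: exercises unicode in passing, falls back to ascii names.
class TestInPassing(unittest.TestCase):
    def test_create_files(self):
        names = pick_test_filenames(unicode_filenames_supported())
        with tempfile.TemporaryDirectory() as d:
            for name in names:
                open(os.path.join(d, name), "w").close()
            self.assertEqual(len(os.listdir(d)), len(names))
```

The same probe drives both cases: case 1 tests skip on it, case 2 tests
just get different names from it.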

-- 
Martin