[MERGE] setlocale (#128496)

Tue Jun 17 16:57:13 BST 2008

(I'll answer you mail slightly out of order.)

Martin Pool wrote:
> I guess generally for internal interfaces we try to avoid a 'do what i
> mean' or defensive style.  Things should be called with a value of one
> particular type and just fail if they don't get it.  Otherwise it can
> hide errors, make the api harder to evolve, or hurt performance.  So
> generally rather than the try/except blocks in this code, I'd prefer
> to say that these things should for example just always print an
> escaped form, or always not.

While I wholeheartedly agree with you in the case of production API, 
there are opposing goals in the case of unit tests. There you want to 
isolate the tests from each other as well as from the testing framework, 
you want to get as much useful information about the problem as 
possible, and you want to concentrate of the main program and not on the 
tests themselves.

The original code broke the isolation and discarded useful information 
such as which test actually caused things to break. Requiring all 
reports to be 7-bit ASCII would probably be possible for most scenarios, 
but would usually result in information about a broken test, not 
information about a problem the test detected. In addition to this, I 
simply haven't understood the testing framework well enough to catch all 
such occurrences, where a non-ASCII string gets passed as a report.

An alternative would be some kind of encoding for printing of test 
output that preserves ASCII characters but turns characters with higher 
codes into (unicode or byte) escape sequences. Such an encoding, 
however, would destroy information. E.g. you couldn't know whether an 
escape sequence was part of the string or introduced by the encoding. I 
know that my approach destroys information as well, as you can't know 
whether _safe_string was called or not. A warning would help.

> My original mail was a bit truncated: it's true that we shouldn't be
> catching all these errors, but a more important point is that we
> generally don't want to catch errors "just in case".  The exceptions
> are well defined layers where that's clearly part of their job eg to
> print the error to the user or to cancel an operation.  We've
> generally found if you do this it tends to hide bugs from testing and
> also from the developer.
> 
> So what I really meant to ask was, why was that code like it is.

Despite all philosophical arguments for and against my code, there are 
two simple reasons for this:

1. Having the test suite fail on me for the third time, after running 
for about an hour, and not even giving me an indication as to what test 
broke it, was seriously annoying. I wanted to prevent this from 
happening again, therefore I wanted to make sure tests don't break the 
framework.

2. I'm rather new to python, so I didn't know about __repr__ or any 
such. I only knew that my runs indicated not all objects encountered in 
these places were actual strings (be it byte or unicode), and that I 
could think of several ways of turning arbitrary objects into strings, 
some preserving more information but also more likely to fail, the 
others preserving less information or none at all, but more reliable not 
to fail. So I chained them, hoping for as much information as possible, 
but falling back to less instead of risking errors.

If you can come up with a better idea for all this, I'll gladly accept 
it, but leaving things as they were is not that better idea, I'd say.

> If the str or repr is failing we probably don't want to silently
> absorb that problem.  At the very least we should give a warning.

The fact that the fallback string contains less information would be an 
indication of this, but a more prominent warning would be all right by 
me. I simply hadn't worked out how to achieve this the right way.

> The other kind of conceptual issue is that this is unclear about
> whether it's trying to detect problems in unicode conversion or in
> running __str__.

I had considered the *_escape encodings to be safe for their respective 
applications, and therefore rather worried about objects that refuse to 
be converted to strings.

> I can see in some cases you would want both but
> other places will know they have a string already; or know that they
> have an object that must be printable.  For example it's not clearly
> good that passing an object to assertEqualDiff should magically
> compare it's string value.

At the time _safe_string is called, the control flow already has decided 
that the objects are different. So the behaviour of assertEqualDiff with 
respect to tests failing or passing is left completely unchanged. The 
only thing affected is how the error is presented to the user.

> I'd like to see that, if you can remember or reproduce it.  In general
> they should not.  It may be that if an error occurs printing some
> information about a test, like the traceback or test name, then the
> suite terminates.

For the original problem see attached log, which can be reproduced in my 
setlocale branch like this:

bzr revert -r 3489
LANG=ja_JP.EUC_JP ./bzr selftest

I'm still trying to remember what my tests looked like when I got 
something different than a string. Obviously hadn't committed that 
scenario, and running the whole selftest takes quite a while. No promises.

Martin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 3489.txt
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20080617/51d6a261/attachment.txt 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 260 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080617/51d6a261/attachment.pgp