[MERGE] setlocale (#128496)
Martin von Gagern
Martin.vGagern at gmx.net
Tue Jun 17 16:57:13 BST 2008
(I'll answer you mail slightly out of order.)
Martin Pool wrote:
> I guess generally for internal interfaces we try to avoid a 'do what i
> mean' or defensive style. Things should be called with a value of one
> particular type and just fail if they don't get it. Otherwise it can
> hide errors, make the api harder to evolve, or hurt performance. So
> generally rather than the try/except blocks in this code, I'd prefer
> to say that these things should for example just always print an
> escaped form, or always not.
While I wholeheartedly agree with you in the case of production API,
there are opposing goals in the case of unit tests. There you want to
isolate the tests from each other as well as from the testing framework,
you want to get as much useful information about the problem as
possible, and you want to concentrate of the main program and not on the
tests themselves.
The original code broke the isolation and discarded useful information
such as which test actually caused things to break. Requiring all
reports to be 7-bit ASCII would probably be possible for most scenarios,
but would usually result in information about a broken test, not
information about a problem the test detected. In addition to this, I
simply haven't understood the testing framework well enough to catch all
such occurrences, where a non-ASCII string gets passed as a report.
An alternative would be some kind of encoding for printing of test
output that preserves ASCII characters but turns characters with higher
codes into (unicode or byte) escape sequences. Such an encoding,
however, would destroy information. E.g. you couldn't know whether an
escape sequence was part of the string or introduced by the encoding. I
know that my approach destroys information as well, as you can't know
whether _safe_string was called or not. A warning would help.
> My original mail was a bit truncated: it's true that we shouldn't be
> catching all these errors, but a more important point is that we
> generally don't want to catch errors "just in case". The exceptions
> are well defined layers where that's clearly part of their job eg to
> print the error to the user or to cancel an operation. We've
> generally found if you do this it tends to hide bugs from testing and
> also from the developer.
>
> So what I really meant to ask was, why was that code like it is.
Despite all philosophical arguments for and against my code, there are
two simple reasons for this:
1. Having the test suite fail on me for the third time, after running
for about an hour, and not even giving me an indication as to what test
broke it, was seriously annoying. I wanted to prevent this from
happening again, therefore I wanted to make sure tests don't break the
framework.
2. I'm rather new to python, so I didn't know about __repr__ or any
such. I only knew that my runs indicated not all objects encountered in
these places were actual strings (be it byte or unicode), and that I
could think of several ways of turning arbitrary objects into strings,
some preserving more information but also more likely to fail, the
others preserving less information or none at all, but more reliable not
to fail. So I chained them, hoping for as much information as possible,
but falling back to less instead of risking errors.
If you can come up with a better idea for all this, I'll gladly accept
it, but leaving things as they were is not that better idea, I'd say.
> If the str or repr is failing we probably don't want to silently
> absorb that problem. At the very least we should give a warning.
The fact that the fallback string contains less information would be an
indication of this, but a more prominent warning would be all right by
me. I simply hadn't worked out how to achieve this the right way.
> The other kind of conceptual issue is that this is unclear about
> whether it's trying to detect problems in unicode conversion or in
> running __str__.
I had considered the *_escape encodings to be safe for their respective
applications, and therefore rather worried about objects that refuse to
be converted to strings.
> I can see in some cases you would want both but
> other places will know they have a string already; or know that they
> have an object that must be printable. For example it's not clearly
> good that passing an object to assertEqualDiff should magically
> compare it's string value.
At the time _safe_string is called, the control flow already has decided
that the objects are different. So the behaviour of assertEqualDiff with
respect to tests failing or passing is left completely unchanged. The
only thing affected is how the error is presented to the user.
> I'd like to see that, if you can remember or reproduce it. In general
> they should not. It may be that if an error occurs printing some
> information about a test, like the traceback or test name, then the
> suite terminates.
For the original problem see attached log, which can be reproduced in my
setlocale branch like this:
bzr revert -r 3489
LANG=ja_JP.EUC_JP ./bzr selftest
I'm still trying to remember what my tests looked like when I got
something different than a string. Obviously hadn't committed that
scenario, and running the whole selftest takes quite a while. No promises.
Martin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 3489.txt
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20080617/51d6a261/attachment.txt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 260 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080617/51d6a261/attachment.pgp
More information about the bazaar
mailing list