[RFC] Does ~/.bzr.log have to be exactly utf-8 encoded?

John Arbash Meinel john at arbash-meinel.com
Mon Aug 21 19:53:16 BST 2006


We have had a number of bugs over time because we sometimes pass
unfiltered arguments to mutter(). Because we declared '~/.bzr.log' to be
utf-8 encoded, we open it with codecs.open(..., 'utf8').

What this does, however, is require that every string passed to
file.write() be either a Unicode string or a byte string that can be
implicitly decoded. Because the file actively encodes every string that
is passed in, a utf-8 byte string is first up-cast to Unicode (using
Python's default encoding) so that it can then be encoded back down to
utf-8.

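For most people, Python's default encoding is 'ascii', not 'utf8', so any
byte string containing non-ascii characters fails that up-cast with a
UnicodeDecodeError. Changing the default requires manually customizing
site.py.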
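To make the failure mode concrete, here is a small Python 3 sketch that simulates what a Python 2 codecs utf-8 writer does with its argument. It assumes the default encoding is 'ascii'; the helper name py2_codecs_utf8_write is hypothetical and is not bzrlib code.

```python
# Python 3 simulation of the Python 2 behaviour described above.
# Assumption: Python 2's default encoding is 'ascii' (sys.getdefaultencoding()).
import io

PY2_DEFAULT_ENCODING = 'ascii'


def py2_codecs_utf8_write(out, data):
    """Hypothetical helper mimicking codecs.open(..., 'utf8').write(data)."""
    if isinstance(data, bytes):
        # Implicit up-cast: the byte string is decoded with the default
        # encoding before it can be encoded to utf-8.
        data = data.decode(PY2_DEFAULT_ENCODING)
    out.write(data.encode('utf-8'))


log = io.BytesIO()
py2_codecs_utf8_write(log, 'caf\u00e9\n')      # Unicode: encodes fine
py2_codecs_utf8_write(log, b'plain ascii\n')   # ascii bytes: up-cast succeeds
try:
    # Already-utf-8 bytes: the ascii up-cast fails before anything is written.
    py2_codecs_utf8_write(log, 'caf\u00e9'.encode('utf-8'))
except UnicodeDecodeError as e:
    print('up-cast failed:', e.reason)
```

Note that the utf-8 byte string is rejected even though it is perfectly valid utf-8; only the ascii up-cast stands in the way.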

There are a couple of ways to approach the problem:

1) Keep doing what I've been doing: manually up-cast to Unicode, and if
that fails, use repr() to turn the string into ascii. This works, but it
means certain strings look a lot uglier than they need to. In particular,
some tracebacks end up in the log as a repr() string rather than as
something more readable.
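A minimal sketch of option (1), written in Python 3. It assumes the up-cast decodes byte strings as utf-8; safe_unicode_arg and format_mutter are hypothetical names standing in for whatever mutter() does internally.

```python
# Option (1): up-cast each argument to Unicode, falling back to repr().
# Assumption: byte strings are decoded as utf-8; names are hypothetical.


def safe_unicode_arg(value):
    """Return value as str (Unicode), or its ascii-only repr() on failure."""
    if isinstance(value, str):
        return value
    if not isinstance(value, bytes):
        return str(value)
    try:
        return value.decode('utf-8')
    except UnicodeDecodeError:
        # repr() of a byte string contains only ascii characters.
        return repr(value)


def format_mutter(fmt, *args):
    """Hypothetical stand-in for the formatting step inside mutter()."""
    return fmt % tuple(safe_unicode_arg(a) for a in args)


print(format_mutter('foo: %s', b'\xff\xff\xff\xff'))
# prints: foo: b'\xff\xff\xff\xff'
```

The result is always safe to encode as utf-8, at the cost of the escaped, harder-to-read text. (Python 3's repr() adds a b prefix that Python 2's did not.)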

2) Change 'bzrlib.trace._trace_file' to be a standard file, and manually
downcast Unicode to utf-8 before writing it out.
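A minimal Python 3 sketch of option (2), assuming the trace file is opened in binary append mode. write_trace is a hypothetical name, and io.BytesIO stands in for the real ~/.bzr.log file.

```python
# Option (2): an ordinary binary file; Unicode is encoded to utf-8
# explicitly and byte strings are written through untouched.
# write_trace is a hypothetical name, not bzrlib's actual code.
import io


def write_trace(trace_file, data):
    """Write str or bytes to a binary trace file with no implicit up-cast."""
    if isinstance(data, str):
        data = data.encode('utf-8')   # explicit down-cast of Unicode
    trace_file.write(data)            # bytes pass through as-is


# Stands in for open(os.path.expanduser('~/.bzr.log'), 'ab').
trace = io.BytesIO()
write_trace(trace, 'caf\u00e9\n')
write_trace(trace, b'raw bytes\n')
```

Nothing here can raise a decode error, because byte strings are never decoded.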

(1) is certainly possible; I've already implemented it as part of fixing
some other bugs.

The drawback of (2) is that the output file isn't guaranteed to be
utf-8. *Most* of it will be, but if you did:

  mutter('foo: %s', '\xff\xff\xff\xff')

then literal '\xff' bytes would be written into the log file.
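Under option (2) the formatted bytes reach the file unchanged. This Python 3 sketch uses bytes %-formatting (Python 3.5+) to mimic Python 2 str formatting, then checks whether the resulting log line is still valid utf-8. is_valid_utf8 is a hypothetical helper.

```python
# Python 2 str semantics approximated with Python 3 bytes formatting.


def is_valid_utf8(data):
    """Return True if data decodes as utf-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


line = b'foo: %s\n' % b'\xff\xff\xff\xff'
print(line)                 # b'foo: \xff\xff\xff\xff\n'
print(is_valid_utf8(line))  # False
```

A 0xff byte can never appear in valid utf-8, so one such call is enough to make the whole log file invalid utf-8.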

Anyway, the question boils down to this: is it better to have more
repr() strings in ~/.bzr.log so that it is guaranteed to be valid utf-8?
Or is it better to stay as close to utf-8 as we can while avoiding repr()
strings, since they are harder to read?

John
=:->
