[RFH/MERGE 3/3] Towards XML log output: An implementation of XmlLogFormatter

John Arbash Meinel john at arbash-meinel.com
Wed Nov 8 15:48:51 GMT 2006


Martin Pool wrote:
...

>>
>> I have spent a bit of time trying to unicode() some bits, and change the
>> encoding that I pass to elementtree's output method, but none of these
>> seem to make any difference. 
>>
>> If someone has any clues that might help me fix that I would be
>> grateful.
> 
> OK, I think the problem is this: eventually you come to write this to
> cmd_log.outf, and that wants to reencode things.  xml needs different
> output encoding to plain text: you should either always do utf-8 or
> (better?) always entity-escape.
> 

_encode_and_escape() always does entity escaping because that is what
ElementTree used to do. ElementTree itself may now prefer to output utf-8.
Actually, IIRC, entity escaping was also faster than using 'encode()' to
write the text out as utf-8. I guess formatting an integer is faster than
performing whatever effort it takes to generate utf-8.
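
As a rough illustration of the two output choices, here is a sketch in
modern Python. It is not bzrlib's actual _encode_and_escape(); the helper
names are made up for this example, and it leaves out escaping of '&', '<'
and '>'.

```python
def escape_to_ascii(text):
    """Return ASCII bytes, turning non-ASCII characters into &#NNN; references.

    Hypothetical helper: it does not escape '&', '<' or '>'.
    """
    return text.encode('ascii', 'xmlcharrefreplace')


def encode_to_utf8(text):
    """Return the same text as raw UTF-8 bytes (no character references)."""
    return text.encode('utf-8')


sample = u'J\xe5ms'                 # contains U+00E5
escaped = escape_to_ascii(sample)   # b'J&#229;ms'
raw = encode_to_utf8(sample)        # b'J\xc3\xa5ms'
```

The escaped form is pure ASCII, so it survives any ASCII-compatible
re-encoding later on; the utf-8 form only survives if nothing downstream
tries to re-encode it.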

I don't know whether it is simpler for James to use ElementTree to manage
the XML tree, in which case the slow write performance may not be much of
an issue for him. (The speedup may not be worth the effort of writing a
custom serializer.)

The other possibility is to do:

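from StringIO import StringIO

sio = StringIO()
tree.write(sio)    # serialize into an in-memory byte buffer
# decode to unicode so that outf can apply its own encoding
self.outf.write(sio.getvalue().decode('utf8'))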

Though it is a lot of double handling.
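
For reference, a self-contained version of the same round trip in modern
Python might look like the sketch below. The element names are invented for
the example, and an io.StringIO stands in for self.outf; the real formatter
would write to whatever stream it was given.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical log entry; the element names are illustrative only.
log = ET.Element('log')
ET.SubElement(log, 'committer').text = u'J\xe5ms'
tree = ET.ElementTree(log)

# Step 1: serialize to a byte buffer as utf-8.
buf = io.BytesIO()
tree.write(buf, encoding='utf-8')

# Step 2: decode back to text so a text-mode stream can re-encode it.
outf = io.StringIO()  # stands in for self.outf
outf.write(buf.getvalue().decode('utf-8'))
```

The text is encoded once by ElementTree, decoded again here, and then
encoded a second time by the output stream, which is the double handling
mentioned above.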

The other problem is that the other log formatters expect outf to do
the encoding for them.

We could let XmlLogFormatter just write to stdout directly, though that
violates the 'use the to_file that I gave you' rule.

We could make to_file always be a strict file, and then the log
formatters have to handle encoding.

We could have log formatters be given 2 file handles. One for unicode
data, and one for raw bytes.
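
A minimal sketch of the two-handle idea, in modern Python. The class,
method, and parameter names here are hypothetical and are not an existing
bzrlib API; latin-1 merely stands in for a terminal encoding.

```python
import io


class TwoStreamFormatter(object):
    """Hypothetical formatter that is handed both kinds of stream."""

    def __init__(self, to_file, to_bytes_file):
        self.to_file = to_file              # text stream; it does the encoding
        self.to_bytes_file = to_bytes_file  # byte stream; written verbatim

    def show_text(self, message):
        # Plain-text formatters hand over unicode and let the stream encode.
        self.to_file.write(message + u'\n')
        self.to_file.flush()

    def show_xml(self, xml_bytes):
        # An XML formatter picks its own encoding and bypasses re-encoding.
        self.to_bytes_file.write(xml_bytes)


raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding='latin-1', newline='\n')

fmt = TwoStreamFormatter(text, raw)
fmt.show_text(u'J\xe5ms')                        # encoded as latin-1
fmt.show_xml(u'<c>J\xe5ms</c>'.encode('utf-8'))  # written as utf-8, untouched
```

Both handles end up in the same underlying stream, but only the text handle
applies the terminal encoding.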


It also brings up the point that the other log formatters use the stdout
encoding, but XML, it seems, would only ever use utf-8. I'm sort of okay
with that, because XML seems like it is meant more for a front-end to
parse than for a user to read.


Ugh... If only there were a Windows console program that didn't suck and
that could use a Unicode-aware encoding (utf-8, utf-16, something) as the
default. Then we could say "if you want to use non-ascii characters with
bzr, you need to install this console". Most Windows users would want a
GUI anyway.

I'm just getting tired of all the special casing we have to do to try
and support win32 and code pages.

John
=:->
