[RFH/MERGE 3/3] Towards XML log output: An implementation of XmlLogFormatter

Martin Pool mbp at canonical.com
Thu Nov 9 08:32:52 GMT 2006


On  8 Nov 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:

> _encode_and_escape() always does entity escaping because that is what
> ElementTree used to do. It may be that it now prefers to output utf-8.
> Actually, IIRC, entity escaping was also faster than using 'encode()' to
> write it out as utf-8. I guess formatting an integer is faster than
> performing whatever effort it takes to generate utf-8.
> 
> I don't know if it is simpler for James to use ElementTree to manage the
> XML tree, and the slow write performance isn't as much of an issue for
> him. (It may not offset the effort for writing a custom serializer).

I suspect it will be just as easy.

> The other possibility is to do:
> 
> sio = StringIO()
> tree.write(sio)
> self.outf.write(sio.getvalue().decode('utf8'))
> 
> Though it is a lot of double handling.
 
Right, this would be needed because the object used as outf, somewhat
perversely, when given a str that is already utf-8, tries to decode it
using the default encoding (ascii) and then re-encodes it.
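
For illustration, here is a minimal sketch of that double handling in
present-day Python 3, where bytes and text are distinct types and
io.BytesIO stands in for the StringIO in the quoted snippet. The helper
name write_tree_as_text is hypothetical, and outf is assumed to be a
text-mode file object that does its own encoding; this is not code from
the patch.

```python
import io
import xml.etree.ElementTree as ET


def write_tree_as_text(tree, outf):
    """Serialize tree to utf-8 bytes, then decode before writing to outf.

    Hypothetical helper: outf is assumed to accept text and to encode it
    itself, so the bytes from ElementTree have to be decoded first.
    """
    buf = io.BytesIO()
    tree.write(buf, encoding='utf-8')           # first handling: encode to bytes
    outf.write(buf.getvalue().decode('utf-8'))  # second handling: back to text


# Usage: build a tiny tree and write it to an in-memory text stream.
root = ET.Element('log')
ET.SubElement(root, 'msg').text = 'caf\u00e9'
out = io.StringIO()
write_tree_as_text(ET.ElementTree(root), out)
```

The whole document is held in memory twice here, which is the cost the
"double handling" remark refers to.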

> The other problems are that other log formatters expect to have outf do
> the encoding for them.
> 
> We could let XmlLogFormatter just write to stdout directly. Though that
> violates the 'use to_file that I gave you'.
> 
> We could make to_file always be a strict file, and then the log
> formatters have to handle encoding.
> 
> We could have log formatters be given 2 file handles. One for unicode
> data, and one for raw bytes.

If you consider that someone might for some reason send XML to their
screen, then I think it's clear that it must be in an appropriate
encoding.  So the XML log formatter must either look at the output
encoding and respond to it, or conservatively just always use entity
escaping.  I suggest doing the second for now as it seems simpler.
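
A rough sketch of what "always use entity escaping" could look like,
written for present-day Python 3. The function name escape_to_ascii is
hypothetical and this is not the _encode_and_escape() from bzrlib; it
only shows the idea that the result is pure ASCII and so is safe for
any ASCII-compatible terminal encoding.

```python
from xml.sax.saxutils import escape


def escape_to_ascii(text):
    """Return ASCII-only bytes for text, suitable for XML character data.

    Hypothetical helper: first escape the XML special characters
    (&, <, >), then replace every non-ASCII character with a numeric
    character reference such as &#233;.
    """
    return escape(text).encode('ascii', 'xmlcharrefreplace')


# Usage: the accented character becomes a numeric reference.
escaped = escape_to_ascii('caf\u00e9 <b> & co')
```

Because the output contains only ASCII bytes, the formatter would not
need to know the encoding of the file it was given.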

-- 
Martin



