[RFH/MERGE 3/3] Towards XML log output: An implementation of XmlLogFormatter
Martin Pool
mbp at canonical.com
Thu Nov 9 08:32:52 GMT 2006
On 8 Nov 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> _encode_and_escape() always does entity escaping because that is what
> ElementTree used to do. It may be that it now prefers to output utf-8.
> Actually, IIRC, entity escaping was also faster than using 'encode()' to
> write it out as utf-8. I guess formatting an integer is faster than
> performing whatever effort it takes to generate utf-8.
>
> I don't know if it is simpler for James to use ElementTree to manage the
> XML tree, and the slow write performance isn't as much of an issue for
> him. (It may not offset the effort for writing a custom serializer).
I suspect it will be just as easy.
> The other possibility is to do:
>
> sio = StringIO()
> tree.write(sio)
> self.outf.write(sio.getvalue().decode('utf8'))
>
> Though it is a lot of double handling.
Right, this would be needed because the object used as outf, somewhat
perversely, when given a str that is already utf-8, tries to decode it
using the default encoding (ascii) and then re-encode it.
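For what it's worth, here is a rough sketch of the same double handling in
present-day Python 3, where bytes and text are separate types. It is only an
illustration: the function name write_tree_as_text and the sample element
names are made up, and an io.StringIO stands in for outf.

```python
import io
import xml.etree.ElementTree as ET


def write_tree_as_text(tree, outf):
    # Hypothetical helper: serialize to UTF-8 bytes first, then decode
    # so that a text-mode stream can apply its own encoding on output.
    buf = io.BytesIO()
    tree.write(buf, encoding="utf-8")
    outf.write(buf.getvalue().decode("utf-8"))


# Sample data, assumed for illustration only.
root = ET.Element("log")
ET.SubElement(root, "committer").text = "J\u00fcrgen"

out = io.StringIO()  # stands in for a text-mode outf
write_tree_as_text(ET.ElementTree(root), out)
result = out.getvalue()
```

The tree is encoded once by ElementTree and then again by whatever outf
does, which is the double handling John mentions.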
> The other problems are that other log formatters expect to have outf do
> the encoding for them.
>
> We could let XmlLogFormatter just write to stdout directly. Though that
> violates the 'use to_file that I gave you'.
>
> We could make to_file always be a strict file, and then the log
> formatters have to handle encoding.
>
> We could have log formatters be given 2 file handles. One for unicode
> data, and one for raw bytes.
If you consider that someone might, for some reason, send XML to their
screen, then I think it's clear that it must be in an encoding
appropriate for that output. So the XML log formatter must either look
at the output encoding and respond to it, or conservatively just always
use entity escaping.
I suggest doing the second for now, as it seems simpler.
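A minimal sketch of the conservative option, written for present-day
Python 3 rather than the Python of this thread: escape the markup
characters, then replace everything outside ASCII with numeric character
references. The helper name escape_for_ascii_xml is made up for
illustration and is not bzr's _encode_and_escape().

```python
from xml.sax.saxutils import escape


def escape_for_ascii_xml(text):
    # Hypothetical helper, not part of bzrlib.
    # Escape &, < and > first, then turn every non-ASCII character
    # into a numeric character reference such as &#252;.
    escaped = escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


line = escape_for_ascii_xml("J\u00fcrgen <j@example.com> & co")
```

The result is pure ASCII, so it should pass unchanged through whatever
encoding outf applies.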
--
Martin