[RFH/MERGE 3/3] Towards XML log output: An implementation of XmlLogFormatter

Sun Nov 5 15:47:03 GMT 2006

James Westby wrote:
> Hi,
> 
> Attached is an implementation of XmlLogFormatter. It should be easy to
> change the format if wanted. The current output can be seen in the
> testcase.
> 
> It passes the tests that I have written but it is not complete yet, and
> that is why I ask for help. Running it on bzr.dev it fails with
>   
>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
>   12: ordinal not in range(128)
> 
> Apparently trying to output one of dato's commits. I thought something
> like this might happen, but I was expecting it to be Erik that triggered
> it.
> 
> In the test case I do use some non-ascii characters, but I must have
> picked badly, as the test case passes. 
> 
> I have spent a bit of time trying to unicode() some bits, and change the
> encoding that I pass to elementtree's output method, but none of these
> seem to make any difference. 
> 
> If someone has any clues that might help me fix that I would be
> grateful.
> 
> Thanks,
> 
> James
> 
> 
> 

v- This is one of those cases where I would do:

from bzrlib import (
    xml_serializer,
    )

And then use xml_serializer.ElementTree rather than duplicating the
complex import logic here.

Alternatively you could do:

from bzrib.xml_serializer import ElementTree

But I'm trying to get away from importing classes and functions directly
when possible. It also means I can fix your code to use lazy importing,
to avoid importing ElementTree if we don't need it.

>  
> +try:
> +    try:
> +        # it's in this package in python2.5
> +        from xml.etree.cElementTree import (ElementTree, SubElement, Element)
> +        import xml.etree as elementtree
> +    except ImportError:
> +        from cElementTree import (ElementTree, SubElement, Element)
> +        import elementtree
> +    ParseError = SyntaxError
> +except ImportError:
> +    mutter('WARNING: using slower ElementTree; consider installing cElementTree'
> +           " and make sure it's on your PYTHONPATH")
> +    # this copy is shipped with bzr
> +    from util.elementtree.ElementTree import (ElementTree, SubElement,
> +                                              Element)
> +    import util.elementtree as elementtree
>  
>  def find_touching_revisions(branch, file_id):
>      """Yield a description of revisions which affect the file_id.
> @@ -238,7 +255,9 @@
>          self.search = search
>          branch.lock_read()
>          try:
> +            self._start_log()
>              self._show_log()
> +            self._end_log()
>          finally:
>              branch.unlock()
>  
> @@ -359,6 +378,13 @@
>              revno += 1
>          return rh
>  
> +    def _start_log(self):
> +        pass
> +
> +    def _end_log(self):
> +        pass
> +
> +
>  class LongLogFormatter(LogFormatter):
>      def show(self, revno, rev, delta):
>          return self._show_helper(revno=revno, rev=rev, delta=delta)
> @@ -474,6 +500,54 @@
>          out.append(rev.get_summary())
>          return self.truncate(" ".join(out).rstrip('\n'), max_chars)

-- PEP8 asks for 2 spaces between top level items, so you need another
blank line here.

>  
> +class XmlLogFormatter(LogFormatter):
> +    """A LogFormatter that outputs in xml format"""

-- and one here

v- Also, I would make private members private by prefixing them with
underscore. That will really help us when we are figuring out what can
be modified without deprecation and what can't.

Also, I would probably call it "self.current_base", and maybe make it a
stack (list) rather than a single node. Because you can have multiple
merge depths.

A - B - C - D - E
 \            /
  - F - G - H
     \     /
      - I -

I believe thas shows up in log as something like:

A
B
C
D
  F
  G
    I
  H
E

> +    def _start_log(self):
> +        self.doc = Element('log')
> +        self.doc.tail = '\n'
> +        self.doc.text = '\n'
> +        self.last_merge_depth = 0
> +        self.current_root = self.doc
> +
> +    def _end_log(self):
> +        ElementTree(self.doc).write(self.to_file, 'utf-8')
> +
> +    def show(self, revno, rev, delta):
> +        entry = self._rev_to_xml(revno, rev)
> +        self.doc.append(entry)
> +        self.current_root = entry
> +        self.last_merge_depth = 0
> +
> +    def show_merge_revno(self, rev, merge_depth, revno):
> +        if merge_depth > self.last_merge_depth:
> +            merge_node = SubElement(self.current_root, 'merged')
> +            merge_node.text = '\n'
> +            merge_node.tail = '\n'
> +            self.current_root = merge_node
> +        entry = self._rev_to_xml(revno, rev)
> +        self.current_root.append(entry)
> +        self.last_merge_depth = merge_depth
> +        
> +    def _rev_to_xml(self, revno, rev):
> +        date_str = format_date(rev.timestamp,
> +                               rev.timezone or 0,
> +                               self.show_timezone)
> +        logentry = Element('logentry')
> +        logentry.attrib['revid'] = rev.revision_id
> +        logentry.attrib['revno'] = revno
> +        logentry.text = '\n'
> +        logentry.tail = '\n'
> +        author = SubElement(logentry, 'author')
> +        author.text = rev.committer
> +        author.tail = '\n'
> +        date = SubElement(logentry, 'date')
> +        date.text = date_str
> +        date.tail = '\n'
> +        msg = SubElement(logentry, 'msg')
> +        msg.text = rev.message;
> +        msg.tail = '\n'
> +        return logentry
> +        

...

>                          LogFormatter, 
>                          LongLogFormatter, 
>                          ShortLogFormatter, 
> -                        LineLogFormatter)
> +                        LineLogFormatter,
> +                        XmlLogFormatter)

^- For long lists like this it is better to do:

  X,
  Y,
  )

Because that way adding
  Z,
Only modifies 1 line, rather than modifying 2.

...

v- If you are going to open a file like this, you need a 'try/finally'
to make sure it is closed, otherwise windows will start puking when we
go to clean up the test directories.

But even better is to just use a cStringIO.StringIO() and then call
getvalue()

logfile = cStringIO.StringIO()
formatter = XmlLogFormatter(to_file=logfile)
formatter.show_log(wt.branch)
log_contents = logfile.getvalue()

Further, something seems weird about you getting accented characters in
the log file itself. What is the encoding utf-8? I would probably prefer
you wrote the specific escape values, because we try not to have
non-ascii characters in our source.

Further, at least some versions of elementtree actually use the XML
escape codes for non-ascii characters. So you get &#1234; rather than
\u1234.

> +        logfile = file('out.tmp', 'w+')
> +        formatter = XmlLogFormatter(to_file=logfile)
> +        formatter.show_log(wt.branch)
> +        logfile.flush()
> +        logfile.seek(0)
> +        log_contents = logfile.read()
> +        self.assertEqualDiff(log_contents, '''\
> +<log>
> +<logentry revid="2b" revno="2">
> +<author>Xml-Log-Formatter Tester &lt;test at xml.log&gt;</author>
> +<date>Mon 2005-11-21 09:32:56 -0600</date>
> +<msg>message with &lt;"&gt; characters tö éscapè and non-ascii</msg>
> +<merged>
> +<logentry revid="2a" revno="1.1.1">
> +<author>Xml-Log-Formatter Tester &lt;test at xml.log&gt;</author>
> +<date>Mon 2005-11-21 09:27:22 -0600</date>
> +<msg>multiline
> +commit
> +message</msg>
> +</logentry>
> +<logentry revid="3a" revno="1.2.1">
> +<author>Xml-Log-Formatter Tester &lt;test at xml.log&gt;</author>
> +<date>Mon 2005-11-21 09:24:15 -0600</date>
> +<msg>tree3 commit</msg>
> +</logentry>
> +</merged>
> +</logentry>
> +<logentry revid="1" revno="1">
> +<author>Xml-Log-Formatter Tester &lt;test at xml.log&gt;</author>
> +<date>Mon 2005-11-21 09:24:15 -0600</date>
> +<msg>simple log message</msg>
> +</logentry>
> +</log>
> +''')
> +

...

so a few comments, but it generally looks pretty good.

I wonder when elementtree changed their formatting?

Also, for your specific problem, it might be Unicode characters in the
author or message. You seem to have covered the message, but not the author.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20061105/05c49c20/attachment.pgp