[MERGE][RFC] further add performance improvements

Tue May 23 21:45:17 BST 2006

Martin Pool wrote:
> On 22 May 2006, Robert Collins <robertc at robertcollins.net> wrote:
> 
>>> I would be curious to see if switching to RIO would gain us more of a
>>> speed improvement than using XML. Though I'm not confident, since we are
>>> using a compiled XML parser, versus a potential python parser (for RIO).
>> Martin indicated to me that RIO was up to twice as fast as cElementTree.
> 
> That was the case when I benchmarked writing inventories a while ago.
> I had wondered if building the tree and then writing it through
> cElementTree was slowing things down and it's interesting to hear that
> it's not.
> 
> I might update that and put it into subclasses of the new benchmark
> suite.
> 

Well, I don't know which rio format you were using, but in my tests I
found that rio is slower than cElementTree at reading, but can be
faster at writing.

I have a plugin available from here:
http://bzr.arbash-meinel.com/plugins/rio_inventory

It provides the command 'bzr rio-test', which grabs the inventory
weave/knit and extracts each inventory. It then compares the time taken
to read and write each inventory in the current XML format against the
time taken to do the same with my RIO format.

Probably my format is sub-optimal, but in doing --lsprof testing, I did
find some very interesting results:

In profiling rio code, I found that 'valid_tag()' has a significant
effect on performance, especially since it gets called twice during
read_stanza (once on reading, a second time on Stanza.add()). It has
even more of an effect when you have nested Stanzas. With nested stanzas
there is also an issue about decoding the same line multiple times. The
attached patch fixes some of the rio performance issues.
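
To illustrate the double-validation issue (this is only a sketch, not the
attached patch): the names below mirror valid_tag(), Stanza.add() and
read_stanza(), but the bodies, the tag pattern and the _add_unchecked()
helper are my own assumptions for illustration. Continuation lines are
ignored to keep it short. The idea is simply that the reader validates
each tag once and then uses an unchecked path into the Stanza.

```python
import re

# Hypothetical tag pattern; the real rio rules may differ.
_TAG_RE = re.compile(r'^[-a-zA-Z0-9_]+$')


def valid_tag(tag):
    """Return True if tag looks like a legal stanza tag (assumed rule)."""
    return bool(_TAG_RE.match(tag))


class Stanza:
    """Minimal stand-in for a stanza: an ordered list of (tag, value)."""

    def __init__(self):
        self.items = []

    def add(self, tag, value):
        """Public entry point: validates the tag before storing it."""
        if not valid_tag(tag):
            raise ValueError("invalid tag %r" % (tag,))
        self.items.append((tag, value))

    def _add_unchecked(self, tag, value):
        """Hypothetical helper for callers that already validated tag."""
        self.items.append((tag, value))


def read_stanza(lines):
    """Parse 'tag: value' lines up to the first blank line.

    Each tag is validated exactly once here, instead of once in the
    reader and a second time inside Stanza.add().
    """
    stanza = Stanza()
    for line in lines:
        if line in ('\n', ''):
            break
        tag, sep, value = line.partition(': ')
        if not sep or not valid_tag(tag):
            raise ValueError("malformed line %r" % (line,))
        stanza._add_unchecked(tag, value.rstrip('\n'))
    return stanza
```

Something like read_stanza(open(path)) would then pay for valid_tag() once
per line rather than twice.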

My first format had a Stanza for the inventory, with a Stanza for
children, where each entry was another stanza. This performed very
poorly, primarily because read_stanza is not heavily optimized and you
end up passing over the same text 3 times.

My next format went for a 2-layer approach, where each entry was a
Stanza inside the main Stanza. This gave a 5x performance improvement
(revno 9 is the last 3-layer, revno 10 introduces the 2-layer).
Then I went for a 1-layer approach, and after some optimization of the
parser, I got another 2x improvement.

That means my current 1-layer rio format is still 4x slower than
cElementTree at reading an inventory, but is 2x faster at writing one.
Interestingly, it also means that reading rio is just slightly faster
than writing XML.

Another very interesting profiling result: creating a list and calling
writelines() is faster than making write() calls as you go along (at
least for StringIO). So creating a Stanza() and then writing it out is
faster than writing directly, though creating a Writer object which
builds up the processed lines rather than a Stanza is faster yet. (My
best time so far is 45% of the cElementTree time.)
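
For anyone who wants to check the write() versus writelines() observation
themselves, here is a minimal micro-benchmark sketch. It is not the code I
profiled: the line contents and function names are made up, and it uses
io.StringIO from current Python, so the ratio may well differ from what
lsprof showed me.

```python
import timeit
from io import StringIO

# Made-up payload standing in for the processed lines of an inventory.
LINES = ["file_id: id-%d\n" % i for i in range(1000)]


def write_incrementally():
    """Call write() once per line as the lines are produced."""
    out = StringIO()
    for line in LINES:
        out.write(line)
    return out.getvalue()


def write_batched():
    """Collect the lines in a list, then hand them to writelines() once."""
    out = StringIO()
    pending = []
    for line in LINES:
        pending.append(line)
    out.writelines(pending)
    return out.getvalue()


if __name__ == "__main__":
    for func in (write_incrementally, write_batched):
        seconds = timeit.timeit(func, number=200)
        print("%-20s %.4f s" % (func.__name__, seconds))
```

Both functions produce the same text; only the number of calls into the
file object differs.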

Similarly, list.append() is faster than 'string1 + string2'. Attached is
the profile for the changes.
Given how fast cElementTree is at reading the XML, I think
read_stanza() would be a good candidate for rewriting as a C function.
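
The list.append() versus 'string1 + string2' point above can be tried with
a similar sketch. Again, the data and function names are invented for
illustration, and the interpreter may optimize repeated concatenation in
some cases, so treat the timings as something to measure rather than a
given.

```python
import timeit

# Made-up (tag, value) pairs standing in for stanza fields.
PAIRS = [("tag%d" % i, "value %d" % i) for i in range(1000)]


def build_by_concat():
    """Grow one string by repeated 'string1 + string2'."""
    text = ""
    for tag, value in PAIRS:
        text = text + tag + ": " + value + "\n"
    return text


def build_by_append():
    """Append each piece to a list and join once at the end."""
    parts = []
    for tag, value in PAIRS:
        parts.append("%s: %s\n" % (tag, value))
    return "".join(parts)


if __name__ == "__main__":
    for func in (build_by_concat, build_by_append):
        seconds = timeit.timeit(func, number=200)
        print("%-16s %.4f s" % (func.__name__, seconds))
```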

I'm also going to be looking into the effect on Knits of the new format.

John
=:->

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rio-optimize.diff
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20060523/0171e1f4/attachment.diff 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 1-layer.txt
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20060523/0171e1f4/attachment.txt 