commit performance regression in 0.11

John Arbash Meinel john at arbash-meinel.com
Mon Sep 25 19:51:07 BST 2006


I've been doing some performance testing of bzr-0.11, and it turns out
we have a small performance regression when committing small changes to
large trees. Overall, we are faster at putting data into knit files (I
assume this is mostly from my knit changes, which avoid "pre-creating"
the files).

However, I can see that we are now deserializing the inventory extra
times. It took a while to track down. The read_locked() decorator is a
big problem here: I frequently trace functions both forwards and
backwards (who did I call, who called me), and while it is easy to see
through a 'read_locked' in the forward direction by reading the source
code of the current function, tracing it in the backward direction means
grepping through the source code and looking at *all* callers.
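
For anyone unfamiliar with the pattern, here is a minimal sketch of
what such a read-lock decorator roughly looks like (names and bodies
are illustrative, not bzrlib's actual implementation):

    # Minimal sketch of a read-lock decorator (illustrative only).
    def read_locked(unbound):
        """Run the wrapped method inside self's read lock."""
        def wrapper(self, *args, **kwargs):
            self.lock_read()
            try:
                return unbound(self, *args, **kwargs)
            finally:
                self.unlock()
        return wrapper

    class WorkingTree(object):
        @read_locked
        def read_working_inventory(self):
            # ... deserialize and return the inventory ...
            pass

Every call through a decorated method passes through the same generic
wrapper, so tracing backwards lands you in 'wrapper' instead of at the
real call site.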

Anyway, this is what I found:

      read_inventory  read_inventory_from...  write_inventory_to...
0.10:       2                   0                       1
0.11:       1                   2                       2

So 0.11 actually performs 1 extra inventory read and 1 extra inventory
write.

In 0.10 we had 2 calls to 'WorkingTree.read_working_inventory' because
commit could modify the working inventory. Robert fixed this, so we now
have only 1 call to read_working_inventory (which removes 1 call to
serializer.read_inventory). However, he is now using
'set_parent_trees()', which has to call 'repository.revision_tree()' on
the newly committed revision, which in turn has to deserialize the
inventory it just generated. So we get no net gain from Robert's
changes.
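
In rough pseudocode, the round trip looks like this (a sketch of the
call pattern described above, not bzrlib's actual commit code; the
add_inventory call just stands in for the serialization step):

    # Sketch of the redundant round trip; method names follow the mail,
    # the surrounding structure is illustrative.
    def commit_sketch(tree, repo, rev_id):
        inv = tree.read_working_inventory()    # deserialize: 1 call in 0.11
        repo.add_inventory(rev_id, inv, [])    # serialize the new inventory
        # set_parent_trees() wants RevisionTree objects, so it asks the
        # repository for the tree it just stored...
        rev_tree = repo.revision_tree(rev_id)  # ...re-deserializing the
                                               # inventory written above
        tree.set_parent_trees([(rev_id, rev_tree)])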

Also, in 0.11, Aaron changed the checkout's basis inventory to use
format=6 inventories (because of proper versioning of the root entry).
This means we now have an extra extract + serialize step for every
commit.
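
The extra step is essentially a format conversion, along these lines
(the serializer names and the cache helper here are my assumptions, not
verified against the 0.11 code):

    # Sketch of the extra extract + serialize step for the basis
    # inventory cache. Serializer names are assumed; the helper is
    # hypothetical.
    from bzrlib import xml5, xml6

    def cache_basis_inventory(tree, repo_inventory_xml):
        # extract: parse the repository's format-5 inventory text
        inv = xml5.serializer_v5.read_inventory_from_string(
            repo_inventory_xml)
        # serialize: re-emit it as format 6 for the checkout's cache
        basis_xml = xml6.serializer_v6.write_inventory_to_string(inv)
        tree._put_basis_inventory(basis_xml)  # hypothetical helper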

So I have 2 comments:

1) We almost have the infrastructure in place to remove a read_inventory
call during commit. We just need to have the commit builder return the
RevisionTree (see the sketch after comment 2 below).

2) Is there any way we can avoid the extra deserialize + serialize step
for caching the basis inventory? Or do we just need to live with it
until we make the new repository format the default?
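
For (1), the idea is roughly the following (the two-value return shown
here is hypothetical; the commit builder does not currently return a
tree):

    # Sketch of suggestion (1); 'builder', 'tree' and 'message' come
    # from the surrounding commit code, and the two-value return is
    # hypothetical.
    rev_id, rev_tree = builder.commit(message)
    tree.set_parent_trees([(rev_id, rev_tree)])  # reuses the in-memory
                                                 # tree, no extra
                                                 # deserialize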

I'm attaching the graphs, which show that as long as we are committing
enough files, our improved I/O makes up for the extra serialization
cost. (For a full-kernel tree we save ~20s, which more than covers the
3s lost to the extra serialization overhead.)

I can just include the full-tree commit performance in the Performances
page, but I think we need to come up with a plan for fixing the other
issues.

John
=:->
-------------- next part --------------
Attachment: new-commit-0.11.png (image/png, 8967 bytes)
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20060925/b7bdb5fb/attachment.png
-------------- next part --------------
Attachment: partial-commit-0.11.png (image/png, 16526 bytes)
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20060925/b7bdb5fb/attachment-0001.png