commit performance regression in 0.11

Robert Collins robertc at robertcollins.net
Tue Sep 26 02:51:15 BST 2006


On Mon, 2006-09-25 at 18:04 -0400, Aaron Bentley wrote:
> Robert Collins wrote:
> >> Yes, I realize that it will be a performance advantage for dirstate,
> >> once it arrives, but right now it's a disadvantage.
> > 
> > It can be an advantage right now though, which is why we should do this.
> 
> How can it be an advantage right now?  By caching the basis inventory on
> commit?

Right. Consider John's table:

      read_inventory    read_inventory_from...  write_inventory_to...
0.10: 2                 0                       1
0.11: 1                 2                       2

The two reads are coming in because commit.py is calling
repository.revision_tree() to get the inventory, and set_parent_trees is
calling set_parent_tree_ids blindly, which is then calling
repository.revision_tree().

So fixing commit to get it from the builder removes one read, and fixing
set_parent_trees to use the supplied tree removes another, giving us 1
inventory read and 2 writes (one to the repository, one to the tree).
Writes should be cheaper than reads, as we don't need to parse (but still
have to do the utf8 dance).
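
A rough sketch of the two redundant reads and the proposed fix. This is a
toy model, not bzrlib code: ToyRepository, ToyWorkingTree and the two
commit functions are hypothetical names made up for illustration, and it
only counts the two repository reads that the fixes remove (the remaining
read in the table is not modelled).

```python
# Toy model only: every class and function name here is hypothetical and
# does not correspond to bzrlib's real API.

class ToyRepository:
    def __init__(self):
        self.inventory_reads = 0
        self.inventory_writes = 0
        self._store = {}

    def add_inventory(self, revision_id, inventory):
        # Stands in for serialising an inventory into the repository.
        self.inventory_writes += 1
        self._store[revision_id] = dict(inventory)

    def revision_tree(self, revision_id):
        # Each call stands in for reading and parsing a stored inventory.
        self.inventory_reads += 1
        return dict(self._store[revision_id])


class ToyWorkingTree:
    def __init__(self, repository):
        self.repository = repository
        self.basis_writes = 0
        self.basis = None

    def set_parent_ids_blindly(self, revision_id):
        # Ignores any tree the caller already holds and re-reads it.
        self._write_basis(self.repository.revision_tree(revision_id))

    def set_parent_trees(self, revision_id, tree):
        # Uses the supplied tree, so no repository read is needed.
        self._write_basis(tree)

    def _write_basis(self, tree):
        # Stands in for caching the basis inventory in the working tree.
        self.basis_writes += 1
        self.basis = tree


def commit_rereading(repo, tree, revision_id, new_inventory):
    """The pattern described above: two reads after the write."""
    repo.add_inventory(revision_id, new_inventory)
    committed = repo.revision_tree(revision_id)  # redundant read 1
    tree.set_parent_ids_blindly(revision_id)     # redundant read 2
    return committed


def commit_reusing(repo, tree, revision_id, new_inventory):
    """The proposed pattern: reuse the inventory we just built."""
    repo.add_inventory(revision_id, new_inventory)
    tree.set_parent_trees(revision_id, dict(new_inventory))
    return new_inventory
```

In this model both variants do two writes (one to the repository, one to
the tree); only the number of repository reads differs.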

> > Consider a bzr working tree committing to a svn one, or a bzr working
> > tree that is a checkout of an hg one. Yes these are not our primary use
> > cases, but thinking about them is a good way of ensuring we can justify
> > any coupling that we do do.
> 
> Anything that is a repository is expected to implement
> get_inventory_xml.  If it's hg or svn, I bet it will return one of our
> inventory formats.

Sure, but as xml is an interoperability format it's really not an
efficient way to work with the data: repositories are also expected to
offer revision_tree(), which requires generating the same data from the
foreign source, but not encoding it as xml.
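
To illustrate the extra work, here is a minimal sketch assuming a foreign
source that can hand us path -> file id pairs. The data, the function names
and the XML layout are all invented for the example; this is not bzr's
inventory XML format nor any real repository API.

```python
import xml.etree.ElementTree as ET

# Hypothetical data as a foreign (svn/hg-like) source might expose it.
FOREIGN_ENTRIES = {"README": "readme-id", "src/main.py": "main-id"}


def inventory_direct(entries):
    # revision_tree()-style access: build the in-memory structure directly.
    return dict(entries)


def inventory_via_xml(entries):
    # get_inventory_xml()-style access: encode to XML, then parse it back.
    root = ET.Element("inventory")
    for path, file_id in entries.items():
        ET.SubElement(root, "file", path=path, file_id=file_id)
    text = ET.tostring(root, encoding="utf-8")  # the encode step
    parsed = ET.fromstring(text)                # the parse step
    return {elem.get("path"): elem.get("file_id") for elem in parsed}
```

Both functions end up with the same mapping; the XML route just adds an
encode and a parse on the way there.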

> > working trees should be orthogonal to the repository format, which means
> > that they should have constraints placed on them by bzr's model,
> > not by any one repository.
> 
> I'm not saying it has to be a hard link.  It can be a hint about which
> inventory format will perform best.  Anyhow, that's one of several options.

Well, repository formats can change without warning for a working tree,
and the data access patterns within an inventory seem different (path
based vs revision based), so I'm quite convinced we want separate tuned
formats anyway.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.