commit performance regression in 0.11

Mon Sep 25 20:56:05 BST 2006

Aaron Bentley wrote:
> John Arbash Meinel wrote:
>>> So I have 2 comments:
>>>
>>> 1) We almost have the infrastructure in place to remove a read_inventory
>>> call during commit. We just need to have the commit builder return the
>>> RevisionTree.
> 
> I don't see anything wrong with this, but it seems simpler to just call
> set_parent_ids.  Yes, I realize that it will be a performance advantage
> for dirstate, once it arrives, but right now it's a disadvantage.

Well, set_parent_trees() could use RevisionTree.inventory to write the
basis inventory out directly, rather than reading it back from the
repository. But this would only be an advantage if the inventory kept its
serialized string representation.

I agree with your point that we have a performance loss until dirstate
is merged, though.

> 
>>> 2) Is there any way we can avoid the extra deserialize + serialize step
>>> for caching the basis inventory? Or do we just need to live with it
>>> until we make the new repository format the default?
> 
> The obvious solution would be for the repository serialization format to
> determine the basis serialization format.  IIRC, Robert didn't like that
> approach, because it exposed an implementation detail.
> 
> Another option would be for the basis serialization to work with both
> formats, depending on what format is provided by the repository.
> 
> Finally, it would be possible to textually transform a format 5
> inventory into a format 6 one.
> 
> Aaron

Well, all of these start exposing "implementation details", right?

It just depends on which API abstraction we want to violate for performance.
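To make Aaron's second option concrete, here is a minimal Python sketch. It assumes the basis cache stores whatever text the repository hands it, tagged with its format, and defers parsing until the basis inventory is actually needed. BasisCache, the "5"/"6" tags and the parser callables are hypothetical names for illustration, not bzrlib API.

```python
# Hypothetical sketch: BasisCache and the format tags are illustrative
# assumptions, not bzrlib classes or APIs.
from typing import Callable, Dict, Optional, Tuple

Parser = Callable[[bytes], dict]  # turns serialized inventory text into an object


class BasisCache:
    def __init__(self, parsers: Dict[str, Parser]) -> None:
        self._parsers = parsers
        self._stored: Optional[Tuple[str, bytes]] = None

    def write(self, fmt: str, text: bytes) -> None:
        # Store the repository's text verbatim, tagged with its format,
        # so commit does no deserialize + serialize round trip.
        if fmt not in self._parsers:
            raise ValueError(f"unsupported inventory format: {fmt}")
        self._stored = (fmt, text)

    def read(self) -> dict:
        # Parse lazily, using whichever parser matches the stored format.
        if self._stored is None:
            raise LookupError("no basis inventory cached")
        fmt, text = self._stored
        return self._parsers[fmt](text)
```

The trade-off is the one discussed above: the working tree now has to understand every inventory format the repository might produce.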

Having CommitBuilder return the freshly built RevisionTree means that we
don't have the extra 'read_inventory' step, and once we transition to
the new repository format, we won't have that overhead either.
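A toy sketch of the CommitBuilder idea follows. It assumes simplified stand-ins for Repository, CommitBuilder, RevisionTree and WorkingTree, named after the bzrlib concepts but not the real classes or signatures. commit() hands back the tree it just built, so the working tree can take its basis from memory instead of calling read_inventory().

```python
# Toy stand-ins named after bzrlib concepts; NOT the real bzrlib classes.
from typing import Dict, List, Tuple

Inventory = Dict[str, str]  # path -> file id (greatly simplified)


class Repository:
    def __init__(self) -> None:
        self._inventories: Dict[str, Inventory] = {}
        self.read_count = 0  # how many times an inventory was read back

    def add_inventory(self, revision_id: str, inv: Inventory) -> None:
        self._inventories[revision_id] = dict(inv)

    def read_inventory(self, revision_id: str) -> Inventory:
        self.read_count += 1
        return dict(self._inventories[revision_id])


class RevisionTree:
    def __init__(self, revision_id: str, inventory: Inventory) -> None:
        self.revision_id = revision_id
        self.inventory = inventory


class CommitBuilder:
    def __init__(self, repository: Repository) -> None:
        self.repository = repository
        self._new_inv: Inventory = {}

    def record_entry(self, path: str, file_id: str) -> None:
        self._new_inv[path] = file_id

    def commit(self, revision_id: str) -> RevisionTree:
        self.repository.add_inventory(revision_id, self._new_inv)
        # Return the tree already in memory, so the caller does not need
        # a read_inventory() round trip to get it back.
        return RevisionTree(revision_id, dict(self._new_inv))


class WorkingTree:
    def __init__(self) -> None:
        self.basis_inventory: Inventory = {}

    def set_parent_trees(self, parents: List[Tuple[str, RevisionTree]]) -> None:
        # The first parent becomes the basis; its inventory is already loaded.
        self.basis_inventory = dict(parents[0][1].inventory) if parents else {}
```

Usage: after `tree = builder.commit("rev-1")`, call `wt.set_parent_trees([(tree.revision_id, tree)])`; `repo.read_count` stays at 0.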

Also, what do people think about trying to get my no-hash-prefix version
of the repository into 0.12? I haven't been pushing for it because we
haven't tested it much. But it really is a lot faster for a new
kernel-sized add + commit (roughly half the time). Since your Knit2 format
is experimental, I think we could just put it in the same format, and
then we don't have the overhead of two format changes.

Another possibility would be to make the hash prefix configurable with
another control file, so people could decide which layout they wanted to
use in case a given filesystem performed poorly with one of them.
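Here is a rough sketch of a configurable prefix. It assumes a hypothetical "hash-prefix" control file in the store directory containing "yes" or "no", and an adler32-based two-hex-digit bucket directory. Both the file name and the bucketing scheme are assumptions for illustration, not a description of the actual store code.

```python
import zlib
from pathlib import Path

# Hypothetical control file name; not an actual bzr file.
PREFIX_CONTROL_FILE = "hash-prefix"


def uses_hash_prefix(store_dir: Path) -> bool:
    """Read the (hypothetical) control file; default to the prefixed layout."""
    control = store_dir / PREFIX_CONTROL_FILE
    if not control.exists():
        return True
    return control.read_text().strip() == "yes"


def relpath_for(file_id: str, hash_prefix: bool) -> str:
    """Map a file id to its relative path inside the store."""
    if not hash_prefix:
        return file_id  # flat layout: every file in one directory
    # Assumed scheme: low byte of adler32 picks one of 256 bucket directories.
    bucket = zlib.adler32(file_id.encode("utf-8")) & 0xFF
    return f"{bucket:02x}/{file_id}"
```

The flat layout avoids one directory lookup per file, which is where the speedup would come from. The cost is very large directories, which is exactly the case where some filesystems (e.g. reiserfs on delete) suffer.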

IIRC, the no-prefix version was even faster for the FreeBSD tree,
although doing an rm -rf of it was much slower (10x+). Apparently, while
creating and accessing 100K files in a directory is fast on reiserfs,
deleting them is very slow.

John
=:->
