[MERGE][0.11] commit performance regression in 0.11

Robert Collins robertc at robertcollins.net
Tue Sep 26 02:44:21 BST 2006


On Mon, 2006-09-25 at 19:40 -0500, John Arbash Meinel wrote:
> 
> I know you gave me a preemptive +1 for this but it turned out to be a
> little bit less perfect than I wanted. So I'm posting it for review.
> 
> 1) It turns out that a lot of code actually expected to go through
> repository.deserialize_inventory() because that would add a
> 'inventory.root.revision_id' value for format 5 inventories (Knit1
> repositories). But since I was now directly writing the committed
> inventory to disk, I needed to add it myself. So I ended up with:
> 
> inv = leftmost_parent_tree.inventory
> if not self.branch.repository._format.rich_root_data:
>     inv.root.revision = leftmost_parent_id
> 
> Now, I don't know why the commit builder isn't creating the root
> revision (probably also has to do with Knit1 versus Knit2
> repositories).

This does seem ugly. I think its because we are exposing dual-model
inventories outside of the repository layer. I drafted some notes about
this sort of transition - basically I believe we should have only one
model in active use throughout the code base, and that every
producer/consumer of objects should upgrade/downgrade as needed. Our
fast-path InterObject instances are the sole exception, because they can
provide specific 'downlevel to downlevel' functionality.

Lets try to clean this specific case up during 0.12.

> 2) I'm not 100% happy with CommitBuilder.get_tree() as the function
> name. I had 'get_committed_tree()' for a while, or I could do
> 'revision_tree()' to mirror the Repository name...

I'd prefer revision_tree() to mirror the Repository name.
Also, please put the fix for 1) within the CommitBuilder. We can have an
assert in set_parent_trees if you like.

> 3) I did a lot of refactoring of private functions in WorkingTree,
> because of common functionality that wasn't 100% common.

I was thinking of a different tack than you took:  I was thinking that
it would look like:

set_parent_trees(self, trees, allow_ghost):
    self.set_parent_tree_ids([revid for rev_id, tree in trees])
    try:
        left_most_tree = trees[0][1]
    except IndexError:
        left_most_tree = None
    if left_most_tree is None:
        self._delete_basis_inventory()
    else:
        self._set_basis_inventory(left_most_tree.inventory)

and that set_parent_tree_ids would not need to change at all.

If you think this may make the patch cleaner, please twiddle it,
otherwise I'm ok with going with what you've got.

> I did check, and this removes the extra read_inventory() calls.
> Unfortunately read_inventory() is still faster than write_inventory().
> But with this patch, we are down to 1 inventory read, and 2 inventory
> writes when committing to a new tree. (And because the formats are
> different, we are stuck with 2 writes).

Its also appropriate to do 2 writes IMO.

> I'd like to do a little bit more tweaking of the serializer, because I
> think there are ways to improve it even more. But that can certainly
> be
> an 0.12 thing. 

Yup.

+1 from me for this - merge to bzr.dev, I'll do the cherrypick to 0.11.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060926/2805cb12/attachment.pgp 


More information about the bazaar mailing list