[MERGE] Inventory delta support for CommitBuilder

Sun Sep 2 23:34:17 BST 2007

On Thu, 2007-08-30 at 12:20 +1000, Ian Clatworthy wrote:
> 
> 
> The background to this is that we want to change commit so that the
> main
> iterator only tells us about things that have changed or may have
> changed. The implication is that CommitBuilder needs to accept an
> inventory delta (against the current basis inventory) rather than
> build
> the new inventory from scratch each time by looping over every item.
> 
> Note that I published an RFC about this earlier and have tried to
> incorporate Aaron's feedback on the overall approach in the code
> attached. Also note that this code should be merged after the "Quicker
> initial commit" patch in order for the return code from
> update_entry_content (which isn't currently used) to be correct.

I have a couple of thoughts.

For knit repo's doing an inventory delta is probably slower than working
directly on the whole inventory from the dirstate, in dirstate form.

For repo's that can efficiently apply a delta, an inventory delta is a
good idea. Packs should be able to do that.

So I wonder if we should make this depend on the repo - which already
gives us a commit builder - rather than just blindly doing it delta
based. 

The reason I bring this up is the multiple-handling it will cause today
(in a hypothetical minimal sense: we're not at this point yet so there
is more overhead):
- dirstate will consider every path in the current and basis inventory
and filter to give list of changes
- applying the inventory delta will get the basis inventory from
repository (duplicate work #1) and then apply the logical delta giving
us a new inventory (which in the common case is the same as the current
tree so dup work #2), and finally serialise the new inventory to xml,
then *again* reconstruct the xml from the repository (dup work #3) and
finally save the delta.

So to optimise this we need to avoid all 3 items of duplicate work.
Without an inventory delta based approach its much better for knits
repositories (again hypothetically with all other things fixed):
 - dirstate can let us iterate all items into a new 'for commit'
inventory
 - we serialise the new inventory to xml and reconstruct the basis xml
from the repository (dup work #1) to diff them
 - we save the diff

so lets make sure we're not painting ourselves into a corner, and make
sure that the code for committing will be able to fix these bits of
duplicate work as we move forward without making knits really slow.

Martin is working on getting split out inventories working, which will
change the storage mechanisms and may also allow some interesting
optimisations to the generation of the new inventory during commit. It
will probably have an in-memory representation that maps very closely to
its physical storage.

For your patch, based on the above thoughts I have some suggestions:

 - the new_inv_from_scratch parameter should not be added - its unneeded
complication. Let each repository type choose how it best works. Instead
do something like setting a flag on the CommitBuilder, or using a
subclass or strategy object to separate out the difference between the
from-scratch or incremental logic. It will get more involved over time
and I don't think we need those if blocks.

 - in commit.py choose the code to run based on the commit builder that
was returned.

bb:resubmit
-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070903/d3dba64a/attachment-0001.pgp