[MERGE] Inventory delta support for CommitBuilder

Wed Sep 5 09:00:15 BST 2007

Robert Collins wrote:

> I have a couple of thoughts.

Thanks for this. Much appreciated.

> For knit repo's doing an inventory delta is probably slower than working
> directly on the whole inventory from the dirstate, in dirstate form.
> 
> For repo's that can efficiently apply a delta, an inventory delta is a
> good idea. Packs should be able to do that.
> 
> So I wonder if we should make this depend on the repo - which already
> gives us a commit builder - rather than just blindly doing it delta
> based. 

As you touch on below, the crux of the issue is what the commit code
needs to do. My changes were designed to extend CommitBuilders so that
both 'here is each item' and 'here are the differences' algorithms were
possible, rather than just the former as is the case right now.

> The reason I bring this up is the multiple-handling it will cause today
> (in a hypothetical minimal sense: we're not at this point yet so there
> is more overhead):
> - dirstate will consider every path in the current and basis inventory
> and filter to give list of changes
> - applying the inventory delta will get the basis inventory from
> repository (duplicate work #1) and then apply the logical delta giving
> us a new inventory (which in the common case is the same as the current
> tree so dup work #2), and finally serialise the new inventory to xml,
> then *again* reconstruct the xml from the repository (dup work #3) and
> finally save the delta.
> 
> So to optimise this we need to avoid all 3 items of duplicate work.
> Without an inventory delta based approach its much better for knits
> repositories (again hypothetically with all other things fixed):
>  - dirstate can let us iterate all items into a new 'for commit'
> inventory
>  - we serialise the new inventory to xml and reconstruct the basis xml
> from the repository (dup work #1) to diff them
>  - we save the diff
> 
> so lets make sure we're not painting ourselves into a corner, and make
> sure that the code for committing will be able to fix these bits of
> duplicate work as we move forward without making knits really slow.

Agreed.

> Martin is working on getting split out inventories working, which will
> change the storage mechanisms and may also allow some interesting
> optimisations to the generation of the new inventory during commit. It
> will probably have an in-memory representation that maps very closely to
> its physical storage.

Cool.

> For your patch, based on the above thoughts I have some suggestions:
> 
>  - the new_inv_from_scratch parameter should not be added - its unneeded
> complication. Let each repository type choose how it best works. Instead
> do something like setting a flag on the CommitBuilder, or using a
> subclass or strategy object to separate out the difference between the
> from-scratch or incremental logic. It will get more involved over time
> and I don't think we need those if blocks.
> 
>  - in commit.py choose the code to run based on the commit builder that
> was returned.

Interesting. It's probably possible to hide the new-inv-from-scratch
parameter to the constructor and get that via subclassing.

My reason for not doing that was that, semantically, the code calling
into the CommitBuilder must choose to either:

* call record_entry_contents() for each new item
* call update_entry_contents() for just changed items.

The iterators is each case are very different. As well, in the latter
case, the list of deletions need to get communicated somehow. (In the
proposal, they're in the changes list passed to finish_inventory.)

As well as providing backwards compatibility, I was thinking that the
commit code might decide which algorithm to use, e.g. 'build from
scratch' for initial commits, 'build from delta' for others. That would
increase code complexity but only if we wanted to because we knew it was
a win.

IIUIC, you're suggesting that the CommitBuilder could decide the
algorithm. It could have a method - use_build_from_scratch_approach()
say - that the commit code would query to find the way that the
CommitBuilder either required or recommended to do the deed. If so,
maybe two classes - CommitBuilder and DeltaCommitBuilder say - would
be better.

Am I interpreting your suggestion correctly? Let's chat about this on
#IRC or the phone Thursday morning say.

Ian C.