RFC: iter_changes based commit & inventory

Tue Dec 2 14:34:45 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> So, commit is inherently inventory based, because the abstract Tree does
> not provide things like 'last_changed' that inventory contains.

...

> Specifically I need to get the .revision attribute on all file_ids
> returned by iter_changes in order to have the per-file parents
> populated.
> 
> The options I see are:
>  - a method for getting the 'revision' attribute on its own, on Tree.
>  - a method to get an entire InventoryEntry, on Tree
>  - to use the basis inventory from the repository for this data
>  - update iter_changes (with deprecated yada yada) to expose this.
>  - make DirstateRevisionTree.inventory be a dynamic object which
>    supports partial-evaluation. 
> 
> Using the repository basis will require a few small IO's in a
> split-inventory situation to obtain, where the data is already present
> and parsed in a dirstate tree. So I think its less desirable than
> getting the data from dirstate. Getting an entire inventory entry is
> unneeded here - the revision field is the only(*) field missing from
> iter_changes.

We did a lot of work to pack things into dirstate so we could extract
them efficiently line-wise. It would be a shame to not actually use that.

> 
> getting just the revision attribute on it's own is not terribly
> appealing to me.
> 
> Updating iter_changes would be ideal in some respects. WorkingTrees
> could use CURRENT_REVISION as their value for this. But its not as
> clearly defined as it might be - do they need to check for a
> content/meta change before setting it? Or is it only ever meaningful
> when != CURRENT_REVISION?
> 
> Updating iter_changes is probably the most work, due to its performance
> sensitivity and optimised C versions.
> 
> Making DirstateRevisionTree.inventory be a dynamic proxy for
> late-created InventoryEntry objects is relatively easy, but I'd be
> concerned about triggering performance issues as a result of doing that
> - there will be overhead on every access to the inventory.
> 
> I'm inclined to add a specific helper method for just the revision_id,
> to be expedient.
> 
> -Rob

My feelings are:

1) Putting .revision in iter_changes is the ideal way to go. And while
we are at it, we should include the sha1sum. Both as "optional" fields,
so that when we don't know it, we can just say "I don't know", rather
than requiring the code to compute it always. (We don't always need it,
but often we have it 'for free' rather than requiring another lookup.)

2) Making DirstateRevisionTree.inventory (and probably WT4.inventory) a
lazy proxy seems like a lot of work for marginal gain, and as you say,
potential loss for other workflows.

3) Adding a helper seems expedient and reasonable enough. I would tend
to make it return the whole InventoryEntry because if you've read the
entry to get the revision attribute, you already have enough information
 to create the whole entry, and then we don't need specific helper
functions for every-little-thing. But if you want to make it minimal,
that is your choice.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkk1R4QACgkQJdeBCYSNAANnsACgzsduli+r+Hv7411FLMLQ4TrE
OsMAoI9yXOX5aib0W1IicPjRzHpD602x
=fpG6
-----END PGP SIGNATURE-----