[MERGE] Reconcile can fix bad parent references

Tue Aug 28 06:29:11 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
>>> around your manual data insertion it will work fine.
>> I can do that, but I thought we wouldn't need this for packs.
> 
> In 'user space' we don't have to.

> However, folk getting right down under the hood and manually inserting
> fragments of data do need to do this.

I meant, I thought we wouldn't need parent reconciliation for packs.
But given the following discussion, I'll work on generalizing it.

>> It is so that the revision_versions dict is pre-populated.
> 
> Cool - please note that so noone reverts this in future.

Done.

>> If I didn't know the knit data was untrustworthy, I could use that.  But
>> I don't see how I can use corrupt parents data to repair itself.  I
>> can't even know if I'm iterating in topological order...
> 
> Right. I've done a proof that I think is complete, and John seems to
> agree. So we're good to go with this.

Great.

> (Of course, if
> inventoryentry.revision is borked - if we have not added the right nodes
> at the right time to tie the two graphs together properly, then the
> proof becomes invalid; so in a strict sense we should check that too.)
> But I don't think that that is necessary for this patch - unbreaking
> pull is our immediate concern.

Full ACK.

> mmm. I have a fetch that won't tolerate this brokeness, but some caveats
> apply:
>  - fetching from a knit repo (e.g. upgrade) still tolerates it
>  - errors of this sort are not detected during fetch.
> 
> I'm not entirely sure where to fix this best.

Well, it's certainly an option to modify fetch to retrieve the
unreferenced revisions.  Or else to cause intolerent fetch to detect
when it is missing parents.

> 
>>> Is :
>>>
>>>> +class _RevisionParentsProvider(object):
>>> Really needed? If we check the revision is cached correctly at the start
>>> of check
>> We would also need to do the same in reconcile, and make sure that they
>> never get out of sync.  I thought it was just simpler to use the data
>> that we actually trust, instead of a proxy.
> 
> Thats fair enough. I'm concerned about performance/memory I guess.

I've measured performance as > 5% of total runtime for check.  I would
expect that this uses the same or less memory than the revisions knit index.

> If we
> are accessing the real revisions earlier in the process it seems to me
> that we should have the parent details cached - and the revision index
> is such a cache.

Well, it's a cache of the index file, which itself is partially a cache
of the Revision XML.

> Reconcile does not check for parents because we don't know of any way
> they can be wrong in a knit format repository today.
> 
>>> I think its better to check for an attribute like I did in my recent
>>> reconcile patch - _reconcile_does_inventory_gc is what I used. This is
>>> better than a subclass check because it allows foreign formats to be
>>> tested without requiring them to spuriously subclass Knits.
>> As we discussed it, only knits would support this, so I check for knits.
> 
> Well, I'm not saying 'packs will support it', I'm saying that foreign
> formats may have the same error in them, and its cleaner to check for
> the specific aspect that drives the problem rather than the class.

Okay, I'll switch to the attribute approach.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG07Kn0F+nu1YWqI0RArD9AJ9ffJpxWW9JdvemzJwQvaEXfIxvkACfQ7/b
bu7v46NMWSNq67xBdyW6rhk=
=DT7q
-----END PGP SIGNATURE-----