check/reconcile and _very_ old history
Robert Collins
robertc at robertcollins.net
Wed Feb 4 23:48:52 GMT 2009

So we still have some bogus data inserted back when bzr was learning
about 'merge'.

Concretely, if you get a repository object and do:

    tkr = r.find_text_key_references()
    print tkr[('intset.py-20050717175247-81cd658f9aaa2731',
               'robertc@robertcollins.net-20050919060519-f582f62146b0b458')]

You should get False back. This means the revision named in that text key
did not actually introduce the text; some other inventory references it,
however.
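
A quick way to enumerate every such bogus reference, sketched against
the same find_text_key_references() API (with 'r' again standing in for
a repository object):

    tkr = r.find_text_key_references()
    for (file_id, rev_id), introduced in sorted(tkr.iteritems()):
        if not introduced:
            # An inventory references this text key, but the revision
            # named in the key never introduced the text.
            print 'bogus ref: %s in %s' % (file_id, rev_id)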

Now, in a model where we just look at the refs and fetch them, this is
fine, if odd; however, we try to fetch based on doing delta analysis.

check and reconcile were altered to handle this somewhat a while back -
last year. But because (deliberately) they don't alter the inventory
content, they cannot completely fix this, and it's still possible to end
up dropping such bad refs when we fetch, because our fetch code is
roughly:

    text_keys = [key for key, correct in
                 r.find_text_key_references().iteritems()
                 if key[1] in revs_to_fetch and correct]

(The real code uses a lower-level, cheaper function, but this is for
illustration.)

I think there are three basic fixes:

- Stop checking for 'correct'. This will lead to very old unmodified
  texts being refetched every 200 revisions. In big trees this will be a
  big deal.

- Rewrite the inventories to claim a more valid text key - either to
  have their own, or to use an appropriate parent's key if the content
  had not changed.

- Use the references the inventories have without filtering, but
  perform a diff against the adjacent parent inventories that we are not
  fetching.

The last option is what brisbane-core is designed to do efficiently, but
we can do it without _too much_ work, I think, for xml inventories. The
adjacent inventories are usually already available locally (for pull), or
local to begin with (for push). So we could deserialise them, scan for
refs, and cut the contents out; a rough sketch follows.
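
Something like this, perhaps (a sketch only - texts_to_fetch is a
hypothetical helper, and I'm glossing over where the adjacent
inventories actually live for push vs pull; assume 'repo' can hand back
whichever inventories we need):

    def texts_to_fetch(repo, revs_to_fetch):
        """Return the text keys we actually need to transfer.

        Rather than trusting the per-key 'correct' flag, diff each
        inventory being fetched against its adjacent parent inventories
        that are not being fetched: any text key also present in such a
        parent must already exist in the target, so we cut it from the
        set to transfer.
        """
        revs_to_fetch = set(revs_to_fetch)
        wanted = set()
        for rev_id in revs_to_fetch:
            inv = repo.get_inventory(rev_id)
            refs = set((entry.file_id, entry.revision)
                       for path, entry in inv.iter_entries())
            parent_ids = repo.get_parent_map([rev_id])[rev_id]
            for parent_id in parent_ids:
                if parent_id in revs_to_fetch or parent_id == 'null:':
                    continue
                # Adjacent parent we are not fetching: anything its
                # inventory already references need not be transferred.
                parent_inv = repo.get_inventory(parent_id)
                refs.difference_update(
                    (entry.file_id, entry.revision)
                    for path, entry in parent_inv.iter_entries())
            wanted.update(refs)
        return wanted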

I think the last option is appropriate because it doesn't require
rewriting inventories (which would break digital signatures etc.), and it
is robust in a conceptual sense.

I need feedback here, because I want to fix this issue so I can get back
to groupcompress benchmarking.

-Rob