check performance and APIs

Robert Collins robert.collins at
Wed Jun 3 06:40:29 BST 2009

On Wed, 2009-06-03 at 15:10 +1000, Andrew Bennetts wrote:
> Robert Collins wrote:
> [...]
> > I'm thinking of changing the VF check API to take an optional set of
> > keys to check, and a checker object to provide check results to.
> > 
> > If not provided, all keys would be checked. Check would then return the
> > same type of result as get_record_stream. The difference would be that
> > check would [optionally depending on the vf type] do extra consistency
> > checks.
> > 
> > This seems to me to permit checking of physical storage we don't look at
> > on every read operation, extracting every text so we can check the
> > chained sha1s downwards, and avoiding duplicate work.
> That sounds reasonable to me.
> This is probably a bit of a tangent, but...

Not really :)
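
A minimal Python sketch of the check API proposed at the top of this thread, assuming an in-memory store. `ToyVersionedFiles`, `Record`, `Checker`, `add` and `report` are hypothetical stand-ins, not bzrlib classes; only the shape follows the proposal: check() takes an optional set of keys (all keys when omitted) and an optional checker object, and yields records much as get_record_stream does, recomputing each sha1 as its extra consistency check.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Key = Tuple[str, ...]


@dataclass
class Record:
    """Hypothetical stand-in for an entry yielded by get_record_stream."""
    key: Key
    sha1: str
    text: bytes


@dataclass
class Checker:
    """Hypothetical sink that check() reports problems to."""
    problems: List[str] = field(default_factory=list)

    def report(self, key: Key, message: str) -> None:
        self.problems.append(f"{key}: {message}")


class ToyVersionedFiles:
    """In-memory stand-in for a VF: each text plus the sha1 recorded at insert."""

    def __init__(self) -> None:
        self._texts: Dict[Key, bytes] = {}
        self._sha1s: Dict[Key, str] = {}

    def add(self, key: Key, text: bytes) -> None:
        self._texts[key] = text
        self._sha1s[key] = hashlib.sha1(text).hexdigest()

    def check(
        self,
        keys: Optional[Iterable[Key]] = None,
        checker: Optional[Checker] = None,
    ) -> Iterator[Record]:
        # If no keys are given, check every key, as the proposal suggests.
        wanted = list(self._texts) if keys is None else list(keys)
        for key in wanted:
            text = self._texts[key]
            # The "extra consistency check": recompute the sha1 from the
            # text and compare it with the one recorded at insert time.
            if checker is not None and hashlib.sha1(text).hexdigest() != self._sha1s[key]:
                checker.report(key, "sha1 mismatch")
            # Yield the text so the caller can reuse it instead of
            # extracting it a second time.
            yield Record(key, self._sha1s[key], text)


vf = ToyVersionedFiles()
vf.add(("a",), b"hello\n")
vf.add(("b",), b"world\n")
vf._texts[("b",)] = b"w0rld\n"  # simulate on-disk corruption of ("b",)
checker = Checker()
records = list(vf.check(checker=checker))
print([r.key for r in records], checker.problems)
```

Because check() is written as a generator here, problems only reach the checker as records are consumed; a caller that reuses the yielded texts gets the extra checks without a second extraction pass.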

> One question that occurs to me is what about keys that depend on other keys?
> Well, “depends” is a pretty loose term...
> When key X has key Y as its compression parent, it's fairly clear that Y
> needs to be checked before you can declare that X is OK.  You could do this
> in the VF check method by just extending the keys-to-check value that was
> originally passed as an arg.

So I don't think that your assertion that Y must be checked is valid.
Let's take knits as an example. Imagine that we don't validate the sha1
of transient texts. X could have a sha1 mismatch, meaning it cannot be
reconstructed, while still allowing Y to be reconstructed with the sha1
that the knit header includes.
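
The compression-parent question above can be explored with a toy delta store. This is a hypothetical model, not knit code: each entry holds an optional compression parent, a payload that is simply appended to the parent's text (real knit deltas are line-based), and the sha1 recorded at insert time. `with_compression_parents` is the keys-to-check extension described in the quoted paragraph; `check` verifies only the requested keys and, as in the scenario of not validating transient texts, does not sha1-check intermediate texts on the chain.

```python
import hashlib
from typing import Dict, Optional, Set, Tuple

Key = str
# Each entry: (compression parent or None, payload, sha1 recorded at insert).
# Toy delta: a text is its parent's text with the payload appended; real
# knit deltas are line-based, so this is only a simplification.
Store = Dict[Key, Tuple[Optional[Key], bytes, str]]


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def reconstruct(store: Store, key: Key) -> bytes:
    """Rebuild a text by walking its compression-parent chain.

    Intermediate (transient) texts on the chain are not sha1-checked.
    """
    parent, payload, _ = store[key]
    base = reconstruct(store, parent) if parent is not None else b""
    return base + payload


def with_compression_parents(store: Store, keys: Set[Key]) -> Set[Key]:
    """Extend a keys-to-check set with every compression ancestor."""
    result = set(keys)
    pending = list(keys)
    while pending:
        parent = store[pending.pop()][0]
        if parent is not None and parent not in result:
            result.add(parent)
            pending.append(parent)
    return result


def check(store: Store, keys: Set[Key]) -> Dict[Key, bool]:
    """Compare each requested key's rebuilt text with its recorded sha1."""
    return {key: sha1(reconstruct(store, key)) == store[key][2] for key in keys}


y_text = b"line 1\n"
x_text = y_text + b"line 2\n"
store: Store = {
    "Y": (None, y_text, sha1(y_text)),
    # X's stored delta is damaged, so X no longer matches its recorded sha1.
    "X": ("Y", b"line 2 (damaged)\n", sha1(x_text)),
}
print(check(store, {"X"}))
print(check(store, with_compression_parents(store, {"X"})))
```

Checking {"X"} alone reports a mismatch for X; extending the set with its compression parent adds Y, which verifies on its own, so X's bad delta does not stop Y from being reconstructed and checked.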

> A more complicated case is e.g. a revision object, which requires a
> corresponding inventory object, which in turn (probably) requires some text
> objects.  So the key dependencies here cross VF boundaries, and the
> dependencies are at a higher semantic layer than the raw revision-text
> storage.  Maybe these checks are at a different level to the ones you are
> asking about in this email, though?

These checks are at a higher level - in fact, the level I'm working on.
I'm generating a queue of texts, sha1s and sizes [where available] that I
need to verify; then I want to get those texts to [parse for chk data,
check sha1, check size], but I don't want to duplicate effort by later
calling check() on the various VFs and having that reconstitute all the
texts I just used. (Which is one of the ways that check is currently

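A minimal sketch of the queue-driven, higher-level check described above, assuming each queued item carries a key, an expected sha1 and an optional expected size. `verify_queue`, the `fetch` and `parse` hooks and the toy revision/inventory/text data are all hypothetical; the point is only that each text is extracted once, checked against every expectation queued for it, parsed once for further references, and reported back so that a later per-VF check could skip keys already verified.

```python
import hashlib
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Set, Tuple

Key = Tuple[str, ...]
# (key, expected sha1, expected size or None where the size is unavailable)
Expectation = Tuple[Key, str, Optional[int]]


def verify_queue(
    queue: Deque[Expectation],
    fetch: Callable[[Key], bytes],
    parse: Callable[[Key, bytes], List[Expectation]],
) -> Tuple[Set[Key], List[str]]:
    """Drain a queue of expectations, extracting each text at most once."""
    texts: Dict[Key, bytes] = {}
    problems: List[str] = []
    while queue:
        key, want_sha1, want_size = queue.popleft()
        first_visit = key not in texts
        if first_visit:
            texts[key] = fetch(key)  # the only extraction of this text
        text = texts[key]
        if hashlib.sha1(text).hexdigest() != want_sha1:
            problems.append(f"{key}: sha1 mismatch")
        if want_size is not None and len(text) != want_size:
            problems.append(f"{key}: size mismatch")
        if first_visit:
            # Parse once (standing in for "parse for chk data"); any texts
            # it references are queued for verification in turn.
            queue.extend(parse(key, text))
    # The returned key set lets a later per-VF check skip texts already verified.
    return set(texts), problems


# Toy data: a revision names its inventory; the inventory names one file text.
STORE: Dict[Key, bytes] = {
    ("revisions", "r1"): b"inventories i1",
    ("inventories", "i1"): b"texts f1",
    ("texts", "f1"): b"hello\n",
}
RECORDED_SHA1: Dict[Key, str] = {
    key: hashlib.sha1(text).hexdigest() for key, text in STORE.items()
}


def toy_parse(key: Key, text: bytes) -> List[Expectation]:
    """Hypothetical parser: returns the one text a revision/inventory refers to."""
    if key[0] == "texts":
        return []  # file texts reference nothing further
    prefix, name = text.decode().split(" ")
    ref = (prefix, name)
    return [(ref, RECORDED_SHA1[ref], None)]


start = ("revisions", "r1")
checked, problems = verify_queue(
    deque([(start, RECORDED_SHA1[start], len(STORE[start]))]),
    STORE.__getitem__,
    toy_parse,
)
print(sorted(checked), problems)
```

Caching texts by key, rather than skipping repeated keys outright, means a text queued twice with different expectations is still checked against both while being extracted only once.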
More information about the bazaar mailing list