CHK nodes and check

Tue Jun 16 03:46:52 BST 2009

On Mon, 2009-06-15 at 21:39 -0500, John Arbash Meinel wrote:
> I'm pretty sure check needs to verify things from the top-down anyway,
> so I don't quite see what check gains by knowing whether a node is a
> parent-basename or fileid-entry just from the raw bytes.
> 
> If you have something specific, I'd like to hear it. It sounds like you
> might have something and I'm just missing it.

I'm reevaluating how I do check anyhow; it's currently 50% faster than
trunk on bzr.dev itself in 1.9 format.

> Why aren't you walking top down, tracking what references are present,
> and what is missing? Certainly you can then see what nodes are extra
> that aren't referenced.
> 
> Perhaps you are trying to do check in disk-sorted IO. If you batch it
> well enough, it should do pretty well regardless.
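
The top-down check suggested above can be sketched as a breadth-first walk from the root pages. It records which referenced pages are present, which are missing, and which stored pages are never reached. This is a minimal Python sketch under stated assumptions. The hypothetical `pages` dict maps each page key to the keys it references. The sketch does not model the real CHK page format or any bzrlib API.

```python
from collections import deque


def check_top_down(roots, pages):
    """Walk CHK-like pages from the roots down (hypothetical model).

    ``pages`` maps a page key to the list of child keys it references;
    a referenced key absent from ``pages`` counts as a missing page.
    Returns (reached, missing, extra) as sets of keys.
    """
    reached = set()
    missing = set()
    queue = deque(roots)
    while queue:
        key = queue.popleft()
        if key in reached or key in missing:
            continue  # already accounted for via another parent
        if key not in pages:
            missing.add(key)  # referenced but not stored
            continue
        reached.add(key)
        queue.extend(pages[key])  # children are only known once the parent is read
    # Stored pages that no root leads to are unreferenced extras.
    extra = set(pages) - reached
    return reached, missing, extra
```

With `pages = {"root": ["a", "b"], "a": ["c"], "b": [], "orphan": []}`, walking from `["root"]` reaches `root`, `a` and `b`, reports `c` as missing and `orphan` as extra. The walk only learns about a child after reading its parent, so it dictates read order rather than letting the storage layer pick the cheapest one.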

I'm trying to avoid reading twice; I extended VF.check() as per my RFC a
few days ago, so it can act as a get_record_stream generator, but it's
always unsorted (it shouldn't need batching at all; if it does,
get_record_stream is buggy :P). Yes, I can track what I'm expecting a
node to be, but it is nice to have things be self-describing, so that
when we read a node we can interpret it.

For instance, extracting file version references from a stream of CHK
pages can't be done at the moment unless you already know which kind
each page is. I think this is a design flaw that makes validation in,
e.g., fetch harder.
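
To illustrate the point about self-describing pages, here is a minimal sketch that pulls file-version references out of an unsorted stream of pages. The serialization is entirely hypothetical: a first-line tag naming the map the page belongs to, then `file_id revision` pairs. It is not the actual CHK page format. It only shows that, given such a tag, each page can be interpreted as it arrives, without first reading its parent.

```python
def file_refs_from_stream(records):
    """Collect (file_id, revision) references from pages in any order.

    ``records`` yields (key, raw_bytes). Hypothetical format: the first
    line is a kind tag; b"id_to_entry" pages hold one b"file_id revision"
    pair per line, and pages of any other kind carry no such references.
    """
    refs = set()
    for key, raw in records:
        kind, _, body = raw.partition(b"\n")
        if kind != b"id_to_entry":
            continue  # e.g. parent/basename pages: nothing to extract
        for line in body.splitlines():
            if not line:
                continue  # tolerate blank lines
            file_id, revision = line.split(b" ", 1)
            refs.add((file_id, revision))
    return refs
```

Because the tag travels with the bytes, the order of the stream does not matter. Without it, a page arriving before its parent would have to be buffered until something says which map it belongs to.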

-Rob