[MERGE] Prevent knit corruption by checking parents

Aaron Bentley aaron.bentley at utoronto.ca
Fri Sep 22 16:28:04 BST 2006

John Arbash Meinel wrote:
> Aaron Bentley wrote:
> I'm a little concerned about the performance implications of this
> change, because we do have to extract the parent contents in order to
> find the sha1 sum.

I agree.  I would like the sha1s to go in the knit index, so that
maintaining integrity isn't expensive.
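
To make the idea concrete, here is a minimal sketch of an index that keeps
a sha1 beside each version id, so that an integrity check becomes a lookup
instead of a text extraction. ShaIndex, add_version and has_expected_sha1
are hypothetical names invented for this illustration; this is not the
bzrlib knit index format or its API.

```python
import hashlib


def sha1_of_lines(lines):
    """Return the hex SHA-1 of a text given as a list of byte lines."""
    digest = hashlib.sha1()
    for line in lines:
        digest.update(line)
    return digest.hexdigest()


class ShaIndex(object):
    """Hypothetical index that records a sha1 next to each version id."""

    def __init__(self):
        self._sha1s = {}

    def add_version(self, version_id, lines):
        # The sha1 is computed once, when the text is added.
        self._sha1s[version_id] = sha1_of_lines(lines)

    def has_expected_sha1(self, version_id, expected_sha1):
        # A dictionary lookup: no text is extracted or reconstructed.
        # An unknown version id never matches.
        return self._sha1s.get(version_id) == expected_sha1
```

With something like this, checking a parent only requires reading the
index, which has to be read anyway.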

> However, it does look like you are restricting the check to a small set,
> which seems like the best that we can do. 

Technically, we only need to check the leftmost parent, since the delta
is only against that.
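
As a rough sketch of that restriction: only the first entry in the parent
list is looked up, and the other parents are never touched. The function
name and the get_sha1 callable are assumptions made for this example, not
existing bzrlib code.

```python
def leftmost_parent_matches(parents, expected_sha1, get_sha1):
    """Check only the leftmost parent, the one a delta is built against.

    parents: sequence of parent version ids, leftmost first.
    expected_sha1: the sha1 the caller believes the leftmost parent has.
    get_sha1: hypothetical callable mapping a version id to its sha1.
    """
    if not parents:
        # No parents means there is no delta basis to verify.
        return True
    # Later parents are deliberately ignored: the delta does not use them.
    return get_sha1(parents[0]) == expected_sha1
```

The cost is then one sha1 lookup per inserted version, however many
parents a merge revision has.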

> (We don't actually have to
> extract the entire gzip hunk, just the first line, but we probably have
> to read the whole chunk, which is the real performance penalty over a
> remote connection).

This will really suck if the parent happens to be a fulltext.  I don't
really know what to do here, because if we read less, we risk causing
even more readvs.

> 'bzr push' is already pretty slow, and now, on top of pushing data, we
> have to read back data, and verify the sha1 sums.

Yeah.  I don't see an alternative, though.

> So, locally, it is costing us 20-40ms to do this. Which I am okay with.
> However, over the remote connection, it is costing 540ms. Which equates
> to something like 18 round trips (at the 30ms latency of SocketDelay). I
> assume that it has something to do with the number of files affected, etc.

Of course, the check is more necessary when dealing with remote

> So I like the integrity checking, but I'm concerned about how much
> overhead it generates. Because of that, I'm not comfortable giving it
> the green light without having more people look over it.

I understand your caution.  It really disturbs me that using the
standard API correctly can allow knits to become corrupt, though.


