Mutating history in Subversion and Bazaar

Aaron Bentley aaron.bentley at utoronto.ca
Thu Aug 31 17:05:30 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Allouche wrote:
> Aaron Bentley wrote:
> 
>>In order for it to be corrupt here, it needs to have a copy of C whose
>>parent is B0.

>>Having the old C would prevent the new C from being installed, so you'd
>>be stuck with the original history.  We don't apply deltas to produce
>>trees, so the storage wouldn't be corrupted in the way you're thinking.
>>In a way, it's worse than Arch, because you may *never* find out that
>>there are two different versions of C running around.
> 
> 
> Since knits are delta-compressed and usable in append-only mode, you
> still have to apply deltas to extract user data, don't you?

Knits are not part of the model.  The model is full-tree snapshots, and
knits are mainly an implementation detail.

This means that if you use a dumb fetcher that works at the model level,
what I said above holds true.
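To make the model-level case concrete, here is a minimal Python sketch of a "dumb" fetcher. It assumes a toy repository that is just a dict mapping revision IDs to full-tree snapshots; the names Repository and dumb_fetch are hypothetical and are not bzrlib's API. Because it trusts that an ID it already has refers to the same snapshot, it silently keeps the old C.

```python
from typing import Dict, List

# Hypothetical toy model (not bzrlib): revision ID -> full-tree snapshot.
Repository = Dict[str, str]


def dumb_fetch(source: Repository, target: Repository) -> List[str]:
    """Copy every revision the target lacks; return the IDs that were copied."""
    copied = []
    for rev_id, snapshot in source.items():
        if rev_id in target:
            # An ID already present is assumed to mean the same snapshot, so it
            # is skipped. If the source's copy was rewritten, nobody notices.
            continue
        target[rev_id] = snapshot
        copied.append(rev_id)
    return copied


old = {"B0": "tree of B0", "C": "tree of C on B0"}
new = {"B1": "tree of B1", "C": "tree of C on B1"}
print(dumb_fetch(new, old))  # ['B1']
print(old["C"])              # tree of C on B0
```

Nothing fails here: the storage stays internally consistent, which is exactly why the two versions of C can go undetected.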

But if you use a clever fetcher that works by slinging knit deltas
around, then yes, it's conceivable to corrupt the knit.  Knits store
SHA-1 hashes, so the corruption would be easy to detect.
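A minimal sketch of that detection, assuming a simplified line delta made of (start, end, new_lines) hunks. This is loosely modeled on knit line deltas but is not the real knit format, and apply_delta, sha1_of and extract_verified are hypothetical helpers. Rebuilding C from the wrong parent yields a text whose SHA-1 no longer matches the stored one.

```python
import hashlib
from typing import List, Sequence, Tuple

# Hypothetical simplified line delta: each hunk replaces parent[start:end].
Hunk = Tuple[int, int, List[str]]


def apply_delta(parent: Sequence[str], delta: Sequence[Hunk]) -> List[str]:
    """Apply sorted, non-overlapping hunks to the parent's lines."""
    result: List[str] = []
    pos = 0
    for start, end, new_lines in delta:
        result.extend(parent[pos:start])  # unchanged lines before the hunk
        result.extend(new_lines)          # replacement lines
        pos = end                         # skip the replaced parent lines
    result.extend(parent[pos:])
    return result


def sha1_of(lines: Sequence[str]) -> str:
    return hashlib.sha1("".join(lines).encode("utf-8")).hexdigest()


def extract_verified(parent: Sequence[str], delta: Sequence[Hunk],
                     expected_sha1: str) -> List[str]:
    """Rebuild a text and refuse to return it if its SHA-1 is wrong."""
    lines = apply_delta(parent, delta)
    actual = sha1_of(lines)
    if actual != expected_sha1:
        raise ValueError(f"SHA-1 mismatch: expected {expected_sha1}, got {actual}")
    return lines


b1 = ["a\n", "b\n", "c\n"]
c_delta = [(1, 2, ["B\n"])]            # C's delta, defined relative to B1
c_sha1 = sha1_of(apply_delta(b1, c_delta))
b0 = ["a\n", "x\n", "b\n", "c\n"]      # a different parent text

print(extract_verified(b1, c_delta, c_sha1))  # ['a\n', 'B\n', 'c\n']
try:
    extract_verified(b0, c_delta, c_sha1)    # C's delta applied to B0
except ValueError as err:
    print("corruption detected:", err)
```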

I don't know whether we check SHA-1s when copying deltas from one knit
to another.  We could do that, or we could make sure that the SHA-1 of
the parent in the target knit matches the SHA-1 of the parent in the
source knit.  So for knits, the knit itself contains enough data to
verify that you're not creating a version that cannot be constructed.
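The second option can be sketched in a few lines. This assumes a hypothetical KnitRecord holding a parent version ID, the SHA-1 of the version's full text, and an opaque delta; it is not bzrlib's knit class. The copy is refused when the parent ID refers to different texts in the two knits, so no delta is ever applied to the wrong base.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class KnitRecord:
    """Hypothetical stand-in for one knit entry (not bzrlib's class)."""
    parent: Optional[str]  # version ID the delta is stored against
    sha1: str              # SHA-1 of this version's full text
    delta: object          # opaque delta; never applied here


Knit = Dict[str, KnitRecord]


def copy_delta(source: Knit, target: Knit, version_id: str) -> None:
    """Copy one delta only if its parent means the same text in both knits."""
    record = source[version_id]
    if record.parent is not None:
        if record.parent not in target:
            raise KeyError(f"parent {record.parent!r} missing from target knit")
        source_sha1 = source[record.parent].sha1
        target_sha1 = target[record.parent].sha1
        if source_sha1 != target_sha1:
            raise ValueError(
                f"parent {record.parent!r} differs: source {source_sha1}, "
                f"target {target_sha1}; refusing to copy {version_id!r}")
    target[version_id] = record


# Placeholder strings stand in for real SHA-1 values.
source = {"p": KnitRecord(None, "sha-of-new-p", None),
          "c": KnitRecord("p", "sha-of-c", None)}
target = {"p": KnitRecord(None, "sha-of-old-p", None)}
try:
    copy_delta(source, target, "c")
except ValueError as err:
    print("refused:", err)
```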

> I guess that what you mean is that since knit deltas are not reversible
> (I guess), then changes in context are not a cause for failing to build
> the document text.

I didn't mean that it was due to the knit delta type; I meant that knits
weren't part of the model, so at a high level, it couldn't happen.

>>It's not B0 that's the problem.  The meaning of B0 was never altered.
>>The problem is that C's value has been changed, but the two Cs are
>>indistinguishable.
> 
> 
> That's a very interesting point. If I understand correctly, the problem
> here is the installation of C based on B0 while C is defined relative
> to B1.
> 
> Wouldn't it be possible to catch such violations at fetch time? Maybe
> using some logic like:

[snip]

> This logic is probably flawed in many ways, because I do not understand
> the knits storage model well, but I hope it helps convey my point.

I think it sounds pretty good.  Unfortunately, to be really sure there
are no discrepancies between two repositories, you have to compare every
common revision, because the discrepancy may lie far back in the history.
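A minimal sketch of that exhaustive comparison, assuming each repository can be summarized as a hypothetical dict mapping revision ID to the SHA-1 of that revision's full-tree snapshot (find_discrepancies is an illustrative name, not an existing bzrlib function). The cost grows with the number of shared revisions, which is the expensive part of being "really sure".

```python
from typing import Dict, List


def find_discrepancies(repo_a: Dict[str, str], repo_b: Dict[str, str]) -> List[str]:
    """Return the revisions both repositories have but disagree about."""
    common = repo_a.keys() & repo_b.keys()
    # Every common revision must be checked: a rewrite can hide arbitrarily
    # far back, so checking only recent revisions could miss it.
    return sorted(rev for rev in common if repo_a[rev] != repo_b[rev])


a = {"A": "111", "B": "222", "C": "333"}
b = {"A": "111", "B": "999", "D": "444"}
print(find_discrepancies(a, b))  # ['B']
```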

> Violating the integrity of a distributed database is certainly not a
> nice thing to do, but I hope that we can find a way to control the
> splash damage enough to make transparent interoperability with other
> systems a reliable proposition.

I think we can prevent splash damage.  I'm not sure what we do when
we've discovered that a revision's data is inconsistent.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFE9wjK0F+nu1YWqI0RAtuCAJ91miE/Wf6l4i0o3O+GF5xzVB6FygCcCAyk
DRXdMayXp6TW9RRJ3ODxDsU=
=nW6y
-----END PGP SIGNATURE-----



