Another braindump on that intermediate layer API

Wed Aug 30 15:23:54 BST 2006

Vincent LADEUIL wrote:
> I was just thinking about your remark about the append being
> clobbered by another writer, leaving corrupted data in the knit file.
> 
> Isn't a knit chunk signed? In that case, when reading the knit
> chunk you can detect the corruption and at least reduce its
> impact.

The actual data has a sha1 sum attached to it (in the .knit file).
We also have start and end markers for every hunk (for both .knit and
.kndx), so we can tell whether we have a complete hunk or not. And we
already ignore parts of the file which are not complete.

The bigger problem comes from our layering: we write out a lower level,
which must be complete before we write the higher level.

Because we do that, higher levels assume that lower-level things are
available. So if there is a valid entry in the .kndx file, we assume
there is valid data in the .knit file. And more importantly, if there is
valid data in the inventory.knit we assume there is valid data in all of
the text.knit files that it references.
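
To make that assumption concrete, here is a toy in-memory model (the
class, its dictionaries and its method names are hypothetical and only
mimic the ordering described above; they are not the actual repository
code). Writes go bottom-up, and reads trust the top level without
re-checking the lower ones:

```python
class ToyRepository:
    """Toy model of the layering: texts, then inventory, then revision."""

    def __init__(self):
        self.texts = {}        # (file_id, revision_id) -> content
        self.inventories = {}  # revision_id -> list of file ids
        self.revisions = {}    # revision_id -> revision record

    def commit(self, revision_id, files):
        # Lowest level first: every text must be stored...
        for file_id, content in files.items():
            self.texts[(file_id, revision_id)] = content
        # ...before the inventory that references those texts...
        self.inventories[revision_id] = sorted(files)
        # ...and the revision record is written last of all.
        self.revisions[revision_id] = {"inventory": revision_id}

    def checkout(self, revision_id):
        # Readers go top-down and assume the lower levels are present.
        record = self.revisions[revision_id]
        inventory = self.inventories[record["inventory"]]
        return {fid: self.texts[(fid, revision_id)] for fid in inventory}
```

If a text disappears after the revision record was written, checkout()
fails with a KeyError: the higher level promised data that is no longer
there.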

So we don't care if there is a little bit of corruption in the middle.

What we care about is the case where process A writes to the file and
sees all of its writes succeed, and thus assumes that it can write a
valid inventory record and a valid revision record.

Process B then goes about and *overwrites* some of the data that A
wrote. Now we have a corrupt repository.

We avoid that by using 'append', which always puts stuff on at the end,
and won't overwrite earlier stuff.
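
A small sketch of the difference, using plain local files and the
os-level O_APPEND flag rather than our transport API (the function names
are made up, and remote transports may not give the same guarantee):

```python
import os
import tempfile


def write_from_two_writers(path, flags):
    """Open the same file twice, write once from each, return the result."""
    fd_a = os.open(path, flags)
    fd_b = os.open(path, flags)
    try:
        os.write(fd_a, b"AAAA")  # writer A
        os.write(fd_b, b"BBBB")  # writer B, unaware of A's write
    finally:
        os.close(fd_a)
        os.close(fd_b)
    with open(path, "rb") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Without O_APPEND both writers start at offset 0.
        plain = write_from_two_writers(
            os.path.join(tmp, "plain"), os.O_WRONLY | os.O_CREAT)
        # With O_APPEND every write lands at the current end of file.
        appended = write_from_two_writers(
            os.path.join(tmp, "append"),
            os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    return plain, appended
```

Without O_APPEND, B's write replaces A's data even though A saw its
write succeed; with O_APPEND, both writes survive.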

The only way I can think of that we could get corruption right now is
if the append actions end up interleaved: both A and B are writing to
the same file at the same time, and you get a little bit of A and a
little bit of B. If neither one gets an error, they will continue on
happily, oblivious to the fact that their earlier writes have failed.

> 
> That leads me to:
> 
> I guess you may have already implemented what I'm thinking about
> (I didn't have the time to look at that yet), but higher levels
> of the application should not have to worry about transport
> implementation, only of transport functionalities, and even that
> may be brought by intermediate objects.
> 
> Let's say that a KnitFile is such an object: depending on the
> transport used, it can be read-only, writable or even become
> corrupted. The KnitFile implementation (and its implementation
> only) has to care about read-only transports or any other hints
> transports may need.
> 
> And KnitFile may not even be used by higher levels, which will use
> objects like VersionedFile or some such.
> 
> So KnitFile is an example of what I call the intermediate layer
> API.
> 
> Maybe I'm just re-discovering what you have already done, but I
> tend to work that way when discovering new projects.
> 
> So if that's just noise for you, feel free to ignore it,
> 
>      Vincent

I'm pretty sure we already do most of what you are thinking of. Some of
what you want to do still needs even higher-level information: code
that knows about a whole repository, and other code that knows about
knits in multiple repositories.

Working around a transport that lacks APPEND would need to happen in
something like Knit.InterKnit.join(), which is the only place that knows
how to combine two knits (their data and indexes).

John
=:->
