[MERGE] Developer doc: container format

Thu Jun 7 16:51:30 BST 2007

Vincent Ladeuil wrote:
[...]
> I agree with Aaron, if the lowest layer can't decode the content
> with marker kind only, it will not be able to detect data
> corruption (detecting other data corruptions can still occur at
> higher levels but that's a separate issue).
> 
> Length-prefixed is good. TLV: type, length, value is the basis of
> most of the reliable formats I know of.

After discussion with Martin and Robert, I've updated the doc along these lines.

Actually, because we only have one type of record (other than the inherently
special End Marker record), you could consider the length to be either part of
the record (i.e. you need to partially parse the record to read it) or part of
the container layer, just like the record kind.  I think leaving this ambiguity
is a good thing at this stage, until we have a clearer idea of how we want to
use this format.
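
To make the kind-plus-length idea concrete, here is a rough sketch of a
length-prefixed reader and writer.  It is not the encoding in the doc: the
marker bytes, the 4-byte big-endian length and the function names are all
made up for illustration.  The point is only that the lowest layer can notice
an unknown kind or a short read without understanding the payload.

```python
import io
import struct

# Hypothetical one-byte kind markers; the real format may differ.
RECORD = b"B"
END_MARKER = b"E"


def write_record(out, payload):
    """Write one record: kind byte, 4-byte big-endian length, payload."""
    out.write(RECORD + struct.pack(">I", len(payload)) + payload)


def write_end(out):
    """Write the End Marker record, which carries no length or payload."""
    out.write(END_MARKER)


def read_records(inp):
    """Yield payloads until the End Marker; raise ValueError if malformed."""
    while True:
        kind = inp.read(1)
        if kind == END_MARKER:
            return
        if kind != RECORD:
            # Covers both garbage bytes and EOF before the End Marker.
            raise ValueError("unknown record kind: %r" % (kind,))
        header = inp.read(4)
        if len(header) != 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        payload = inp.read(length)
        if len(payload) != length:
            raise ValueError("truncated record body")
        yield payload
```

Here read_records() treats the length as part of the container layer; moving
the length parsing into a per-kind function would give the other reading
without changing the bytes on disk.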

[...]
> 
> I understand the constraint: you don't want to force the producer
> to know the size of the record before writing it.

Supporting that would be nice, but I'd rather worry about it later.  The things
we want to use this for initially (pieces of knits) already know the size of
the bytes to pack.

I think we will want to deal with this better eventually, but I also think
there's no advantage in getting ahead of ourselves.  When we have a clear use
for streaming unknown-length data, we can let that use case drive the design.
I'm a big fan of keeping things simple where possible.

-Andrew.