Repository.insert_data_stream contract [Re: [MERGE] Packs. Kthxbye.]

Andrew Bennetts andrew at canonical.com
Fri Oct 19 07:22:58 BST 2007


Robert Collins wrote:
> On Fri, 2007-10-19 at 15:03 +1000, Andrew Bennetts wrote:
[...]
> > 
> > I sat down to write one, and found that I'm not really sure exactly what the
> > contract for this method should be.  Here's an intentionally vague docstring for
> > it that avoids the hard questions:
> 
> Some specific questions:
>  - is it keyed to a RepositoryFormat? is it able to be used between
> arbitrary repositories?

Looks like the answer is that it is tied to a RepositoryFormat.

>  - At the serialisation layer, is it a fixed format, or are we expecting
> to use magic to determine whats in there?

Ditto.

>  - At the semantic layer, What about Knit1 to Knit3? Knit to Pack? Knit3
> to Knit1?

Good question.  I don't know the answer to this yet.

> > That's pretty unsatisfactory, though I suppose it's better than nothing.
> > 
> > This doesn't say anything concrete about what “bytes” contains.  At the moment
> > the implementation assumes it is a bencoded list of knit records, that is a list
> > whose elements are tuples of (version, options, parents, knit_bytes).  This is
> > convenient for stuffing into a knit versionedfile (via
> > VersionedFile.insert_data_stream).  I'm not sure how convenient this is for
> > other formats.  Robert, does this seem a reasonable interface for a stream
> > between packs?
> 
> An easier format for a stream between packs is probably the bytes for
> the pack, and each index. So 5 byte strings. Plus the expected hash of
> the pack. That avoids the knit layers altogether, and the requirement to
> do knit-at-a-time.

Ok, so a knit data stream isn't convenient for packs.  So let's not standardise
the existing stream format as One True Format.

> > To write a good docstring for this, we need to figure out if it can (and should)
> > work with a stream from a repository in a different format.  I guess some
> > streams have to be incompatible, e.g. a repository supporting tree roots can have
> > changes that cannot be represented in a repository that doesn't support tree
> > roots.
> 
> Agreed. And we should have tests for this. I also noticed one test in
> tests.test_repository that this method should error in some
> circumstances, and I think that that should be an interface test.

Agreed.

> > So I think perhaps this docstring should say that the encoding is
> > format-specific?  e.g.
> > 
[...]
> 
> I think thats a massive improvement over the current docstring. What
> about error conditions? Like, you say same format, and should, but what
> happens if you get it wrong - and because this is likely to matter, I
> think we should define it - e.g. 'a mismatched format will cause an
> error and not modify the repository at all'.

Agreed.  I'll work on details and tests for this.  The current stream doesn't
have an explicit version/format marker, and this is looking like a bit of an
issue.  Repository.insert_data_stream hasn't been in a release yet, so we can
still change this without breaking much, so I'll do that.

> We can follow up a more clear definition of it without expanding its
> scope to cross-format situations. Once we have a good solid definition
> of it today merged into bzr.dev, then I'd like to move onto discussing
> cross-format scenarios.

Sounds good.

I think perhaps I should also write “I will write docstrings” on a blackboard
100 times until I learn my lesson ;)

Thanks,

-Andrew.





More information about the bazaar mailing list