Repository.insert_data_stream contract [Re: [MERGE] Packs. Kthxbye.]

Robert Collins robertc at robertcollins.net
Fri Oct 19 06:35:57 BST 2007


On Fri, 2007-10-19 at 15:03 +1000, Andrew Bennetts wrote:
> Hi all,
> 
> Robert Collins wrote:
> > On Wed, 2007-10-17 at 18:39 -0400, John Arbash Meinel wrote:
> > >       def insert_data_stream(self, stream):
> > > +        """XXX What does this really do?
> > > +
> > > +        Is it a substitute for fetch?
> > > +        Should it manage its own write group ?
> > > +        """
> > > 
> > > ^- This is part of Andrew's new work to stream data across using the 
> > > smart server.
> > > So, AFAICT it is indeed a substitute for fetch. You ask the source for a 
> > > stream,
> > > and hand that off to the target.
> > 
> > This was a hint to get Andrew to write a docstring :).
> 
> I sat down to write one, and found that I'm not really sure exactly what the
> contract for this method should be.  Here's an intentionally vague docstring for
> it that avoids the hard questions:

Some specific questions:
 - is it keyed to a RepositoryFormat? is it able to be used between
arbitrary repositories?
 - At the serialisation layer, is it a fixed format, or are we expecting
to use magic to determine whats in there?
 - At the semantic layer, What about Knit1 to Knit3? Knit to Pack? Knit3
to Knit1?

> That's pretty unsatisfactory, though I suppose it's better than nothing.
> 
> This doesn't say anything concrete about what “bytes” contains.  At the moment
> the implementation assumes it is a bencoded list of knit records, that is a list
> whose elements are tuples of (version, options, parents, knit_bytes).  This is
> convenient for stuffing into a knit versionedfile (via
> VersionedFile.insert_data_stream).  I'm not sure how convenient this is for
> other formats.  Robert, does this seem a reasonable interface for a stream
> between packs?

An easier format for a stream between packs is probably the bytes for
the pack, and each index. So 5 byte strings. Plus the expected hash of
the pack. That avoids the knit layers altogether, and the requirement to
do knit-at-a-time.

> To write a good docstring for this, we need to figure out if it can (and should)
> work with a stream from a repository in a different format.  I guess some
> streams have to be incompatible, e.g. a repository supporting tree roots can have
> changes that cannot be represented in a repository that doesn't support tree
> roots.

Agreed. And we should have tests for this. I also noticed one test in
tests.test_repository that this method should error in some
circumstances, and I think that that should be an interface test.

> So I think perhaps this docstring should say that the encoding is
> format-specific?  e.g.
> 
>         """Insert revisions from a data stream.
> 
>         A stream is an iterable producing (item-key, bytes) pairs.  The bytes
>         should be serialised records to add to the store named by the item key.
>         The serialisation format depends on the repository format, so a stream
>         should only be inserted into a Repository in the same format as the
>         Repository the stream was generated from.
> 
>         Typical use is like::
>         
>             stream = src_repo.get_data_stream(revision_ids)
>             dest_repo.insert_data_stream(stream)
> 
>         :seealso: get_data_stream
>         """
> 
> What do people think?

I think thats a massive improvement over the current docstring. What
about error conditions? Like, you say same format, and should, but what
happens if you get it wrong - and because this is likely to matter, I
think we should define it - e.g. 'a mismatched format will cause an
error and not modify the repository at all'.

We can follow up a more clear definition of it without expanding its
scope to cross-format situations. Once we have a good solid definition
of it today merged into bzr.dev, then I'd like to move onto discussing
cross-format scenarios.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071019/cfb61b4b/attachment.pgp 


More information about the bazaar mailing list