Repository.insert_data_stream contract [Re: [MERGE] Packs. Kthxbye.]

Andrew Bennetts andrew at canonical.com
Fri Oct 19 06:03:14 BST 2007


Hi all,

Robert Collins wrote:
> On Wed, 2007-10-17 at 18:39 -0400, John Arbash Meinel wrote:
> >       def insert_data_stream(self, stream):
> > +        """XXX What does this really do?
> > +
> > +        Is it a substitute for fetch?
> > +        Should it manage its own write group ?
> > +        """
> > 
> > ^- This is part of Andrew's new work to stream data across using the 
> > smart server.
> > So, AFAICT it is indeed a substitute for fetch. You ask the source for a
> > stream, and hand that off to the target.
> 
> This was a hint to get Andrew to write a docstring :).

I sat down to write one, and found that I'm not really sure exactly what the
contract for this method should be.  Here's an intentionally vague docstring for
it that avoids the hard questions:

    def insert_data_stream(self, stream):
        """Insert revisions from a data stream.

        A stream is an iterable producing (item-key, bytes) pairs.  The bytes
        should be serialised records to add to the store named by the item key,
        e.g. bencoded knit records.

        Typical use is like::
        
            stream = src_repo.get_data_stream(revision_ids)
            dest_repo.insert_data_stream(stream)

        :seealso: get_data_stream
        """

That's pretty unsatisfactory, though I suppose it's better than nothing.
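To make the docstring's "typical use" concrete, here is a minimal in-memory sketch. `ToyRepository`, its `stores` dict, the plain string item keys, and the JSON encoding of the bytes are all hypothetical stand-ins, not bzrlib's Repository or its real record format. The sketch only shows (item-key, bytes) pairs flowing from `get_data_stream` into `insert_data_stream`.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# A stream is an iterable of (item_key, bytes) pairs.
Stream = Iterable[Tuple[str, bytes]]


class ToyRepository:
    """Hypothetical in-memory repository, not bzrlib's Repository class."""

    def __init__(self) -> None:
        # store name (used as the item key) -> {revision_id: text}
        self.stores: Dict[str, Dict[str, str]] = {}

    def add(self, store: str, revision_id: str, text: str) -> None:
        self.stores.setdefault(store, {})[revision_id] = text

    def get_data_stream(self, revision_ids: List[str]) -> Iterator[Tuple[str, bytes]]:
        # Yield one (item_key, bytes) pair per store. The bytes are the
        # serialised records for the requested revisions; JSON here is an
        # assumption standing in for a real, format-specific encoding.
        for store, records in sorted(self.stores.items()):
            wanted = [[rev, records[rev]] for rev in revision_ids if rev in records]
            if wanted:
                yield store, json.dumps(wanted).encode("utf-8")

    def insert_data_stream(self, stream: Stream) -> None:
        # Decode each pair and add its records to the store named by the key.
        for item_key, data in stream:
            for rev, text in json.loads(data.decode("utf-8")):
                self.add(item_key, rev, text)


src_repo = ToyRepository()
src_repo.add("revisions", "rev-1", "revision one")
src_repo.add("revisions", "rev-2", "revision two")
src_repo.add("inventories", "rev-1", "inventory one")

dest_repo = ToyRepository()
stream = src_repo.get_data_stream(["rev-1"])
dest_repo.insert_data_stream(stream)
print(dest_repo.stores)
```

Only rev-1's records reach the target; rev-2 stays behind because it was not requested.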

The docstring doesn't say anything concrete about what “bytes” contains.  At
the moment the implementation assumes it is a bencoded list of knit records,
that is, a list
whose elements are tuples of (version, options, parents, knit_bytes).  This is
convenient for stuffing into a knit versionedfile (via
VersionedFile.insert_data_stream).  I'm not sure how convenient this is for
other formats.  Robert, does this seem a reasonable interface for a stream
between packs?
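For reference, here is a sketch of what the "bytes" half of one pair looks like under the current assumption: a bencoded list of (version, options, parents, knit_bytes) records. The bencode functions are a minimal standalone version so the sketch runs without bzrlib, and the revision ids, option values, and knit_bytes contents are illustrative placeholders.

```python
from typing import List, Tuple

# Minimal standalone bencode. It handles only what these records need:
# ints, byte strings, and lists/tuples (dictionaries are omitted).


def bencode(value) -> bytes:
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        # bencode has no tuple type, so tuples are written as lists.
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    raise TypeError("cannot bencode %r" % (value,))


def bdecode(data: bytes):
    def decode_at(pos: int):
        marker = data[pos:pos + 1]
        if marker == b"i":
            end = data.index(b"e", pos)
            return int(data[pos + 1:end]), end + 1
        if marker == b"l":
            items, pos = [], pos + 1
            while data[pos:pos + 1] != b"e":
                item, pos = decode_at(pos)
                items.append(item)
            return items, pos + 1
        # Otherwise a byte string: "<length>:<bytes>".
        colon = data.index(b":", pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        return data[start:end], end

    value, end = decode_at(0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


# Knit records shaped as (version, options, parents, knit_bytes).
# All values are placeholders for illustration.
KnitRecord = Tuple[bytes, List[bytes], List[bytes], bytes]
records: List[KnitRecord] = [
    (b"rev-1", [b"fulltext"], [], b"<knit bytes for rev-1>"),
    (b"rev-2", [b"line-delta"], [b"rev-1"], b"<knit bytes for rev-2>"),
]

payload = bencode(records)  # the "bytes" half of one (item-key, bytes) pair
decoded = bdecode(payload)

# The receiver unpacks each record; each one comes back as a list, not a tuple.
for version, options, parents, knit_bytes in decoded:
    print(version, options, parents, len(knit_bytes))
```

One detail worth pinning down in the contract: because bencode has no tuple type, a receiver gets lists back and should not depend on getting tuples.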

To write a good docstring for this, we need to figure out if it can (and should)
work with a stream from a repository in a different format.  I guess some
streams will necessarily be incompatible: e.g. a repository supporting tree
roots can contain changes that cannot be represented in a repository that
doesn't support tree roots.

So I think perhaps this docstring should say that the encoding is
format-specific?  e.g.

        """Insert revisions from a data stream.

        A stream is an iterable producing (item-key, bytes) pairs.  The bytes
        should be serialised records to add to the store named by the item key.
        The serialisation format depends on the repository format, so a stream
        should only be inserted into a Repository in the same format as the
        Repository the stream was generated from.

        Typical use is like::
        
            stream = src_repo.get_data_stream(revision_ids)
            dest_repo.insert_data_stream(stream)

        :seealso: get_data_stream
        """

What do people think?

-Andrew.




More information about the bazaar mailing list