RFC: streaming knit repository fetch interface

Robert Collins robertc at robertcollins.net
Fri Jun 8 03:18:54 BST 2007


As part of Andrew's work on the smart server it would be great to beat
the latency demon in the short term. The following sketch is for an
interface to go on Repository to allow structured retrieval of all the
data needed to perform a fetch, given that you know the revisions to be
included.

I thought the easiest way to show the interface would be prototypes:

def _get_revision_knit_data_stream(revision_ids_to_include):
    """Obtain an iterator over revision data from this repository.

    :param revision_ids_to_include: The revision ids to gather up.
        For each revision id, the data stream will include:
         - File texts (as a blob of knit gzip hunks)
         - File graph data (as a blob of compressed text)
         - Inventory texts (as a blob of knit gzip hunks)
         - Inventory graph data (as a blob of compressed text)
         - Signature texts (as a series of knit gzip hunks)
         - Revision texts (as a series of knit gzip hunks)
    :return: An iterator. Each item yielded by the iterator will be a
        tuple of (names, callable_to_read_data). The first item will
        contain the necessary metadata for a reader to determine whether
        it can use the data stream or not - a format marker from this 
        repository. (I will make this more precise when coding it).
        For the file and inventory items in the data stream the binary
        blob is the gzip hunks run together. The matching graph data
        is a series of lines describing the blob - providing the readv
        offsets to obtain the individual hunks, and the matching graph
        data for the knit.
    """

def _insert_from_knit_data_stream(data_stream_iterator):
    """Insert data from a knit data stream into this repository.

    :param data_stream_iterator: A data stream iterator such as is returned
        by _get_revision_knit_data_stream.
    :raises IncompatibleModel: If the source data stream comes from a
        subtree-supporting repository and this repository does not
        support subtrees, the data cannot be inserted into this
        repository.
    """

This should match quite well with the initial API for writing to and
reading from bzr containers that Andrew is working on, and can be
implemented in parallel.

I'd like to get consensus on this approach before coding starts :).

Rob


-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
