[MERGE] Add get_data_stream/insert_data_stream to KnitVersionedFile

Fri Aug 3 05:03:09 BST 2007

On Fri, 2007-08-03 at 13:38 +1000, Andrew Bennetts wrote:
!tweak

> This bundle adds some new methods to KnitVersionedFile:
> 
>   * get_data_stream(versions): returns all the data about those versions from
>     this knit, as directly as possible.  So it will return them in the order
>     they are read from the file, and does very little processing beyond just
>     giving back the bytes straight off disk.

I like this.

>   * insert_data_stream(stream): inserts a data stream from get_data_stream into
>     this knit.

And this.

>   * get_stream_as_bytes(versions): like get_data_stream, but bencodes the stream
>     to bytes so that it is suitable for sending directly over e.g. the smart
>     server protocol.

I'm not sure about this - specifically, why is this on Knit, and why
does it appear to have no matching insert_stream_from_bytes? Also, for
pack repositories I think I'll need to move some of this elsewhere to
be able to use it efficiently.

See the end of the review for some discussion; it's up to you whether you
want to do something about this now, or have me shuffle the code as soon
as you land it :).

The code looks good, though it will collide with my merge for
_PackAccess support - trivially, from the look of it.

+        :param required_versions: the exact set of versions to be returned, i.e.
+            not a transitive closure.
+        

This docstring is perhaps a little confusing. I suggest:
    :param required_versions: The exact set of versions to be extracted.
        Unlike some other knit methods, this is not used to generate
        a transitive closure, rather it is used precisely as given.
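
To make the distinction concrete, here is a minimal sketch in plain
Python. It does not use bzrlib; the `parents` mapping and the
`transitive_closure` helper are hypothetical names invented for the
illustration.

```python
def transitive_closure(parents, versions):
    """Return the given versions plus every ancestor reachable via parents."""
    seen = set()
    pending = list(versions)
    while pending:
        version = pending.pop()
        if version in seen:
            continue
        seen.add(version)
        # Queue the parents so that their ancestors are visited too.
        pending.extend(parents.get(version, ()))
    return seen


# Hypothetical ancestry: rev-3 builds on rev-2, which builds on rev-1.
parents = {"rev-1": [], "rev-2": ["rev-1"], "rev-3": ["rev-2"]}
required_versions = {"rev-3"}

# What the docstring describes: the set is used precisely as given.
exact = set(required_versions)

# What some other knit methods would do with the same argument.
closure = transitive_closure(parents, required_versions)
```

With the data above, `exact` holds only rev-3, while `closure` also
pulls in rev-2 and rev-1.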

Discussion about future work.

In the pack repository I currently have a 'knit' which is actually
layered onto the packs and pack indices. So for a given knit, e.g.
inventory.knit, we have:
KnitVersionedFile
  | \-_KnitGraphIndex
  |      \-CombinedGraphIndex
  |           +-GraphIndex
  |           +-GraphIndex
  |           +-...
  + _PackAccess
       +-(transport, '0.pack')
       +-(transport, '1.pack')
       +-...

Now to implement repository.get_data_stream, what I actually want to do
is to:
 - generate a list of revisions
 - read all the inventory deltas to generate the fileid:revision lists
 - read *all the fileid:revisions* present in pack 0
 - read *all the fileid:revisions* present in pack 1
 - ...
 - pull out the signatures likewise
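
The per-pack reads in the steps above could be planned roughly as
follows. This is only a sketch under assumptions: `key_to_pack` stands
in for whatever the combined index reports, and `group_keys_by_pack` is
a hypothetical helper, not part of bzrlib.

```python
from collections import defaultdict


def group_keys_by_pack(wanted_keys, key_to_pack):
    """Group (file_id, revision_id) keys by the pack that stores them.

    Returns a dict mapping pack name to a sorted list of keys, so that
    each pack only needs to be opened and read once.
    """
    by_pack = defaultdict(list)
    for key in wanted_keys:
        by_pack[key_to_pack[key]].append(key)
    return {pack: sorted(keys) for pack, keys in sorted(by_pack.items())}


# Hypothetical index contents: which pack holds which text.
key_to_pack = {
    ("file-a", "rev-1"): "0.pack",
    ("file-b", "rev-1"): "0.pack",
    ("file-a", "rev-2"): "1.pack",
}

# The fileid:revision list that the inventory deltas would produce.
wanted = [("file-a", "rev-2"), ("file-b", "rev-1"), ("file-a", "rev-1")]

plan = group_keys_by_pack(wanted, key_to_pack)
```

Here `plan` lists both rev-1 texts under 0.pack and the rev-2 text
under 1.pack, regardless of which knit each text logically belongs to.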

So how is this different to what you've got so far? As far as I can
tell I may be interleaving data from different knits, but that's likely
the only difference.

The key point though is that while it probably makes sense for the
extraction of data into a data stream to be coded specifically for the
pack layout, the binary encoding of the knit data into the data stream
at the repository level should be the same, and as I won't have a Knit
object involved, having the binary encoding on the Knit class is useless
for me.
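
As a rough illustration of what that separation might look like, here
is a sketch of a matched encode/decode pair written as free functions
that need no Knit object. Everything in it is an assumption made for the
example: the record layout (version id, parents, raw bytes), the
function names, and the tiny bencode subset (bytes, ints and lists
only). bzrlib ships its own bencode module and the real stream format
may well differ.

```python
def bencode(value):
    """Encode bytes, ints and lists/tuples using bencode rules."""
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    raise TypeError("cannot bencode %r" % (value,))


def _decode(data, pos):
    """Decode one value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    # Otherwise a byte string of the form <length>:<bytes>.
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def bdecode(data):
    """Decode a complete bencoded value, rejecting trailing bytes."""
    value, pos = _decode(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def encode_stream(records):
    """Hypothetical: turn (version_id, parents, raw_bytes) records into bytes."""
    return bencode([[vid, list(parents), raw] for vid, parents, raw in records])


def decode_stream(data):
    """Hypothetical inverse of encode_stream."""
    return [(vid, tuple(parents), raw) for vid, parents, raw in bdecode(data)]


records = [
    (b"rev-1", (), b"line one\n"),
    (b"rev-2", (b"rev-1",), b"line two\n"),
]
wire_bytes = encode_stream(records)
round_tripped = decode_stream(wire_bytes)
```

Because neither function touches a knit, the same pair could serve a
knit-backed repository and a pack-backed one, and the decode half fills
the gap left by the missing insert_stream_from_bytes.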

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.