[RFC] VersionedFiles.get_data_stream/insert_data_stream and factory objects etc

Robert Collins robertc at robertcollins.net
Wed Apr 9 05:20:02 BST 2008

So I'm thinking about the best way to keep generalising
get_data_stream/insert_data_stream down at the VersionedFile(s) level.

For a Weave the ideal data stream for a range of versions is the weave
itself prepended with the versions that are being logically selected.

For a knit, it's the individual knit records.

During insertion, a knit wants to be able to pull out knit records, and
a weave wants to be able to pull out fulltexts.

So I'm thinking of something along the following lines:

Add a field type to represent a full text *uncompressed* record. These
would be handled during insertion the same way as insertion during commit
(e.g. possibly with conversion to deltas etc).

Have get_data_stream on a weave return a signature of 'weave'.

Have a base insert_data_stream which does:
  if format != self.get_format_signature():
    format, stream_data = adapt_stream(self,
        self.get_format_signature(), format, stream_data)
  return self._insert_data_stream(format, stream_data)
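
To make the dispatch above concrete, here is a rough runnable sketch. It is
only an illustration, not bzrlib code: the ADAPTERS registry, the
VersionedFileSketch and WeaveSketch classes, the 'knit-plain' signature and
the toy adapter are all made-up names, and the adapt_stream signature is
simply the one implied by the call above.

```python
# Hypothetical registry mapping (source_format, target_format) to an adapter.
ADAPTERS = {}


def adapt_stream(target, target_format, source_format, stream_data):
    """Convert stream_data to target_format, or fail if no adapter exists."""
    try:
        adapter = ADAPTERS[(source_format, target_format)]
    except KeyError:
        raise ValueError(
            "no adapter from %r to %r" % (source_format, target_format))
    return target_format, adapter(target, stream_data)


class VersionedFileSketch(object):
    """Stand-in for a VersionedFile; not the real bzrlib class."""

    format_signature = None  # subclasses set their native signature

    def __init__(self):
        self.records = []

    def get_format_signature(self):
        return self.format_signature

    def insert_data_stream(self, format, stream_data):
        # Adapt foreign streams first, then hand off to the native insert.
        target_format = self.get_format_signature()
        if format != target_format:
            format, stream_data = adapt_stream(
                self, target_format, format, stream_data)
        return self._insert_data_stream(format, stream_data)

    def _insert_data_stream(self, format, stream_data):
        # A real implementation would parse and store the records here.
        self.records.extend(stream_data)
        return len(self.records)


class WeaveSketch(VersionedFileSketch):
    format_signature = 'weave'


# Toy adapter: tags each record so the conversion is visible.
ADAPTERS[('knit-plain', 'weave')] = (
    lambda target, stream: [('fulltext', record) for record in stream])

weave = WeaveSketch()
weave.insert_data_stream('weave', ['w1'])       # native, inserted as-is
weave.insert_data_stream('knit-plain', ['k1'])  # goes through the adapter
```

Keeping the adapters in a table keyed on the format pair means each
_insert_data_stream only ever has to understand its own native format.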

And write the following adapters:
 - weave to any knit by yielding everything as a full text
 - any knit to weave by using the weave as a basis file and using the
   StreamIndex and StreamAccess facilities but generalised to layer on
   VersionedFiles not on knits specifically, and yielding full texts
 - unannotated knit to annotated knit by using the target as the basis
   to reconstruct the unannotated full text and yielding that as a
   full text.
 - annotated knit to unannotated knit by stripping annotations from the
   hunks and outputting compressed records.
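
As a rough sketch of the two simplest adapters in the list above (weave to
knit, and annotated knit to unannotated knit), assuming some record shapes
invented purely for illustration: FullTextRecord, AnnotatedDeltaRecord,
PlainDeltaRecord, and the (start, end, lines) hunk layout are hypothetical
and are not the real knit record format.

```python
from collections import namedtuple

# Hypothetical record shapes, invented for this illustration only.
FullTextRecord = namedtuple("FullTextRecord", "version_id parents lines")
AnnotatedDeltaRecord = namedtuple(
    "AnnotatedDeltaRecord", "version_id parents hunks")
PlainDeltaRecord = namedtuple("PlainDeltaRecord", "version_id parents hunks")


def weave_to_knit(weave_texts):
    """Yield every selected version of a weave as a full text record.

    weave_texts is assumed to be an iterable of
    (version_id, parents, lines) tuples already extracted from the weave.
    """
    for version_id, parents, lines in weave_texts:
        yield FullTextRecord(version_id, tuple(parents), list(lines))


def strip_annotations(records):
    """Turn annotated delta records into plain ones, still as deltas.

    Each hunk is assumed to be (start, end, [(origin, text), ...]);
    the origin of each line is dropped and the text is kept.
    """
    for record in records:
        plain_hunks = [
            (start, end, [text for _origin, text in annotated_lines])
            for start, end, annotated_lines in record.hunks
        ]
        yield PlainDeltaRecord(record.version_id, record.parents, plain_hunks)
```

The other two adapters need a basis to reconstruct full texts against, so
they do not reduce to a stateless generator like these.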


