api for adding revisions to versioned files

Wed Mar 1 02:49:45 GMT 2006

So we have this concept coming in for knits of a versioned file: a
generalised weave.

The key elements are:
  that a versioned file stores versions of a file, with metadata about
    each version;
  that the file as a whole can be asked to check its consistency;
  that the file can annotate the contents in a line-based fashion;
  that new versions can be added;
  that existing versions can be retrieved.
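
To make those elements concrete, here is a rough sketch of what such an
interface could look like. This is an illustration only, not the bzrlib
API: the class name InMemoryVersionedFile and the method names
add_version, get_lines, get_parents, annotate and check are all
hypothetical, and the toy stores full texts in memory rather than deltas
on disk.

```python
import difflib


class InMemoryVersionedFile:
    """Toy versioned file (hypothetical): full texts kept in memory."""

    def __init__(self):
        self._lines = {}    # version_id -> list of lines
        self._parents = {}  # version_id -> tuple of parent version ids

    def add_version(self, version_id, parents, lines):
        """Add a new version; all parents must already be present."""
        if version_id in self._lines:
            raise ValueError("version already present: %r" % (version_id,))
        for parent in parents:
            if parent not in self._lines:
                raise KeyError("missing parent: %r" % (parent,))
        self._lines[version_id] = list(lines)
        self._parents[version_id] = tuple(parents)

    def get_lines(self, version_id):
        """Retrieve the full text of an existing version."""
        return list(self._lines[version_id])

    def get_parents(self, version_id):
        """Retrieve the per-version metadata (here, only the parents)."""
        return self._parents[version_id]

    def annotate(self, version_id):
        """Return [(origin_version_id, line), ...] for a version.

        Lines matched against the first parent keep that parent's
        origin; every other line is attributed to version_id.
        """
        lines = self._lines[version_id]
        parents = self._parents[version_id]
        result = [(version_id, line) for line in lines]
        if not parents:
            return result
        parent_annotated = self.annotate(parents[0])
        parent_lines = [line for _, line in parent_annotated]
        matcher = difflib.SequenceMatcher(None, parent_lines, lines)
        for a, b, size in matcher.get_matching_blocks():
            for offset in range(size):
                result[b + offset] = parent_annotated[a + offset]
        return result

    def check(self):
        """Consistency check: every recorded parent must be stored."""
        for version_id, parents in self._parents.items():
            for parent in parents:
                if parent not in self._lines:
                    raise AssertionError(
                        "%r refers to missing parent %r" % (version_id, parent))
        return True


vf = InMemoryVersionedFile()
vf.add_version("rev-1", [], ["a\n", "b\n"])
vf.add_version("rev-2", ["rev-1"], ["a\n", "c\n", "b\n"])
print(vf.annotate("rev-2"))
```

A real weave or knit would answer the same questions from its on-disk
format; the point here is only the shape of the interface.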

Now, with knits we have two uses where we want to optimise highly:
transferring records from the matching knit in another repository, and
adding a text directly into a knit in a remote repository. 

The former seems clear to me - the join operation can transfer deltas
with minimal reprocessing as the diffs will apply perfectly (but the
annotation index lookups will not, so we need to adjust those using the
current format).

The latter seems to me to need optimisation only for rare operations
like reconcile, which may reinsert the entire content of a knit; we
would not want that to be arbitrarily slow. A normal commit appends at
most one record per knit, so there are no optimisation considerations
beyond downloading only the minimal data needed to construct the delta.
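
As a reminder of what 'constructing the delta' involves, here is a small
sketch of a line-based delta against a parent text. It is not the knit
record format; make_line_delta and apply_line_delta are hypothetical
helper names, and the hunk layout (start, end, replacement_lines) is an
assumption chosen for illustration.

```python
import difflib


def make_line_delta(old_lines, new_lines):
    """Return hunks (start, end, replacement) turning old into new.

    Each hunk replaces old_lines[start:end] with the replacement lines.
    """
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            delta.append((i1, i2, new_lines[j1:j2]))
    return delta


def apply_line_delta(old_lines, delta):
    """Rebuild the new text from the parent text and its delta."""
    result = []
    position = 0
    for start, end, replacement in delta:
        result.extend(old_lines[position:start])  # unchanged run
        result.extend(replacement)                # changed run
        position = end
    result.extend(old_lines[position:])
    return result


parent_text = ["a\n", "b\n", "c\n"]
new_text = ["a\n", "x\n", "c\n", "d\n"]
delta = make_line_delta(parent_text, new_text)
rebuilt = apply_line_delta(parent_text, delta)
```

Only the parent text is needed to build the delta, which is why a single
append does not call for any caching beyond that.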

I don't think there is a best use pattern for writing that we can
trivially cache correctly. We can of course cache the whole knit if we
wanted to - using a caching write transaction if needed - but this can
be a substantial amount of data. For all the common operations I can
think of that involve more than one append to a single knit in a single
transaction (read 'reconcile' and 'fetch from a weave') there is a
single access pattern: append an entire graph of content one entry at a
time. So I propose that we add an api 'clear_cache()' on versionedfile
which will clear the internal cache of any versionedfile. For a Weave
this could remove the entire weave content from memory (and the next
call would reload the entire thing from disk) or could be a no-op... For
a Knit this would remove any and all cached information - drop the index
and any known data out of memory.
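
A minimal sketch of the idea, under loud assumptions: the class
CachingVersionedFile, its storage dict (standing in for the on-disk
file) and the load_count counter are all hypothetical and exist only to
show the behaviour. Only the name clear_cache() comes from the proposal
above.

```python
class CachingVersionedFile:
    """Toy versioned file with an internal cache (illustration only)."""

    def __init__(self, storage):
        self._storage = storage  # stands in for the data on disk
        self._cache = None       # in-memory copy, loaded on demand
        self.load_count = 0      # how many times we hit the 'disk'

    def _ensure_loaded(self):
        if self._cache is None:
            self._cache = {key: list(value)
                           for key, value in self._storage.items()}
            self.load_count += 1
        return self._cache

    def get_lines(self, version_id):
        return list(self._ensure_loaded()[version_id])

    def add_version(self, version_id, lines):
        cache = self._ensure_loaded()
        self._storage[version_id] = list(lines)  # write through to 'disk'
        cache[version_id] = list(lines)

    def clear_cache(self):
        """Drop all cached data; the next call reloads from storage."""
        self._cache = None


storage = {"rev-1": ["a\n"]}
vf = CachingVersionedFile(storage)
# Appending a graph of content one entry at a time loads only once.
vf.add_version("rev-2", ["a\n", "b\n"])
vf.add_version("rev-3", ["a\n", "b\n", "c\n"])
loads_during_append = vf.load_count
# The caller decides when the memory is no longer worth holding.
vf.clear_cache()
lines_after_clear = vf.get_lines("rev-3")
```

The caller, not the versioned file, knows when the run of appends is
over, which is why the only API needed is a way to drop the cache.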

Remember that VersionedFile instances are only meant to exist within a
single Transaction anyway, so this is a memory footprint management
choice. (We had a previous optimisation discussion about having weave
cache the last extracted full text, and this is a near-identical case.)
In essence, to let knits work well, I now think that having a cache
within each versioned file is the right approach, and that the minimal
api for that is a way to clear the cache.

Further, the cached data that a versioned file needs may not be domain
objects, so the identity map approach is not appropriate here anyway.

In this scheme, things like 'weave.join(other_weave)' would be followed
by a 'weave.clear_cache()' call if and only if we care about memory
footprint there.

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.