[RFC]: preparing for Blob storage

Robert Collins robertc at robertcollins.net
Sat May 27 00:15:20 BST 2006


We've been talking on and off about real write-once storage, with
various constraints and performance considerations. In particular, we
hope to combine the nice performance properties of a WORM approach that
uploads a single directory to add a revision [lock-free, rsyncable
*into* a shared repository] with the good space efficiency achievable by
grouping things together.

Martin and I spent yesterday doing some prep work and analysis aimed at
making it *possible* to experiment with blob based storage.

The motivating factor is to allow the future introduction of a storage
format that meets all our desired criteria, by making the constraints on
the Repository API amenable to blob-based WORM storage:

 - revision listing is not possible in general, so the repository should
only expose graph-traversal APIs. That is: no all_revision_ids, and
get_revision_graph requires a set of entry points into the graph.
   Naturally, the current repositories will still use their internal
_all_revision_ids and graph facilities, so performance will not be
impaired. But the constraint on the repository *interface* will allow a
blob interface to be used. Operations like 'check', 'reconcile' and
'upgrade' can be modified to work with this constraint, primarily by
delegating the choice of which code to run to the repository, allowing
per-repository implementations of each of the above.
 - writing to the repository should require building up a commit during
a write transaction and then having it finalised into place.
   As with the first point, current repositories will keep doing exactly
what they do now: writing data in order, with no actual 'commit' step,
and locking the whole repository for the duration. Placing this
constraint on the Repository interface (to add data, you must get a
RevisionBuilder from the repository, then call methods on it to add file
texts, inventory data, revision data, signatures, etc.) is another
ingredient needed to allow writing a blob backend.
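A minimal sketch of a repository interface honouring both constraints
above, assuming entirely hypothetical names (BlobRepository,
RevisionBuilder, get_revision_builder, commit and the blob layout are
illustrations, not bzrlib's actual API). A revision becomes visible only
when its builder is committed, each revision is stored once as an
immutable blob, and the only read path is get_revision_graph walking
parents from caller-supplied entry points.

```python
# Hypothetical sketch only: BlobRepository, RevisionBuilder and
# get_revision_builder are illustrative names, not bzrlib's API.


class RevisionBuilder:
    """Accumulates one revision's data inside a write transaction."""

    def __init__(self, repository, revision_id, parent_ids=()):
        self._repository = repository
        self.revision_id = revision_id
        self.parent_ids = tuple(parent_ids)
        self._file_texts = {}
        self._inventory = None
        self._revision = None
        self._signature = None
        self._committed = False

    def add_file_text(self, file_id, text):
        self._file_texts[file_id] = text

    def add_inventory(self, inventory):
        self._inventory = inventory

    def add_revision(self, revision):
        self._revision = revision

    def add_signature(self, signature):
        self._signature = signature

    def commit(self):
        """Finalise the revision; nothing is visible before this call."""
        if self._committed:
            raise ValueError("builder has already been committed")
        blob = {
            "parents": self.parent_ids,
            "file_texts": dict(self._file_texts),
            "inventory": self._inventory,
            "revision": self._revision,
            "signature": self._signature,
        }
        self._repository._store_blob(self.revision_id, blob)
        self._committed = True
        return self.revision_id


class BlobRepository:
    """Write-once store holding one immutable blob per revision.

    The public interface has no all_revision_ids(): revisions can only
    be reached by walking the graph from known entry points.
    """

    def __init__(self):
        self._blobs = {}  # private; a real backend might be files on disk

    def get_revision_builder(self, revision_id, parent_ids=()):
        return RevisionBuilder(self, revision_id, parent_ids)

    def _store_blob(self, revision_id, blob):
        # Write once: an existing blob is never replaced.
        if revision_id in self._blobs:
            raise ValueError("revision %r is already stored" % (revision_id,))
        self._blobs[revision_id] = blob

    def get_revision_graph(self, entry_points):
        """Map each revision reachable from entry_points to its parents.

        Raises KeyError if a reachable revision is not stored.
        """
        graph = {}
        pending = list(entry_points)
        while pending:
            revision_id = pending.pop()
            if revision_id in graph:
                continue  # already visited via another path
            parents = self._blobs[revision_id]["parents"]
            graph[revision_id] = parents
            pending.extend(parents)
        return graph


if __name__ == "__main__":
    repo = BlobRepository()
    builder = repo.get_revision_builder("rev-1")
    builder.add_file_text("file-a", b"hello\n")
    builder.add_revision({"message": "initial import"})
    builder.commit()

    builder = repo.get_revision_builder("rev-2", ["rev-1"])
    builder.add_revision({"message": "second revision"})
    builder.commit()

    print(repo.get_revision_graph(["rev-2"]))
```

Running it prints the graph reachable from rev-2, {'rev-2': ('rev-1',),
'rev-1': ()}. The in-memory dict is kept private on purpose: a backend
may well be able to enumerate its own blobs internally, but the
interface only promises traversal from entry points.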

There may be other conceptual changes required. What we are trying to do
is prepare the code base now for this future work, with the following
guiding principles:
 * we don't want any performance degradation
 * we're not worried about getting it absolutely right, just close
enough that a blob repository plugin is possible.

This email is essentially a heads-up and a request for opinions on
this: is it reasonable to merge to the mainline as pieces are
completed, or should we treat it as a long-lived feature branch?

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.


More information about the bazaar mailing list