[RFC] Blob format

Johan Rydberg jrydberg at gnu.org
Tue Jun 27 21:03:42 BST 2006


As some of you know already I've been toying with the idea of a blob
history format.  I wrote down a few lines to start a discussion:

  http://bazaar-vcs.org/BzrBlobFormat

There is also a branch available where I've done some work (see the
spec for the URL).

The main goal of the blob format is to eliminate a lot of round trips
to a dumb remote server, and to get rid of the problem of identifying
what to fetch.

My current work focuses on a model where a blob can contain several
revisions.  Blobs are immutable, but can be combined into a new blob.
This is done during a push/pull/merge, which means the number of blobs
will be reduced when data is shared between repos.
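
A minimal sketch of the "immutable, but can be combined" idea follows.
The Blob class and the make_blob/combine helpers are hypothetical names
used only for illustration; they are not taken from the spec or the
branch, and a real blob would hold more than raw revision bytes.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Iterable, Mapping


@dataclass(frozen=True)
class Blob:
    """Hypothetical immutable blob: revision id -> serialized revision data."""
    revisions: Mapping[str, bytes]


def make_blob(revisions: Mapping[str, bytes]) -> Blob:
    # Copy into a read-only view so the blob cannot be mutated afterwards.
    return Blob(MappingProxyType(dict(revisions)))


def combine(blobs: Iterable[Blob]) -> Blob:
    """Build a new blob holding every revision found in the input blobs."""
    merged = {}
    for blob in blobs:
        for revision_id, data in blob.revisions.items():
            # A revision present in several blobs is stored only once.
            merged.setdefault(revision_id, data)
    return make_blob(merged)
```

The input blobs are left untouched; after a fetch, the store could keep
only the combined blob, which is how the blob count would go down.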

The blob also contains meta-data about what changes a revision
introduces to a file, to speed up operations such as 'bzr log FILE'
(i.e., Repository.get_revision_delta).
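
To illustrate why such meta-data helps, here is a small sketch that
assumes each revision in a blob carries a mapping of file id to the
kind of change.  The layout and the revisions_touching helper are
assumptions made for this example, not the format described in the
spec.

```python
def revisions_touching(file_id, changes_by_revision):
    """Return ids of revisions whose recorded changes mention file_id.

    changes_by_revision is a hypothetical mapping of
    revision id -> {file id: kind of change}; input order is preserved.
    """
    return [
        revision_id
        for revision_id, changes in changes_by_revision.items()
        if file_id in changes
    ]


# Example meta-data for three revisions (made-up ids).
changes = {
    "rev-1": {"readme-id": "added"},
    "rev-2": {"setup-id": "added"},
    "rev-3": {"readme-id": "modified", "setup-id": "modified"},
}
touching_readme = revisions_touching("readme-id", changes)
```

With this information at hand, a per-file log only needs to look at the
meta-data and never has to reconstruct full inventories to compute a
delta.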

Currently I use simple .zip files as containers for the blobs.  That
may be far from optimal, though, since I get the feeling that you need
to seek around a lot to read the contents of a zip file.  So a
custom-made container may be more suitable.
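
For reference, a sketch of a zip-based blob using Python's standard
zipfile module.  The write_blob, list_revisions and read_revision
helpers, and the choice of one archive member per revision id, are
assumptions for illustration only.  Note that a zip archive keeps its
central directory at the end of the file, so a reader has to consult
the tail of the archive before it can locate a member.

```python
import io
import zipfile


def write_blob(revisions):
    """Pack {revision_id: bytes} into a zip archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for revision_id, data in sorted(revisions.items()):
            # One archive member per revision, named after the revision id.
            archive.writestr(revision_id, data)
    return buffer.getvalue()


def list_revisions(blob_bytes):
    """Return member names; this reads the central directory at the end."""
    with zipfile.ZipFile(io.BytesIO(blob_bytes)) as archive:
        return archive.namelist()


def read_revision(blob_bytes, revision_id):
    """Read a single member, located via the central directory."""
    with zipfile.ZipFile(io.BytesIO(blob_bytes)) as archive:
        return archive.read(revision_id)
```

Against a dumb remote server this pattern means at least one read for
the directory and another for the member itself, which is the kind of
extra round trip a custom container could avoid.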

Another open question is how to store information in the blob store
about which blobs contain which revisions.  One simple way would be to
have an index of revision->blob mappings.  That index could also
contain parent information for the revisions, to speed up repository
graph and ancestry retrieval.  The downside is that when inserting a
blob in the blob store it would have to be inspected to figure out
what revisions are present.  That data could possibly be deduced from
the fetching process, though.
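
A sketch of what such an index could look like, assuming it is a plain
mapping of revision id -> (blob name, parent ids).  The index_blob,
ancestry and blobs_for helpers are hypothetical and only show how
ancestry could be answered from the index alone, without opening any
blob.

```python
def index_blob(index, blob_name, revision_parents):
    """Record which blob holds each revision, together with its parents."""
    for revision_id, parents in revision_parents.items():
        # Keep the first blob seen for a revision; blobs are immutable.
        index.setdefault(revision_id, (blob_name, tuple(parents)))


def ancestry(index, revision_id):
    """Return revision_id and all ancestors that are present in the index."""
    seen = set()
    pending = [revision_id]
    while pending:
        current = pending.pop()
        # Skip revisions already visited or missing from the index.
        if current in seen or current not in index:
            continue
        seen.add(current)
        pending.extend(index[current][1])
    return seen


def blobs_for(index, revision_ids):
    """Return the names of the blobs needed to obtain the given revisions."""
    return {index[revision_id][0] for revision_id in revision_ids}
```

If the fetching process already knows which revisions it asked for,
the revision_parents argument could be filled in from that knowledge
rather than by inspecting the blob.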

~j




