[RFC] Blob format

Matthieu Moy Matthieu.Moy at imag.fr
Wed Jun 28 09:55:41 BST 2006


Johan Rydberg <jrydberg at gnu.org> writes:

> If it turns out to be something as fast as knits, then it could very
> well replace them.  But it is more of a research format. 

It can also be a complement to knits. I think git has something
similar. For example, you could keep the knit archive, and
periodically add a big "bundle" or "blob" in addition to this. This
way, branch would do the following:

1) get the big bundle (takes bandwidth, but almost no rtt)

2) see which revisions are still missing

3) fetch them in the usual way for knits.
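
The three steps above can be sketched roughly as follows, in Python.
This is only an illustration: plan_branch and the plain string
revision ids are hypothetical names, not anything that exists in bzr.

```python
def plan_branch(wanted, bundle_revisions):
    """Split wanted revision ids into those the bundle already covers
    and those that still have to be fetched from the knits."""
    wanted = set(wanted)
    # Step 1: everything the big bundle contains arrives in one download.
    from_bundle = wanted & set(bundle_revisions)
    # Step 2: whatever the bundle did not cover is still missing.
    missing = wanted - from_bundle
    # Step 3 (not shown): the caller fetches `missing` the usual knit way.
    return from_bundle, missing


from_bundle, missing = plan_branch(
    wanted=["r1", "r2", "r3", "r4", "r5"],
    bundle_revisions=["r1", "r2", "r3"],
)
```

The older the bundle, the larger the missing set, so how often the
bundle is regenerated decides how much of the branch still goes
through the round-trip-heavy path.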

> IIRC, GNU Arch had nothing like skip-deltas, and the format was based
> on patches.  So to build a tree, it had to apply N patches, where N is
> the number of changes to the tree.

It has been discussed several times on the mailing list, but AFAIK,
nothing has ever been implemented.

> What worries me is that a blob format will become a skip-fest.  Say
> that you want to extract the latest versions of files 'foo' and 'bar'.
> Starting with 'foo', you open the blobs that contain the deltas for
> the file; e.g., blobs 'F', 'C', 'A'.  To extract 'bar' you may need to go
> through blobs 'G', 'D', 'B', 'A'.  There will be no streaming access
> pattern, which is bad.  Knits are far better in that respect.

With appropriate indexing, it might be possible to get only the
portions of the blob files concerning files foo and bar (but to get N
files, with N neither small nor close to the number of files in the
project, you'd probably have to choose between rtt and bandwidth
anyway).
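
To make the rtt/bandwidth choice concrete, here is a small Python
sketch. It assumes a hypothetical index mapping each file name to the
(offset, length) byte ranges of its deltas inside one blob; the index
layout, the coalesce helper, and the max_gap parameter are all made up
for illustration.

```python
def coalesce(ranges, max_gap):
    """Merge (offset, length) ranges whose gap is at most max_gap bytes.

    A larger max_gap gives fewer requests (fewer round trips) at the
    cost of downloading bytes nobody asked for (more bandwidth).
    """
    merged = []
    for start, length in sorted(ranges):
        end = start + length
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


# Hypothetical index: file name -> byte ranges of its deltas in the blob.
index = {
    "foo": [(0, 100), (500, 50)],
    "bar": [(120, 80), (4000, 200)],
}
wanted = index["foo"] + index["bar"]
few_requests = coalesce(wanted, max_gap=1000)   # favours low rtt
many_requests = coalesce(wanted, max_gap=0)     # favours low bandwidth
```

With max_gap=1000 this needs two requests but transfers 750 bytes for
430 bytes of useful data; with max_gap=0 it transfers exactly 430
bytes in four requests.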

-- 
Matthieu



