[RFC] Blob format
Johan Rydberg
jrydberg at gnu.org
Wed Jun 28 09:27:50 BST 2006
Matthieu Moy <Matthieu.Moy at imag.fr> writes:
> Johan Rydberg <jrydberg at gnu.org> writes:
>
>> The golden goal of the blob format is to eliminate a lot of RTTs to a
>> dumb remote server, and try to get rid of the problem with identifying
>> what to fetch.
>
> But won't this make local operation much slower ?

That I do not know yet, but it may very well be so.

> Is this "blob" supposed to come in addition to the, say, knit format,
> or as a new format replacing it if it's adopted ?
>
> Indeed, this reminds me of the GNU Arch format, which was one of the
> bottlenecks for performance (but was extremely bandwidth-saving for
> remote operations).

If it turns out to be something as fast as knits, then it could very
well replace them. But it is more of a research format.

IIRC, GNU Arch had nothing like skip-deltas, and the format was based
on patches. So to build a tree, it had to apply N patches, where N is
the number of changes to the tree.
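A rough sketch of the difference in chain length. The skip-delta rule
used here is the Subversion-style one (revision n is stored as a delta
against n with its lowest set bit cleared); that is an assumption for
illustration, not a description of the blob format, and the function
names are made up.

```python
def linear_chain(n):
    """Revisions visited when every revision is a patch against its predecessor."""
    return list(range(n, -1, -1))


def skip_delta_chain(n):
    """Revisions visited when revision n is a delta against n with its
    lowest set bit cleared (assumed Subversion-style skip-delta rule)."""
    chain = [n]
    while n > 0:
        n &= n - 1  # clear the lowest set bit to find the delta base
        chain.append(n)
    return chain


# Number of deltas to apply is one less than the number of revisions visited.
patches_linear = len(linear_chain(54)) - 1
patches_skip = len(skip_delta_chain(54)) - 1
```

With a linear patch chain the work grows with N, while with this
skip-delta rule it grows with the number of set bits in N.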

What worries me is that a blob format will become a skip-fest. Say
that you want to extract the latest versions of files 'foo' and 'bar'.
Starting with 'foo', you open the blobs that contain the deltas for
the file; e.g., blobs 'F', 'C', 'A'. To extract 'bar' you may need to go
through blobs 'G', 'D', 'B', 'A'. There will be no streaming access
pattern, which is bad. Knits are far better in that respect.

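To make the access pattern concrete, here is a small sketch using the
'foo' and 'bar' example. The index layout (file name mapped to the blobs
holding its deltas, newest first) and the names DELTA_BLOBS and
blob_reads are hypothetical; nothing like this exists in the code yet.

```python
# Hypothetical index: file name -> blobs holding its deltas, newest first.
DELTA_BLOBS = {
    "foo": ["F", "C", "A"],
    "bar": ["G", "D", "B", "A"],
}


def blob_reads(files, index=DELTA_BLOBS):
    """Return the sequence of blob opens needed to extract each file in turn."""
    reads = []
    for name in files:
        reads.extend(index[name])
    return reads


reads = blob_reads(["foo", "bar"])
distinct_blobs = sorted(set(reads))
```

Extracting the two files touches six different blobs in seven opens
(blob 'A' is opened twice unless it is cached), and the order jumps
between blobs instead of reading one file front to back.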
>> My current work is focusing on having a model where a blob can contain
>> several revisions. Blobs are immutable, but can be combined into a
>> new blob. This is done while doing a push/pull/merge. Meaning the
>> number of blobs will be reduced when data is shared between repos.
>
> This sounds indeed very similar to what bundles do. Plus, perhaps,
> some indexing, and for sure some compression.

Yes, it is very similar to bundles in that it groups several changes
into a single transmittable unit.
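As a minimal sketch of the "immutable, but can be combined" model from
the quoted text: a blob is treated here as a read-only mapping from
revision id to payload. That representation, and the names make_blob
and combine, are assumptions for illustration only; the real on-disk
layout is not decided.

```python
from types import MappingProxyType


def make_blob(revisions):
    """Build a read-only blob from a dict of revision id -> payload."""
    return MappingProxyType(dict(revisions))


def combine(*blobs):
    """Return a new blob holding every revision of the inputs.

    The input blobs are left untouched; shared revisions are stored once.
    """
    merged = {}
    for blob in blobs:
        merged.update(blob)
    return make_blob(merged)


local = make_blob({"rev-1": b"a", "rev-2": b"b"})
remote = make_blob({"rev-2": b"b", "rev-3": b"c"})
combined = combine(local, remote)
```

After a pull, the two source blobs could be replaced by the single
combined one, which is how the blob count goes down as data is shared.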