[MERGE-REQ] bzr.newformat
Gustavo Niemeyer
gustavo at niemeyer.net
Fri Sep 16 09:56:35 BST 2005
> > I've checked out the weave branch, and had a general overview
> > on it. The new storage scheme looks very promising indeed.
> > One point I've found a bit dubious though is how the storage
> > format depends on newlines as an enforced boundary for each
> > chunk. This would make the format a bit unwieldy for binary
> > files, since it becomes impossible, for instance, to have
> > chunks of fixed sizes.
>
> For text files obviously you do want to split it on newlines, as the
> sensible unit for annotation and merging.
Certainly.
> The weave code just works on a sequence of strings, without any
> requirement that they be terminated by newlines or even printable. It
> should work fine with binary files (though it needs more tests). You
> do need some way to chunk the binary file; \n will work ok on many
> files but might give uneven chunk sizes on some. Better would
> probably be to use a rolling checksum.
What worries me is that the chunk unit for the file format depends
on newlines. I can't just pass it a chunk (line) with an embedded
newline and hope it will work. Just to make the point clear, think
what the weave file would look like in the pathological case of a 1MB
file containing just newlines. With the proposed change, once we're
able to identify a binary file in bzr, we could at least split it into
chunks of a specific size. Using a rolling checksum would certainly be
a plus.
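
To make the comparison concrete, here is a rough Python sketch of the
three chunking strategies mentioned in this thread: splitting on
newlines, splitting at fixed sizes, and cutting where a rolling checksum
matches a pattern. This is not code from the weave branch. The function
names, the 4096-byte chunk size, the window and mask values, and the
simple additive checksum are all assumptions made for illustration; a
real implementation would likely use a stronger rolling checksum.

```python
def split_on_newlines(data):
    """Cut after every b"\n", keeping the newline with its chunk."""
    chunks = []
    start = 0
    while start < len(data):
        end = data.find(b"\n", start)
        end = len(data) if end == -1 else end + 1
        chunks.append(data[start:end])
        start = end
    return chunks


def split_fixed(data, size=4096):
    """Cut every `size` bytes, regardless of content (size is an assumption)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def split_rolling(data, window=16, mask=0x3F, min_size=16):
    """Cut where a sum over the last `window` bytes has all `mask` bits set.

    The additive sum is a toy stand-in for a real rolling checksum; it is
    only meant to show that boundaries depend on content, not on offsets.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte                  # byte entering the window
        if i >= window:
            rolling -= data[i - window]  # byte leaving the window
        if i + 1 - start >= min_size and (rolling & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing partial chunk
    return chunks
```

For the pathological 1MB file of newlines, split_on_newlines() returns
1048576 one-byte chunks, while split_fixed() with the assumed 4096-byte
size returns 256.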
> The current storage format is also line oriented, but uses ',' to mark
> data lines with no trailing newline present. This seems to work OK to
> store binaries. Doing it this way seemed reasonably efficient in
> Python, and has the advantage of making the weaves more human
> readable.
Yes, it looks really interesting.
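
As I understand the description above, each chunk becomes one data line
in the weave file, with a marker saying whether the chunk originally
ended in a newline. A minimal sketch of that idea follows. Only the ','
marker comes from the message above; the '.' marker for ordinary lines,
the space after the marker, and both function names are assumptions of
mine, not the actual weave file code.

```python
def encode_chunk(chunk):
    """Encode one chunk as a single marked data line (hypothetical layout)."""
    if b"\n" in chunk[:-1]:
        # A newline inside the chunk would be read back as a line boundary.
        raise ValueError("chunk with an embedded newline cannot be one data line")
    if chunk.endswith(b"\n"):
        return b". " + chunk          # assumed marker: newline-terminated chunk
    return b", " + chunk + b"\n"      # ',' marker: no trailing newline present


def decode_line(line):
    """Recover the original chunk from one marked data line."""
    marker, body = line[:2], line[2:]
    if marker == b". ":
        return body
    if marker == b", ":
        return body[:-1]              # drop the newline added by the encoder
    raise ValueError("unknown marker")
```

Under these assumptions a binary chunk round-trips as long as it has no
newline before its last byte, which is exactly the restriction that
rules out arbitrary fixed-size chunks.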
Thanks for checking it out.
--
Gustavo Niemeyer
http://niemeyer.net