[MERGE-REQ] bzr.newformat

Martin Pool martinpool at gmail.com
Fri Sep 16 02:58:05 BST 2005


On 16/09/05, Gustavo Niemeyer <gustavo at niemeyer.net> wrote:
> Greetings,
> 
> I've checked out the weave branch, and had a general overview
> on it.  The new storage scheme looks very promising indeed.
> One point I've found a bit dubious though is how the storage
> format depends on newlines as an enforced boundary for each
> chunk. This would turn the format into something a bit unwieldy
> for binary files, becoming impossible, for instance, to have
> chunks of fixed sizes.

For text files you obviously do want to split on newlines, since the
line is the sensible unit for annotation and merging.

The weave code just works on a sequence of strings, without any
requirement that they be terminated by newlines or even be printable.
It should work fine with binary files (though it needs more tests).
You do need some way to chunk the binary file; splitting on \n will
work OK on many files but might give very uneven chunk sizes on some.
A rolling checksum would probably be better.
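
As a rough sketch of the rolling-checksum idea (this is not code from
the weave branch; the function name, the plain byte-sum checksum, and
the window and mask values are all assumptions picked for
illustration), a chunk boundary is declared wherever the checksum of
the last few bytes matches a fixed bit pattern:

```python
def chunk_by_rolling_checksum(data, window=16, mask=0x3F):
    """Split bytes into chunks at content-defined boundaries.

    Hypothetical helper: keeps a running sum of the last `window`
    bytes and cuts after any byte where the low bits of that sum are
    all ones, provided the current chunk is at least `window` long.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            # Drop the byte that just slid out of the window.
            rolling -= data[i - window]
        long_enough = (i + 1 - start) >= window
        if long_enough and (rolling & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        # Whatever is left over becomes the final (possibly short) chunk.
        chunks.append(data[start:])
    return chunks
```

Because each cut is decided from nearby content rather than from a
fixed offset, an insertion early in the file tends to disturb only the
chunks around it. A real implementation would likely use a stronger
checksum than a plain sum and also cap the maximum chunk size.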

The current storage format is also line oriented, but uses ',' to mark
data lines with no trailing newline present.  This seems to work OK to
store binaries.  Doing it this way seemed reasonably efficient in
Python, and has the advantage of making the weaves more human
readable.
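
To illustrate how a marker like that lets a line-oriented file carry
data with no final newline, here is a minimal sketch. Only the ','
marker comes from the description above; the '.' marker for ordinary
lines, the function names, and the exact layout are assumptions and
not necessarily what the weave file code does:

```python
def split_keep_newlines(data):
    """Split bytes on b'\\n' only, keeping the newline on each piece."""
    parts = data.split(b"\n")
    chunks = [p + b"\n" for p in parts[:-1]]
    if parts[-1]:
        chunks.append(parts[-1])  # trailing piece with no newline
    return chunks


def encode_chunks(chunks):
    """Write each chunk as one stored line with a one-byte marker.

    Assumes each chunk contains no newline except possibly as its
    last byte. '.' (assumed) means the newline belongs to the data;
    ',' means a newline was added only to end the stored line.
    """
    out = []
    for chunk in chunks:
        if chunk.endswith(b"\n"):
            out.append(b"." + chunk)
        else:
            out.append(b"," + chunk + b"\n")
    return b"".join(out)


def decode_chunks(stored):
    """Invert encode_chunks, dropping the newline added for ',' lines."""
    chunks = []
    for line in split_keep_newlines(stored):
        marker, body = line[:1], line[1:]
        if marker == b".":
            chunks.append(body)
        elif marker == b",":
            chunks.append(body[:-1])
        else:
            raise ValueError("unknown line marker: %r" % marker)
    return chunks
```

For example, `encode_chunks([b"a\n", b"b"])` gives `b".a\n,b\n"`, and
decoding that returns the original two chunks, so arbitrary bytes
survive the round trip while the stored file stays one record per line.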

> - For some reason files starting with '__' are not accepted
>   by _add_text_to_weave(). I was testing the new code with
>   a Python source tree, and this created problems with
>   __init__.py. The code preventing this kind of entry was
>   just commented out to make tests work.

That was just me being lazy about how the control files are stored
inside the weave store.  I should do something else so that they
cannot clash with file ids.

Anyhow, thanks, I'll have a look.
-- 
Martin



