please check out weave-format branch

Martin Pool martinpool at gmail.com
Fri Sep 23 22:21:32 BST 2005


On 24/09/05, John A Meinel <john at arbash-meinel.com> wrote:
> > It's also possible to do append-only weaves.  Trivially, you can just
> > store the weave itself in a revfile.  I have also done some work on a
> > natively append-only weave format; you can see my work-so-far here:
>
> Well, I would guess that putting a weave in a revfile is a lot of extra
> overhead, since now you are diffing a weave to create the delta compression.

Right, I think Aaron just mentioned that to show that you can in
principle transform anything to be append-only, not that it's very
practical.

> I read over this, and I'm a little confused by the syntax he used. But I
> think I generally understand it. I wasn't trying to create a minimal
> read form, since the current weave form is read entirely. It would be
> nice to have a seekable form, but that requires a separate index to be
> generated. And if you want something like that, my form would let you
> put all the annotation data into the second file, and the first just
> becomes a pool of lines.

We were previously talking about trying for file formats where you
don't need to read the whole thing, but it turns out that reading an
entire weave is not too much of a burden, and there are several ways
in which we could optimize it.  (e.g. convert it to C, or have a
special function which just extracts a version without building the
whole thing into an object.)

So it is probably best to just focus on the robustness and upload
properties of append only.

> Well, I feel like the current weave format is more complicated than it
> has to be. The {} and [] syntax makes it tricky to figure out what you
> are looking at.

So this is a bit subtle; there is some explanation in the comment but
I'll give more here.

> Especially since there are no nesting requirements, so you
> can get:

That's actually intentional and (I think) necessary, not just me being
nasty. :-)

> { 2
> { 3
> . bla
> . bla
> [ 4
> . de bla
> } 3
> . something
> ] 4
> . heya
> } 2

(I adjusted your example to make it well-formed by itself.)

OK, so it does look strange that the {3} and [4] blocks are not
properly nested, but overlapped.  Remember the {} markers represent
insertions and the [] represent deletions.  So the deletion that
occurred in version 4 was of one line "de bla" introduced in version 3
and one line "something" introduced in version 2.  The full texts are:

2:
something
heya

3:
bla
bla
de bla
something
heya

4:
bla
bla
heya
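As a rough illustration of how a reader can pull these three texts back out of the overlapping blocks, here is a small Python sketch. It assumes the simplified line forms used in the example above (`{ N` and `} N` around insertions, `[ N` and `] N` around deletions, and `. ` in front of every text line). It also assumes that version 4 descends from 3, which descends from 2, so each version includes its ancestors. `extract`, `WEAVE` and `ANCESTRY` are made-up names for illustration; this is not the bzrlib implementation.

```python
def extract(weave_lines, included):
    """Return the text of the version whose ancestry set is `included`."""
    inserts = []     # stack of open insertion blocks; these always nest
    deletes = set()  # open deletion blocks; these may overlap insertions
    result = []
    for raw in weave_lines:
        tag, _, rest = raw.partition(" ")
        if tag == "{":
            inserts.append(int(rest))
        elif tag == "}":
            inserts.pop()            # always closes the innermost insertion
        elif tag == "[":
            deletes.add(int(rest))
        elif tag == "]":
            deletes.discard(int(rest))
        elif tag == ".":
            # A line belongs to the innermost open insertion. It is live if
            # that version is included and no included version deleted it.
            inserted = bool(inserts) and inserts[-1] in included
            deleted = any(v in included for v in deletes)
            if inserted and not deleted:
                result.append(rest)
        else:
            raise ValueError("unrecognised weave line: %r" % raw)
    return result


WEAVE = ["{ 2", "{ 3", ". bla", ". bla", "[ 4", ". de bla", "} 3",
         ". something", "] 4", ". heya", "} 2"]
ANCESTRY = {2: {2}, 3: {2, 3}, 4: {2, 3, 4}}  # assumed history: 2 -> 3 -> 4

for version in (2, 3, 4):
    print(version, extract(WEAVE, ANCESTRY[version]))
# 2 ['something', 'heya']
# 3 ['bla', 'bla', 'de bla', 'something', 'heya']
# 4 ['bla', 'bla', 'heya']
```

The `[ 4 ... ] 4` block straddles the `} 3` closer, which is exactly the overlap in the example. Keeping open deletions in a set, rather than on the same stack as insertions, is what lets a reader cope with that.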

> And I also saw quite a few cases in the weaves where the closing
> bracket/brace didn't have a number.

This is because *insertions* always do nest properly; insertions
happen at a point and so never cross existing blocks.  We can omit the
closing number because it always matches the innermost insertion block
that is still open.  (At first I didn't do this, and perhaps it's an
overoptimization.)
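A minimal sketch of why the number is redundant: because insertions nest, a reader can restore it from a stack of open insertion blocks. This assumes the same simplified line forms as the example above; the two function names are hypothetical, and deletion closers are deliberately left alone because deletion blocks can overlap insertions.

```python
def strip_insert_closes(lines):
    """Write side: drop the redundant version number from '} N' lines."""
    return ["}" if line.startswith("} ") else line for line in lines]


def restore_insert_closes(lines):
    """Read side: recover '} N' from a bare '}' using a stack.

    Valid only because insertion blocks always nest; deletion blocks may
    overlap insertions, so their '] N' closers keep the number.
    """
    open_inserts = []  # versions of the insertion blocks currently open
    out = []
    for line in lines:
        if line.startswith("{ "):
            open_inserts.append(line[2:])
            out.append(line)
        elif line == "}" or line.startswith("} "):
            if not open_inserts:
                raise ValueError("'}' with no open insertion block")
            innermost = open_inserts.pop()
            # A numbered closer must name the innermost open insertion.
            if line != "}" and line[2:] != innermost:
                raise ValueError("closed insertion %s while %s was innermost"
                                 % (line[2:], innermost))
            out.append("} " + innermost)
        else:
            out.append(line)  # text lines and '[ N' / '] N' pass through
    return out


WEAVE = ["{ 2", "{ 3", ". bla", ". bla", "[ 4", ". de bla", "} 3",
         ". something", "] 4", ". heya", "} 2"]
short = strip_insert_closes(WEAVE)
print(short)
```

Round-tripping `short` through `restore_insert_closes` gives back the original numbered lines, while a badly nested sequence such as `{ 2`, `{ 3`, `} 2` is rejected.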

> The problem I have with switching to the current weave format is that
> it feels like I should be making snapshots, in case something gets
> messed up. (Think about svn with the Berkeley DB backend, where you copy
> it periodically).
> But why am I using an SCM if I have to back up its metadata periodically?
>  Coming from Arch, the SCM *was* the backup method. Naturally you should
> keep a mirror for redundancy, but mirroring a corrupted branch will give
> you a corrupted branch.

As you say, you need backups anyhow, especially with the history in
the working directory.  The question is whether corruption will
propagate to mess up your backups, necessitating having multiple
levels as we needed with svn.

The data is guarded by sha1 both at the inventory/revision level, and
also within the weave.  If you run the 'weave check' command it will
extract every version and make sure that its sha1 is what it should
be.  I am fairly confident that if any corruption does occur this will
trap it, and so if you check the branch before making the backup the
result will always be usable.
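The check itself is simple to picture: extract every version, hash it, and compare against the recorded value. The Python sketch below assumes the versions have already been extracted into a dict of line lists and that the recorded hash covers the lines joined with trailing newlines; both the data layout and the names `sha1_of_lines` and `check_versions` are assumptions for illustration, not what `weave check` actually does internally.

```python
import hashlib


def sha1_of_lines(lines):
    """SHA-1 of a version's text, assuming every line ends in a newline."""
    h = hashlib.sha1()
    for line in lines:
        h.update((line + "\n").encode("utf-8"))
    return h.hexdigest()


def check_versions(texts, expected_sha1):
    """Return the versions whose text does not match its recorded SHA-1.

    A version with no recorded hash is also reported as a mismatch.
    """
    return sorted(v for v, lines in texts.items()
                  if sha1_of_lines(lines) != expected_sha1.get(v))


texts = {2: ["something", "heya"], 4: ["bla", "bla", "heya"]}
recorded = {v: sha1_of_lines(t) for v, t in texts.items()}
texts[4] = ["bla", "blah", "heya"]       # simulate a corrupted line
print(check_versions(texts, recorded))   # [4]
```

Checking every version before copying a branch means a damaged line shows up as a hash mismatch rather than being silently carried into the backup.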

> Now maybe you feel like the current weavefile format is obvious enough
> that it is difficult to corrupt. From my experience of trying to look
> through the .weave file, I don't quite feel the same.

It's a somewhat indirect format.  On the other hand I think that
one can at least partially understand it by looking at it, particularly
if you start with files you're familiar with.  But maybe I'm biased.  One
reason why I did make it text- and line-based is to help with this.  An
append-only weave (good though it would be in other ways) might be
less obvious.

--
Martin




More information about the bazaar mailing list