python weave code
Robert Collins
robertc at robertcollins.net
Wed Jun 29 04:26:28 BST 2005
On Wed, 2005-06-29 at 12:40 +1000, Martin Pool wrote:
> I can see a way to store weaves in an append-only form.
>
> People who've used arch tend to strongly appreciate its write-once
> format: with few exceptions, once files are written they are not
> updated. This makes mirroring and access over dumb protocols easier,
> and gives good assurance that a bug in arch or elsewhere won't destroy
> any existing data. On the other hand, it tends to slow down some
> operations, and can be seen as optimization for the uncommon case (of
> corruption.)
>
> The weave code could write out a series of forward deltas expressed in
> terms of additions to a weave rather than directly on a text. To
> recreate a text one would first play forward all of those updates to
> recreate the weave, and then unweave it. Possibly the two operations
> can be fused to run faster. This should give the robustness features of
> write-once, while still retaining the weave benefits of aiding merge and
> fast annotation. There are various options for storing the deltas: one
> after the other in one or two files (like revfile), or we could store a
> container holding all the weave updates for all files changed by a
> particular revision.
>
> I'm not saying this is the right way to go but it is interesting.
Theres an alternative way: split the weave into two files as per revfile
storage. For the data file, just record the lines that would be added to
a weave - a weave only adds new lines as variations come in, and
existing lines are immutable (but their selectors can change) - at the
end of the file.
i.e. if one had a weave roughly like
0-3
AAAAAAAA
2-3
CCCCCCCC
0-1
BBBBBBBB
1-2
EEEEEEEE
0-3
DDDDDDDD
where rev 0 is
AAAAAAA
BBBBBBB
DDDDDDD
rev 1 is
AAAAAAA
BBBBBBB
EEEEEEE
DDDDDDD
rev 2 is
AAAAAAA
CCCCCCC
EEEEEEE
DDDDDDD
and rev 3 is
AAAAAAA
CCCCCCC
DDDDDDD
we would store rev0 initially in the data file:
AAAAAAA
BBBBBBB
DDDDDDD
rev 1 adds EE:
AAAAAAA
BBBBBBB
DDDDDDD
EEEEEEE
rev 2 adds CC...
AAAAAAA
BBBBBBB
DDDDDDD
EEEEEEE
CCCCCCC
and rev 3 adds no texts.
then our selectors can be weaved together:
rev 0-3: lines 0-0
rev 2-3: lines 4-4
rev 0-1: lines 1-1
rev 1-2: lines 3-3
rev 0-3: lines 2-2
(lines could be specified by byte offsets in the actual selector file
(and could even be a byte based weave for binary files)). This has a
couple of nice properties - a single readv (or Gather request on
windows) can retrieve a single exact text for us. The data file is
append only but no deltas are needed.
For the selector storage, I don't have a better answer than storing
deltas and reconstructing - but the selectors should be a small fraction
of the data file, and reading them all at once in (say) an http pull
wouldn't be very expensive, and would let us construct http ranges
requests (the equivalent of readv) for remote access.
Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20050629/fe1151bb/attachment.pgp
More information about the bazaar
mailing list