Append-only weavefiles

John Arbash Meinel john at arbash-meinel.com
Sun Sep 25 04:26:30 BST 2005


Well, I decided to give it a shot, and developed an append-only weave
file format.

It is available from:
http://bzr.arbash-meinel.com/branches/bzr/append-weave/

The format is reasonably straightforward, though probably more arcane
than the current weave format.

Basically, it consists of text lines, and revision information lines. A
text line looks like:
t <line num> <line length> <text>\n
All text lines are terminated with a newline (to allow f.readline() to
work). The <line length> indicates how much text there should be (it may
or may not include the newline).
I was originally thinking of using length-prefixed strings, but that will
wait for a binary format (the current format is plain ASCII).
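
As a rough illustration of the text-line layout described above, here is a
minimal parsing sketch. The function name parse_text_line is hypothetical
and is not taken from the branch. Since the length may or may not include
the newline, the sketch assumes either count is acceptable and returns the
text without the terminator.

```python
def parse_text_line(raw):
    """Parse one 't <line num> <line length> <text>\\n' record.

    Returns (line_num, text), with the terminating newline stripped.
    Hypothetical helper; not the code from the append-weave branch.
    """
    if not raw.endswith("\n"):
        raise ValueError("truncated text line: %r" % raw)
    parts = raw[:-1].split(" ", 3)
    if len(parts) == 3:
        parts.append("")  # empty text written without a trailing space
    if len(parts) != 4 or parts[0] != "t":
        raise ValueError("not a text line: %r" % raw)
    line_num, length, text = int(parts[1]), int(parts[2]), parts[3]
    # Assumption: the stored length counts the text either with or
    # without the terminating newline, so both are accepted here.
    if length not in (len(text), len(text) + 1):
        raise ValueError("length mismatch in text line %d" % line_num)
    return line_num, text
```

A length that matches neither count is treated as corruption, which is one
reason to keep the field even in a newline-terminated format.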

These text lines basically create a corpus of lines which can be referenced.
For revision information you have:

r <version num> <version name>
p <parents>
1 <sha1 hash>
o <operations>
c <checksum>

The version num is just the local index number. Versions are counted from 0
(same as text lines), so the number is redundant, but it is stored to allow
a consistency check. The name is as expected, and parents is the old
"includes".
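
To make the record layout concrete, here is a small sketch that reads the
five lines of one revision record and checks the version number against the
expected index. The names Revision and parse_revision are hypothetical. The
sketch also assumes that parents are written as space-separated version
numbers, and it leaves the operations field as an opaque string because its
encoding is not described here.

```python
from collections import namedtuple

# Hypothetical container for one revision record.
Revision = namedtuple("Revision", "num name parents sha1 operations checksum")

RECORD_TAGS = ["r", "p", "1", "o", "c"]


def parse_revision(lines, expected_num):
    """Parse the five lines of a revision record, in order r, p, 1, o, c."""
    if len(lines) != len(RECORD_TAGS):
        raise ValueError("incomplete revision record")
    fields = {}
    for tag, line in zip(RECORD_TAGS, lines):
        head, _, rest = line.rstrip("\n").partition(" ")
        if head != tag:
            raise ValueError("expected %r line, got %r" % (tag, line))
        fields[tag] = rest
    num_str, _, name = fields["r"].partition(" ")
    num = int(num_str)
    # The stored number is redundant; it only guards against a reader
    # that has lost track of its position in the file.
    if num != expected_num:
        raise ValueError("version %d found where %d was expected"
                         % (num, expected_num))
    # Assumption: parents are space-separated version numbers.
    parents = [int(p) for p in fields["p"].split()]
    return Revision(num, name, parents,
                    fields["1"], fields["o"], fields["c"])
```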

Checksum is just a sha hash of the revision information. My idea was
that since you are appending to a file, a write might fail in the middle.
With a checksum you can tell whether the record was incomplete or
corrupted. (Probably just having a single character as the last entry
would work too, and it would take up less space, though it would only
detect truncation.)
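
The truncation check could look roughly like the sketch below. Exactly
which bytes the checksum covers is not spelled out above, so this assumes
the c line holds the hex SHA-1 of the r, p, 1 and o lines concatenated,
newlines included. The function names are hypothetical.

```python
import hashlib


def revision_checksum(body_lines):
    """Hex SHA-1 over the r, p, 1 and o lines (assumed coverage)."""
    digest = hashlib.sha1()
    for line in body_lines:
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


def is_complete(record_lines):
    """Return True if the record ends in a matching, fully written c line."""
    if len(record_lines) != 5:
        return False  # the append stopped before the c line
    last = record_lines[-1]
    if not last.startswith("c ") or not last.endswith("\n"):
        return False  # the c line is missing or was cut short
    stored = last[2:].strip()
    return stored == revision_checksum(record_lines[:-1])
```

A reader that finds an incomplete trailing record can ignore it and treat
the file as ending at the previous revision.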

The magic is in the "operations" line. Basically, it just encodes what
operations are necessary to recreate the weave. It is able to recreate
the in-memory weave (by reading the entire file).

The offsets are based on the in-memory weave. I thought it would be nice
to base them solely on the previous ancestry, which might mean that
you would not have to read the entire file.
I also don't know if there are better in-memory representations, but
since Martin had already done all the work of making this one work, I
just stuck with it.

It passes all tests, and I think it does a decent job.

It takes more room than the current version, mostly because of an extra
sha hash per revision (which is far more than we need, but was easy to
implement.)

Even if you don't care for the actual file format, you might want
to pull some of the other pieces. For instance, I created a factory
implementation for reading and writing weave files. So if we upgrade the
format, it should be reasonably easy to support both at the same time
(as long as both formats contain enough information).
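
For readers who have not looked at the branch, the general shape of such a
factory might be something like the sketch below. This is not the code from
the branch. The registry, the function names, and the idea of selecting the
format by the first line of the file are all assumptions made for
illustration.

```python
# Hypothetical registry: header line -> (reader, writer).
_FORMATS = {}


def register_format(header, reader, writer):
    """Associate a header line with functions that read and write a weave."""
    _FORMATS[header] = (reader, writer)


def read_weave(f):
    """Pick a reader based on the first line of the file (assumed header)."""
    header = f.readline()
    try:
        reader, _ = _FORMATS[header]
    except KeyError:
        raise ValueError("unknown weave format: %r" % header)
    return reader(f)


def write_weave(weave, f, header):
    """Write the header, then hand the rest to the matching writer."""
    try:
        _, writer = _FORMATS[header]
    except KeyError:
        raise ValueError("unknown weave format: %r" % header)
    f.write(header)
    writer(weave, f)
```

With this arrangement, supporting a new format means registering one more
reader and writer pair, and callers never need to know which version is on
disk.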

Oh, and right now it isn't really possible to take a V5 weave and
upgrade it to my V6 format, since you need to know the insert/delete
operations for each version. It would be possible to do so; I just
haven't worked out the details.

John
=:->