Revfile vs Atomicity & Dumbfs

John A Meinel john at arbash-meinel.com
Mon May 9 23:01:53 BST 2005


I've been reading through the bzr docs (I've gotten through the mailing
lists, and I'm working on the inside documentation).

Generally, I like what I'm seeing, but I'm concerned about two things,
both stemming from the same source.

Right now, I think you are just keeping a complete copy of each revision
of a file, which you obviously don't want to do over time. The current
suggestion is to use the "revfile" method, which has an append-only
index and an append-only text store.

The thing is, append-only isn't very transaction-safe. It's certainly
better than write-anywhere, but new-file-only works better with backups
and atomicity. And unless I'm mistaken, it is easier to add a new file
over a remote connection than it is to append to an existing one (at
least with sftp/ftp; webdav may be different).
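
To illustrate why new-file-only is friendlier to atomicity, here is a
minimal sketch for a local filesystem. The helper name add_new_file is
hypothetical and not part of bzr. It assumes that a rename within one
directory is atomic, which holds on POSIX filesystems but is not
guaranteed on every remote transport.

```python
import os
import tempfile


def add_new_file(directory, name, data):
    """Add directory/name so that readers never see a partial file.

    Hypothetical helper for illustration; not part of bzr.
    """
    final_path = os.path.join(directory, name)
    if os.path.exists(final_path):
        # New-file-only: existing entries are never rewritten.
        raise FileExistsError(final_path)
    # The temporary file lives in the same directory so that the
    # rename below stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # The entry appears under its final name in a single step.
        os.rename(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

If the process dies before the rename, the only leftover is a stray
.tmp- file that can be deleted safely. The existence check and the
rename are not one atomic step, so two concurrent writers would still
need a lock.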

I was pretty concerned with the "bzr fix" command, which says that if
you get your tree borked, you run it to create a new directory that has
as much as it can save, rather than fixing in place.

Instead of having an append-only text store, why not have a directory
where you insert new items? You are already working on optimizing the
patch deltas so that you don't need 50 deltas to get the latest
revision, and you could then compress the patch text in the directory
(though I suppose you could also do sequential compressed streams in
one file).

This basically means having a directory per file, rather than a file per
file. I'm not sure how to handle the index file, though, as I don't
think you want 10,000 48-byte files. I think the directory per file works
better with a dumbfs server, since you are just adding new entries.
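
A rough sketch of the directory-per-file idea, with each delta stored
as its own compressed entry. The function names and the numbered ".z"
naming scheme are assumptions made for illustration; they do not
describe the revfile format or anything bzr actually does.

```python
import os
import zlib


def store_delta(file_dir, delta):
    """Add one compressed delta as a new entry in file_dir."""
    os.makedirs(file_dir, exist_ok=True)
    # Hypothetical naming scheme: entries are numbered in insertion order.
    next_no = len([n for n in os.listdir(file_dir) if n.endswith(".z")])
    path = os.path.join(file_dir, "%06d.z" % next_no)
    # Mode "xb" fails if the entry already exists, so nothing is
    # ever overwritten or appended to.
    with open(path, "xb") as f:
        f.write(zlib.compress(delta))
    return path


def load_deltas(file_dir):
    """Return all stored deltas, oldest first."""
    names = sorted(n for n in os.listdir(file_dir) if n.endswith(".z"))
    deltas = []
    for name in names:
        with open(os.path.join(file_dir, name), "rb") as f:
            deltas.append(zlib.decompress(f.read()))
    return deltas
```

Since the entry names are sequential, a directory listing could stand
in for part of the index, though that does not settle how the rest of
the index data should be stored.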

I also thought about atomicity, and about two basic methods: WAL
(write-ahead logging) and clone-and-replace. WAL would be something
like a .bzr-transaction-log file that records what has occurred with
the tree; when the final commit occurs, the file can be deleted.
Clone-and-replace means copying everything from .bzr to .bzr-new,
making modifications to .bzr-new, and then running:

    rm -rf .bzr
    mv .bzr-new .bzr

The clone-and-replace would be really fast on a system that supports
hard links. You can hard-link everything and then copy-on-write, which
keeps the amount of IO roughly O(changes) rather than O(repository).
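
Here is a minimal sketch of clone-and-replace with hard links, assuming
a filesystem that supports os.link. All three function names are
hypothetical. Copy-on-write is done by hand: a changed file is written
to a fresh file and renamed into place, so the copy that the old tree
still links to is left untouched.

```python
import os
import shutil


def clone_with_hardlinks(src, dst):
    """Recreate the directory tree of src under dst, hard-linking files."""
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            # No file data is copied; both trees share the same content.
            os.link(os.path.join(root, name), os.path.join(target, name))


def replace_file(path, data):
    """Copy-on-write: write a new file, then rename it over the link."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


def swap_in(old, new):
    """The two-step swap from the text: rm -rf old; mv new old."""
    shutil.rmtree(old)
    os.rename(new, old)
```

Note that between the two steps of swap_in there is a short window in
which the old directory does not exist at all, so the swap itself is
not atomic.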

WAL has the clear IO advantage on all systems, but has the disadvantage
of requiring a process to back out changes. I think you could lock the
WAL, so that a new bzr process could auto-undo if the lock is gone
(meaning the process died), or you could never auto-undo and use bzr fix
to undo. It also breaks the append-only attribute of files, though not
terribly, as it only removes from the end.
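
A small sketch of how the WAL could work with append-only files: record
each file's length before appending, and undo by truncating back to
that length. The log file name comes from the text above; the JSON
format and the function names are assumptions. Lock handling (deciding
whether the writing process has died) is left out entirely.

```python
import json
import os

LOG_NAME = ".bzr-transaction-log"


def begin(tree, paths):
    """Record the current size of each file before anything is appended."""
    sizes = {}
    for p in paths:
        full = os.path.join(tree, p)
        sizes[p] = os.path.getsize(full) if os.path.exists(full) else 0
    with open(os.path.join(tree, LOG_NAME), "w") as f:
        json.dump(sizes, f)
        f.flush()
        os.fsync(f.fileno())


def commit(tree):
    """The transaction succeeded, so the log is no longer needed."""
    os.unlink(os.path.join(tree, LOG_NAME))


def undo(tree):
    """Truncate every logged file back to its recorded size."""
    log = os.path.join(tree, LOG_NAME)
    if not os.path.exists(log):
        return False  # nothing was in progress
    with open(log) as f:
        sizes = json.load(f)
    for p, size in sizes.items():
        full = os.path.join(tree, p)
        if os.path.exists(full):
            # Only data appended after begin() is removed.
            os.truncate(full, size)
    os.unlink(log)
    return True
```

A file that did not exist at begin() is truncated to zero length rather
than deleted, which a real implementation would probably want to handle
differently.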

I don't know if there have been other proposals, but I really think
atomicity is important. Having hacked on tla and baz for a while now, I
really want to get back to working in Python, where I don't feel like
I'm hitting my head all the time. I'm interested in contributing, but I
know the TODO is pretty out of date. So if anyone has some specific
picks, let me know.

What about the plugin system? I think I could play around with having a
directory for adding external commands, where the files can be
introspected so that they show up in something like "bzr help commands",
or maybe "bzr help extras".
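
As a rough illustration of what I mean by introspection, here is a
sketch that loads every .py file in a directory into the current
process and collects functions by name. The cmd_ prefix convention and
both function names are assumptions for the example, not an existing
bzr interface.

```python
import importlib.util
import os


def load_commands(plugin_dir):
    """Import each .py file in plugin_dir and collect its cmd_* functions."""
    commands = {}
    for filename in sorted(os.listdir(plugin_dir)):
        if not filename.endswith(".py"):
            continue
        path = os.path.join(plugin_dir, filename)
        spec = importlib.util.spec_from_file_location(filename[:-3], path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for attr, obj in vars(module).items():
            # Hypothetical convention: cmd_foo_bar becomes "foo-bar".
            if attr.startswith("cmd_") and callable(obj):
                commands[attr[4:].replace("_", "-")] = obj
    return commands


def help_extras(commands):
    """Build a help listing from the first docstring line of each command."""
    lines = []
    for name in sorted(commands):
        doc = (commands[name].__doc__ or "").strip().splitlines()
        lines.append("%-12s %s" % (name, doc[0] if doc else ""))
    return "\n".join(lines)
```

The help text comes straight from the docstrings, so a plugin author
would not have to register anything separately.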

I also like the idea of having plugins in the same process space, since
you don't have to re-import libraries (and it works a bit better on
Windows).

John
=:->