Revfile vs Atomicity & Dumbfs

John A Meinel john at arbash-meinel.com
Tue May 10 01:03:08 BST 2005


Aaron Bentley wrote:
> John A Meinel wrote:
>
> | Right now, I think you are just keeping a complete copy of each revision
> | of a file, which you obviously don't want to do over time. The current
> | suggestion is to use the "revfile" method, which has an append-only
> | index and an append-only text store.
> |
> | The thing is, append-only isn't very transaction safe,
>
> Actually, revfiles can easily be done in a transaction-safe way, because
> as Martin explained to me, appending to the text-contents file is
> essentially a no-op until you rewrite the indices.  And rewriting the
> indices can be made atomic using the standard write-to-temp,
> rename-to-final technique.

Except that if you are rewriting 10 different files, you have to make
them all go "pop" at the same time. The issue with bzr is that it keeps
each file separately, so you don't have one directory/file to change to
say everything is done. As Tom Lord pointed out, you really need
"prepare, committed-if-X-committed, committed", and future runs should
finish the commit if the marker is present. And if Martin is correct
that filesystems don't guarantee X is written before Y (even after
fsync), then the marker could reach disk before the data it is supposed
to commit.
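The "prepare, commit marker, finish" idea could be sketched roughly as
follows. This is only an illustration, not bzr's actual code; the
".new" suffix and the "commit-pending" marker name are made up, and it
assumes rename() is atomic within one filesystem (true on POSIX):

```python
import os

def two_phase_commit(updates):
    # updates: {final_path: new_contents}
    # Phase 1 (prepare): write every new index beside its final name.
    for path, data in updates.items():
        with open(path + ".new", "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # try to push data before the marker
    # Phase 2 (commit): one atomic rename is the single commit point.
    with open("commit-pending.tmp", "w") as f:
        f.write("\n".join(updates))
        f.flush()
        os.fsync(f.fileno())
    os.rename("commit-pending.tmp", "commit-pending")
    # Phase 3 (finish): roll the prepared files forward.  A later run
    # that finds "commit-pending" would redo this step.
    for path in updates:
        os.rename(path + ".new", path)
    os.unlink("commit-pending")
```

Note that this is exactly where the fsync-ordering worry bites: if the
marker can hit disk before the prepared files, phase 2 is no longer a
reliable commit point.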

>
> | and atomicity. And unless I'm mistaken, it is easier to add a new file
> | to a remote connection, than it is to append to an existing one (at
> | least with sftp/ftp, webdav may be different).
>
> AIUI, sftp handles appends with aplomb.  It's webDAV that's not reliable
> for that.  And Martin hasn't committed to using webDAV for write:
> "perhaps for writing it is reasonable to require a svn+ssh style server
> invoked over a socket"
> http://bazaar-ng.org/doc/design.html

However, this still breaks the dumbfs implementation. If he is saying
ssh access is only needed for writes, that isn't a big deal. But if you
have to have a server for write access, you don't have a dumbfs.

>
> | (I suppose you could do sequential compressed streams in one
> | file, though too).
>
> "sequential compressed streams in one file" sounds a lot like revfiles.

I was talking about revfiles at this point.

>
> | I also thought about atomicity, and I thought about two basic methods,
> | WAL (write-ahead logging), and clone and replace. Basically, wal would
> | be something like .bzr-transaction-log which would include what has
> | occurred with the tree, when the final commit occurs, the file could be
> | deleted. Clone-and-replace is basically, copy everything from .bzr to
> | .bzr-new, make modifications to .bzr-new, and then
> | rm -rf .bzr
> | mv .bzr-new .bzr
>
> Write-ahead logging would also be nice for operations like merging that
> are necessarily non-atomic.  If a merge is interrupted, you'll wind up
> with junk dirs all over your tree.  You can revert, of course, but
> reversing a log seems cleaner.

Well, if you wanted to make merging a little more atomic, you could
merge into temp files and use the same multi-stage commit. You would
still get bogus files lying around after a crash, but in theory the
next run of bzr (or maybe "bzr fix") should clean them up.

It depends what you want merge to do, though. For a conflict, it seems
like merge should keep going and mark the file conflicted. If the
program dies in the middle, it would be nice to undo automatically, but
I think "bzr revert" is probably good enough for merges.

>
> | What about the plugin system? I think I could play around with having a
> | directory for adding external commands, where the files can be
> | introspected so that they show up in something like "bzr help commands",
> | or maybe "bzr help extras".
>
> That would be cool.  I'd like it to be a directory of directories, so
> that you can have commands from multiple sources.  e.g:
>
> $ cd ~/bzr-extras
> $ rsync -aur foo:bzrtools
> $ ls bzrtools
> foo.py
> $ rsync -aur foo:bzrutils
> $ ls bzrutils
> bar.py
> $ bzr help
> ...
> ~  bzr bar
> ~      Do bar.
> ~  bzr foo
> ~      Do foo.
>

Well, I would assume one layer deep is what you are asking for, and I
probably wouldn't bother with a "manifest" file, since that makes
maintenance more than just dropping in a file.
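A minimal sketch of that one-layer-deep, manifest-free discovery, along
the lines of Aaron's ~/bzr-extras example (the function name and layout
rules are my own invention, not anything bzr actually does):

```python
import os

def find_plugin_commands(extras_dir):
    # One layer deep: each subdirectory is a plugin source (e.g.
    # bzrtools, bzrutils), and each .py file inside it is a command.
    # First source found wins on a name collision.
    commands = {}
    for source in sorted(os.listdir(extras_dir)):
        subdir = os.path.join(extras_dir, source)
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            if name.endswith(".py") and not name.startswith("_"):
                commands.setdefault(name[:-3],
                                    os.path.join(subdir, name))
    return commands
```

"bzr help commands" could then just iterate over the returned mapping.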

Did you see my earlier post about having a signature line, so you don't
accidentally execute a script that isn't meant to be run by bzr? Is
that worth anything? What about the whitelist idea? Or, if you want to
get extreme, gpg-signed files with a whitelist based on author. I think
the last is unnecessary, but a global plugin directory for a site would
be nice to have, and then you want some sort of trust mechanism, so
someone can't drop in a file and have "bzr help" run it.

And finally, functionality-wise, having plugins override builtins can
be very useful, especially when coupled with per-site plugin
directories: an administrator can basically set up a policy, and then
bzr commit follows it for everyone. Is this worth the potential
security risks? It seems pretty easy to implement.
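"Easy to implement" here really is just a lookup-order question; a
sketch, with the precedence (user over site over builtin) being my
assumption rather than anything decided:

```python
def resolve_command(name, builtins, site_plugins, user_plugins):
    # Earlier tables win: user overrides site, site overrides builtin.
    for table in (user_plugins, site_plugins, builtins):
        if name in table:
            return table[name]
    raise KeyError("no such command: %s" % name)
```

The security question is then just who can write to the site table's
directory.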

>
> Aaron

John
=:->
