branch locking mk2.

Sun Feb 19 15:08:26 GMT 2006

Robey Pointer wrote:
> 
> On 15 Feb 2006, at 20:09, John A Meinel wrote:
> 
>> Robey Pointer wrote:
>>>
>>> The name "fancy_rename" implies (to me) something like rename, but with
>>> added magic.  It sounds like it's really just an attempt to implement
>>> "atomic_rename" of files for Transports that don't have it.  Having it
>>> in osutils further implies that it's related to the local OS.
>>>
>>> What if we moved it into Transport.__init__.py and called it
>>> "atomic_rename", to emphasize that it's really a base implementation for
>>> Transports that don't have an atomic rename?  Then Transport.rename (or
>>> .replace) could promise to always be atomic.
> 
> By the way, knits aside, did anyone have any comments on this?
> 

We also use this elsewhere, for example when merging changes into the
working tree. People who grew up on POSIX expect rename to be atomic
(and expect to be able to delete files even while someone has them
open, etc.).

> 
>>> Another issue is that if file-renaming isn't atomic on some platforms,
>>> and we're relying on atomic behavior, this seems... bad.  I don't have
>>> an answer.  But it troubles me.  Over SFTP, you can't even guarantee
>>> that the race is short -- we may get disconnected from the server in the
>>> middle of the atomic_rename, and now there's no weave file.  Or the user
>>> may just have hit ^C.  Scary.  Will the append-only knit stuff make this
>>> a non-issue?
>>
>> Append only weaves would make it much less of an issue, since we would
>> append to the files that exist, rather than replacing them with new ones.
>>
>> There will probably be some control files that are still atomically
>> replaced, like revision-history (as long as that exists).
>> Also, I don't know how we are trying to keep knit index files intact.
>> I'm guessing we just expect that adding 1 line will be an atomic action,
>> as long as we add less than X number of bytes.
>> I can think of some things we could do (like adding a checksum +
>> line-length at the beginning of each line, along with a line delimiter
>> before the line (a simple newline could do)).
>> Then we can throw out index lines that don't match their checksum. And
>> the delimiter makes sure that we know where the entries start & stop.
> 
> Seems like we rely on the atomic_replace, so trying to think of some way
> to make it work correctly...
> 
> With all transports using the same locking, we'll at least know that
> only one person can be updating metadata at a time.  So maybe
> atomic_rename could always use the same temporary, like
> "<filename>.old".  The worst interruption that could happen then is:
> 
> <updater #1>
> open('important.weave.temp-foobar-90210', 'w').write(stuff)
> rename('important.weave', 'important.weave.old')
> ---ACK! disconnected
> 
> <reader>
> open('important.weave', 'r')  =>  error
> open('important.weave.old', 'r')  =>  ok, just go with that
> 
> <updater #2>
> open('important.weave', 'r')  =>  error
> open('important.weave.old', 'r')  =>  ok, so fix:
> rename('important.weave.old', 'important.weave')
> 
> 
> This would require a bit more work when reading files that may be
> atomically replaced.  We'd have to check for failed updates and move the
> old copy back.
> 
> robey
> 

Actually, this isn't all that bad. Instead of renaming the file out of
the way to a temp location, you rename it to a specific location.

The problem is what happens when the old location still exists, the
final location also exists, and you want to overwrite the final
location. Do you just always issue a 'remove(foo.old)' first, thereby
doing:

  remove(foo.old)
  put(foo.tmp.XXXXX)
  rename(foo, foo.old)
  rename(foo.tmp.XXXXX, foo)
  remove(foo.old)

Actually, I just realized something. As long as we only put() while we
hold the write lock, the new locking mechanism helps us: stale locks
don't disappear on their own (unlike OS-level locks).
So the only time you really need to do this is when breaking the lock.
It would mean that a broken branch could not be read by a read-only
operation (without switching to probing for foo and foo.old), but we
could repair the tree.
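
To make the two roles concrete, here is a rough sketch of what a reader
that probes for foo and foo.old, and a repair step run when breaking the
lock, might look like. It uses the local filesystem as a stand-in for a
Transport, and the function names are made up for illustration; none of
this is existing bzrlib code.

```python
import os


def read_with_fallback(path):
    """Read-only probe: try foo first, then fall back to foo.old."""
    for candidate in (path, path + ".old"):
        try:
            with open(candidate, "rb") as f:
                return f.read()
        except FileNotFoundError:
            continue
    raise FileNotFoundError(path)


def repair_after_lock_break(path):
    """Run only by whoever breaks a stale write lock.

    If the update was interrupted after foo was moved aside but before
    the new foo was put in place, move foo.old back.  Returns True if a
    repair was made.
    """
    old = path + ".old"
    if not os.path.exists(path) and os.path.exists(old):
        os.rename(old, path)
        return True
    return False
```

The reader never modifies anything, so it is safe without the lock; only
the lock breaker renames.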

Determining what is missing and what needs to be fixed might be a little
tricky. We could write a transaction log in the new locking directory,
but that has all the performance penalties of lots of round trips, and
to be safe, you must effectively fsync() before you make a change.
Though if you knew you were going to be making 50 changes, you could
write them all out at once (no need to write each one individually).
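
As a sketch of the "write it all out at once" idea: one journal file in
the lock directory lists every file about to be replaced, written and
synced in a single step. The journal name and both helpers are
hypothetical, and the local os.fsync() here only stands in for whatever
a remote Transport could actually guarantee.

```python
import os

JOURNAL = "pending-replacements"  # hypothetical file name in the lock dir


def write_journal(lock_dir, names):
    """Record all planned replacements in one write, then fsync."""
    path = os.path.join(lock_dir, JOURNAL)
    with open(path, "w", encoding="utf-8") as f:
        f.write("".join(name + "\n" for name in names))
        f.flush()
        os.fsync(f.fileno())


def interrupted_replacements(lock_dir, branch_dir):
    """List journalled files that were moved to .old but never replaced."""
    path = os.path.join(lock_dir, JOURNAL)
    try:
        with open(path, encoding="utf-8") as f:
            names = f.read().splitlines()
    except FileNotFoundError:
        return []
    return [
        name for name in names
        if not os.path.exists(os.path.join(branch_dir, name))
        and os.path.exists(os.path.join(branch_dir, name + ".old"))
    ]
```

Whoever breaks the lock would read the journal once and only look at the
files it names, rather than probing the whole branch.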

And we probably don't need to remove(foo.old) until we get a FileExists
exception.
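
Putting the sequence from earlier in this message together with the lazy
remove, a minimal sketch could look like the following. It runs against
the local filesystem; rename_no_overwrite() is a made-up stand-in for a
Transport whose rename refuses to clobber an existing target, and
replace_via_old() is likewise just an illustrative name.

```python
import os
import tempfile


def rename_no_overwrite(src, dst):
    """Stand-in for a transport rename that will not replace dst."""
    if os.path.exists(dst):
        raise FileExistsError(dst)
    os.rename(src, dst)


def replace_via_old(path, data):
    """Replace path with data, moving the previous copy to path + '.old'.

    Assumes the caller already holds the write lock.
    """
    old = path + ".old"

    # put(foo.tmp.XXXXX)
    fd, tmp = tempfile.mkstemp(
        prefix=os.path.basename(path) + ".tmp.",
        dir=os.path.dirname(path) or ".",
    )
    with os.fdopen(fd, "wb") as f:
        f.write(data)

    # rename(foo, foo.old); only remove a stale foo.old if it gets in the way
    if os.path.exists(path):
        try:
            rename_no_overwrite(path, old)
        except FileExistsError:
            os.remove(old)
            rename_no_overwrite(path, old)

    # rename(foo.tmp.XXXXX, foo)
    rename_no_overwrite(tmp, path)

    # remove(foo.old)
    if os.path.exists(old):
        os.remove(old)
```

In the common case with no leftover foo.old, this saves the up-front
remove and its round trip.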

So, I'm +1 on the idea of moving the file to foo.old and replacing
behind it. I am surprised that you can't rename a file over another
file, since that is very POSIX-y. I would have expected them to support
it, and to do tricky renaming on platforms that don't support it.

John
=:->
