branch locking mk2.

Robey Pointer robey at lag.net
Mon Feb 20 07:59:08 GMT 2006


On 19 Feb 2006, at 7:08, John A Meinel wrote:

> Robey Pointer wrote:
>>
>> On 15 Feb 2006, at 20:09, John A Meinel wrote:
>>
>>> Robey Pointer wrote:
>>>>
>>>> The name "fancy_rename" implies (to me) something like rename, but
>>>> with added magic.  It sounds like it's really just an attempt to
>>>> implement "atomic_rename" of files for Transports that don't have
>>>> it.  Having it in osutils further implies that it's related to the
>>>> local OS.
>>>>
>>>> What if we moved it into Transport.__init__.py and called it
>>>> "atomic_rename", to emphasize that it's really a base implementation
>>>> for Transports that don't have an atomic rename?  Then
>>>> Transport.rename (or .replace) could promise to always be atomic.
>>
>> By the way, knits aside, did anyone have any comments on this?
>>
>
> We also use this elsewhere, like when merging changes into the working
> tree. Growing up on POSIX, people expect rename to be atomic. (And that
> you can delete files even if someone has them open, etc).

Right.  Sounds like you're for it, then?  (Just want to make sure, since
your answer is a little ambiguous.)

Part of the contract for Transport would now be that renames are "as
atomic as possible".  And obviously you would never do a rename except
through a Transport.
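
As a strawman, the base implementation could just do a rename-aside
dance when the backend won't rename over an existing file -- something
like this (a rough sketch; the method and exception names are
illustrative, not necessarily the real Transport API):

    from bzrlib.errors import FileExists   # assuming an error like this; illustrative only

    def replace(t, rel_from, rel_to):
        # Rename rel_from over rel_to, as atomically as the transport allows.
        try:
            t.rename(rel_from, rel_to)      # one round trip; atomic if the backend permits it
        except FileExists:
            # Fallback for backends (e.g. SFTP) that refuse to rename over
            # an existing file: move the old copy aside, install the new
            # one.  The gap between these two renames is exactly the race
            # discussed below.
            t.rename(rel_to, rel_to + '.old')
            t.rename(rel_from, rel_to)
            t.delete(rel_to + '.old')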


>
>>
>>>> Another issue is that if file-renaming isn't atomic on some
>>>> platforms, and we're relying on atomic behavior, this seems... bad.
>>>> I don't have an answer.  But it troubles me.  Over SFTP, you can't
>>>> even guarantee that the race is short -- we may get disconnected
>>>> from the server in the middle of the atomic_rename, and now there's
>>>> no weave file.  Or the user may just have hit ^C.  Scary.  Will the
>>>> append-only knit stuff make this a non-issue?
>>>
>>> Append only weaves would make it much less of an issue, since we
>>> would append to the files that exist, rather than replacing them
>>> with new ones.
>>>
>>> There will probably be some control files that are still atomically
>>> replaced, like revision-history (as long as that exists).
>>> Also, I don't know how we are trying to keep knit index files intact.
>>> I'm guessing we just expect that adding 1 line will be an atomic
>>> action, as long as we add less than X number of bytes.
>>> I can think of some things we could do (like adding a checksum +
>>> line-length at the beginning of each line, along with a line
>>> delimiter before the line (a simple newline could do)).
>>> Then we can throw out index lines that don't match their checksum.
>>> And the delimiter makes sure that we know where the entries start &
>>> stop.
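
Something like this could work for the self-checking index lines -- a
rough sketch only, not necessarily how the knit index actually does it:

    import sha

    def pack_index_line(entry):
        # "<length> <sha1-of-entry> <entry>\n" -- entry must not contain newlines.
        return '%d %s %s\n' % (len(entry), sha.new(entry).hexdigest(), entry)

    def unpack_index_lines(data):
        entries = []
        for line in data.split('\n'):
            parts = line.split(' ', 2)
            if len(parts) != 3 or not parts[0].isdigit():
                continue                  # truncated or garbled line: skip it
            length, digest, entry = parts
            if len(entry) == int(length) and sha.new(entry).hexdigest() == digest:
                entries.append(entry)     # verified; keep it
        return entries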
>>
>> Seems like we rely on the atomic_replace, so trying to think of some
>> way to make it work correctly...
>>
>> With all transports using the same locking, we'll at least know that
>> only one person can be updating metadata at a time.  So maybe
>> atomic_rename could always use the same temporary, like
>> "<filename>.old".  The worst interruption that could happen then is:
>>
>> <updater #1>
>> open('important.weave.temp-foobar-90210', 'w').write(stuff)
>> rename('important.weave', 'important.weave.old')
>> ---ACK! disconnected
>>
>> <reader>
>> open('important.weave', 'r')  =>  error
>> open('important.weave.old', 'r')  =>  ok, just go with that
>>
>> <updater #2>
>> open('important.weave', 'r')  =>  error
>> open('important.weave.old', 'r')  =>  ok, so fix:
>> rename('important.weave.old', 'important.weave')
>>
>>
>> This would require a bit more work when reading files that may be
>> atomically replaced.  We'd have to check for failed updates and move
>> the old copy back.
>
> Actually, this isn't all that bad. Just instead of renaming it out of
> the way to a temp location, you rename it to a specific location.
>
> The problem is what happens when you still have the old location but
> also the final location, and you want to overwrite the final location.
> Do you just always issue a 'remove(foo.old)' first, thereby doing:
>
>   remove(foo.old)
>   put(foo.tmp.XXXXX)
>   rename(foo, foo.old)
>   rename(foo.tmp.XXXXX, foo)
>   remove(foo.old)
>
> Actually, I just realized something. As long as we only put() when we
> hold the write lock, with the new locking mechanism, stale locks don't
> disappear (the way OS-level locks do).
> So the only time you really need to do this is when breaking the lock.
> It would mean that a broken branch could not be read by a read-only
> operation (without switching to probing for foo and foo.old). But we
> could repair the tree.
>
> Determining what is missing and what needs to be fixed might be a
> little tricky; we could write a transaction log in the new locking
> directory, but that has all the performance penalties of lots of round
> trips, and to be safe, you must effectively fsync() before you make a
> change.  Though if you knew you were going to be making 50 changes, you
> could write that all out at once (no need to write each one
> individually).
>
> And probably we don't need to remove(foo.old) until we get a FileExists
> exception.
>
> So, I'm +1 on the idea of moving to foo.old to replace behind it. I am
> surprised that you can't rename a file over another file, since that is
> very posixy. I would have expected them to support it, and do tricky
> renaming on platforms that don't support it.

Me too.  It's even worse in later SFTP drafts, where they try to let the
client request one of 3 different rename behaviors, any of which the
server can refuse to implement.  Ugh.
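
FWIW, going through a transport, that remove/put/rename sequence of
yours would look roughly like this (a sketch only; I'm guessing at the
method names, and error handling is omitted):

    from StringIO import StringIO

    def atomic_replace(t, name, new_bytes):
        tmp = name + '.tmp'
        t.put(tmp, StringIO(new_bytes))   # write the new content aside
        if t.has(name + '.old'):
            t.delete(name + '.old')       # leftover from an earlier interrupted update
        t.rename(name, name + '.old')     # move the current file out of the way
        t.rename(tmp, name)               # install the new content
        t.delete(name + '.old')           # cleanup (could be deferred until FileExists)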

And I agree: cleaning up (or restoring from) *.old files would only have
to occur after breaking a lock.  Maybe breaking a lock could perform a
lightweight version of "check", to make sure all the files are there,
and clean up any *.old files it finds.
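
That cleanup pass could be about this simple (a sketch; 'control_files'
is whatever list of files we decide to probe): restore from the *.old
copy if the real file went missing mid-rename, otherwise just drop the
leftover.

    def cleanup_after_break_lock(t, control_files):
        # Run only while holding the freshly re-acquired lock.
        for name in control_files:
            old = name + '.old'
            if not t.has(old):
                continue
            if t.has(name):
                t.delete(old)        # the update finished; the side copy is stale
            else:
                t.rename(old, name)  # the update was interrupted; roll it back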

robey