[RFC] Config files lock

Tue Jul 6 08:14:34 BST 2010

On 6 July 2010 17:05, Vincent Ladeuil <v.ladeuil+lp at free.fr> wrote:
> Now that a minimal fix for bug #525571 has landed, I'd like to summarize
> what I've learned while investigating this bug about how we use the
> config variables.

Thanks for the summary, I think the overall message can easily become
noisy in a heavily-commented merge proposal.
>
> All the usages I've seen query the config for a single variable at a
> time. Most of the time, too, a config object is built, queried and
> thrown away. This means that the config file (or several) are read and
> the result discarded keeping only the value of the variable.

Perhaps we should discuss the right way to use config objects in the
developers' guide?  Especially if we're going to add new patterns
about locking.
>
> At times, one at a time again, the variable value is changed and the
> containing config file written to disk.
>
> The problem is restricted to config files in the ~/.bazaar dir since the
> others should already be covered by their respective locks (branch so
> far, but if we wanted to add config files for working tree and
> repositories, the same remark would apply).
>
> This is not very satisfying but works. Except for the bug mentioned
> above, concurrent writers or readers are pretty rare (because reading or
> writing a config file should result in a single IO most of the
> time). Yet, the minimal fix may still fail under some circumstances so
> it would be better to implement a robust locking scheme around config
> files modifications.
>
> The bug itself occurs on a server where concurrent jobs are started
> almost in parallel and have to write a bigger than usual config
> file. This two points seem enough to explain why we didn't encounter it
> more often.
>
> https://code.edge.launchpad.net/~vila/bzr/525571-lock-bazaar-conf-files/+merge/28898
> implements a write lock around set_user_option() (and similar) which is
> how we modify a config variable.
>
> The lock scheme implemented there says that so set a new value for a
> config variable the following is done:
>
> - take a write lock,
> - re-read the config file (to take concurrent updates into
>  account),
> - set the new value,
> - write the config file,
> - unlock
>
> Re-reading the config file ensures that only one variable is modified at
> a time. And this reading must occur inside the lock scope or we may miss
> some updates.
>
> In other words, we are locking at the variable level.

I think it would be fair to say generally speaking this has been
something we don't want to do with repository operations: implicitly
locking makes it easy to write accidentally slow code.  In this case
it may also make it easy to make things incorrect.

>
> There are various different lock schemes that can be implemented though,
> this one tried to minimize the write lock scope and ignore readers[1].
>
> At the other end of the spectrum, we can implement a read/write
> exclusive lock around the whole file and allow only a single process to
> access it for the duration of the operation. An operation here can be
> 'bzr pull'. In this schme, the config file is read once at start and
> written once at the end.

I think blocking readers is going to be hard to implement portably,
and likely to cause headaches when things stall apparently
unnecessarily.  Also, unless the code is all updated to truly hold the
file locked for the whole duration, it won't truly fix the problem.

>
> This is extreme but is the only way to guarantee that once an operation
> starts reading any variable, no other process will modify it.
>
> For example during a pull, the 'parent' location is read in the
> branch.conf file (or the locations.conf file).  Most probably the user
> really want its pull to finish and changing 'var' to point to a new
> branch should just be considered the next time she try to pull.
>
> So blocking branch.conf (already covered by the branch lock) and
> locations.conf during the whole pull doesn't really make sense.
>
> Another example is two processes doing a 'push --remember' in which case
> the result is undefined, whatever process update the variable last wins
> and even from the user point of view, it's hard to tell which one should
> win.
>
> While I haven't yet found a use case where this later scheme is really
> required, the proposed one seems to be enough, especially since we
> re-read the config file for every variable.
>
> On the other hand, now that I realized that we read the whole file for
> each variable... I'm very tempted to keep the file attached to the
> appropriate object (working tree, branch, repo, etc).
>
> But doing so means that *more* concurrent updates can be lost...
>
> In summary, whle the proposed lock scheme seems to be the simplest (but
> required) one, I think we should stick with it but keep paying attention
> to portential problems. But except for the case where process try to
> share an incremented counter via a config file (a very bad idea).

If that means that we will implicitly lock for read, that seems
probably ok.  Perhaps we should check that what bzr-svn etc store in
the file does not require locking around a larger scope of updates -
and either require explicit locking, or at least give the option of
explicit locking and update them to do it.

-- 
Martin