[RFC] Config files lock

Vincent Ladeuil v.ladeuil+lp at free.fr
Tue Jul 6 08:05:23 BST 2010


Now that a minimal fix for bug #525571 has landed, I'd like to summarize
what I've learned while investigating this bug about how we use the
config variables.

All the usages I've seen query the config for a single variable at a
time. Most of the time, too, a config object is built, queried and
thrown away. This means that the config file (or several) are read and
the result discarded keeping only the value of the variable.

At times, one at a time again, the variable value is changed and the
containing config file written to disk.

The problem is restricted to config files in the ~/.bazaar dir since the
others should already be covered by their respective locks (branch so
far, but if we wanted to add config files for working tree and
repositories, the same remark would apply).

This is not very satisfying but works. Except for the bug mentioned
above, concurrent writers or readers are pretty rare (because reading or
writing a config file should result in a single IO most of the
time). Yet, the minimal fix may still fail under some circumstances so
it would be better to implement a robust locking scheme around config
files modifications.

The bug itself occurs on a server where concurrent jobs are started
almost in parallel and have to write a bigger than usual config
file. This two points seem enough to explain why we didn't encounter it
more often.

https://code.edge.launchpad.net/~vila/bzr/525571-lock-bazaar-conf-files/+merge/28898
implements a write lock around set_user_option() (and similar) which is
how we modify a config variable.

The lock scheme implemented there says that so set a new value for a
config variable the following is done:

- take a write lock,
- re-read the config file (to take concurrent updates into
  account),
- set the new value,
- write the config file,
- unlock

Re-reading the config file ensures that only one variable is modified at
a time. And this reading must occur inside the lock scope or we may miss
some updates.

In other words, we are locking at the variable level.

There are various different lock schemes that can be implemented though,
this one tried to minimize the write lock scope and ignore readers[1].

At the other end of the spectrum, we can implement a read/write
exclusive lock around the whole file and allow only a single process to
access it for the duration of the operation. An operation here can be
'bzr pull'. In this schme, the config file is read once at start and
written once at the end.

This is extreme but is the only way to guarantee that once an operation
starts reading any variable, no other process will modify it.

For example during a pull, the 'parent' location is read in the
branch.conf file (or the locations.conf file).  Most probably the user
really want its pull to finish and changing 'var' to point to a new
branch should just be considered the next time she try to pull.

So blocking branch.conf (already covered by the branch lock) and
locations.conf during the whole pull doesn't really make sense.

Another example is two processes doing a 'push --remember' in which case
the result is undefined, whatever process update the variable last wins
and even from the user point of view, it's hard to tell which one should
win.

While I haven't yet found a use case where this later scheme is really
required, the proposed one seems to be enough, especially since we
re-read the config file for every variable.

On the other hand, now that I realized that we read the whole file for
each variable... I'm very tempted to keep the file attached to the
appropriate object (working tree, branch, repo, etc). 

But doing so means that *more* concurrent updates can be lost...

In summary, whle the proposed lock scheme seems to be the simplest (but
required) one, I think we should stick with it but keep paying attention
to portential problems. But except for the case where process try to
share an incremented counter via a config file (a very bad idea).


      Vincent


[1]: Note that the proposal is incomplete as it doesn't take into
account a possible windows-only scenario where the final rename in the
AtomicFile fail because a reader hasn't closed the file yet.



More information about the bazaar mailing list