[MERGE] Commit updates sha-cache
Martin Pool
mbp at canonical.com
Fri Sep 26 08:16:53 BST 2008
On Fri, Sep 26, 2008 at 3:05 AM, John Arbash Meinel
<john at arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
>> Refreshed for NEWS conflicts
>>
>
> + * New race-free method on MutableTree ``get_file_with_stat`` for use
> + when generating stat cache results. (Robert Collins)
> +
>
> ^- I thought I should point out that 'get_file_with_stat' is not 100%
> race free. If someone is writing to the file while we are reading it,
> the stat value could give a different size than sum(map(len, lines))).
>
> I think it is *less* racy than doing two calls, but I think it would be
> technically better to do the stat after we've finished reading, rather
> than before we start. (slightly different race, but the mtime will show
> up newer)
>
> I'm not saying we have to do that to land this, but I thought I would
> bring up the discussion.
It might be good to add some commentary on this. At any rate just
saying "race free" does sound stronger than is really possible.
Robert's current code stats the file when it is opened, then returns a
file handle from which the contents can be read. If another process
is trying to modify the file then we can have these sequences of
operation (treating reading the contents of the file by eg the commit
as an atomic operation):
modify, stat, read, check_stat --> Trivially, the file is not
modified during the time of interest
stat, modify, read, check_stat --> The hash value computed is for
the modified file, but we think that it's had the old stat value.
When we stat the file later and check it against the dirstate, it will
look like it was modified, and we'll need to read it again. This
assumes that we correctly handle the 'danger window' of stat values
that are close to the current time and so can't be detected as
changing.
stat, read, modify, check_stat --> The file's statted and hashed,
and then later overridden, so the value is just out of date.
read, modify, stat, check_stat --> If this occurs it will be wrong:
we've calculated one hash value, but that pertained to an older
version of the file. However it looks like this code is always
stat'ing first.
I think locking the file is a poor idea because it won't protect us in
all situations (eg environments where file locking is discretionary)
and it will annoy users who do hit it; also it may slow things down.
--
Martin <http://launchpad.net/~mbp/>
More information about the bazaar
mailing list