[MERGE] Commit updates sha-cache

John Arbash Meinel john at arbash-meinel.com
Fri Sep 26 16:20:08 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> On Fri, Sep 26, 2008 at 3:05 AM, John Arbash Meinel
> <john at arbash-meinel.com> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Robert Collins wrote:
>>> Refreshed for NEWS conflicts
>>>
>> +    * New race-free method on MutableTree ``get_file_with_stat`` for use
>> +      when generating stat cache results. (Robert Collins)
>> +
>>
>> ^- I thought I should point out that 'get_file_with_stat' is not 100%
>> race free. If someone is writing to the file while we are reading it,
>> the stat value could give a different size than sum(map(len, lines))).
>>
>> I think it is *less* racy than doing two calls, but I think it would be
>> technically better to do the stat after we've finished reading, rather
>> than before we start. (slightly different race, but the mtime will show
>> up newer)
>>
>> I'm not saying we have to do that to land this, but I thought I would
>> bring up the discussion.
> 
> It might be good to add some commentary on this.  At any rate just
> saying "race free" does sound stronger than is really possible.
> 
> Robert's current code stats the file when it is opened, then returns a
> file handle from which the contents can be read.  If another process
> is trying to modify the file then we can have these sequences of
> operation (treating reading the contents of the file by eg the commit
> as an atomic operation):
> 
>   modify, stat, read, check_stat --> Trivially, the file is not
> modified during the time of interest
>   stat, modify, read, check_stat --> The hash value computed is for
> the modified file, but we think that it's had the old stat value.
> When we stat the file later and check it against the dirstate, it will
> look like it was modified, and we'll need to read it again.  This
> assumes that we correctly handle the 'danger window' of stat values
> that are close to the current time and so can't be detected as
> changing.
>   stat, read, modify, check_stat --> The file's statted and hashed,
> and then later overridden, so the value is just out of date.
> 
>   read, modify, stat, check_stat --> If this occurs it will be wrong:
> we've calculated one hash value, but that pertained to an older
> version of the file.  However it looks like this code is always
> stat'ing first.
> 

Thanks for the breakdown. It does appear that stat'ing first is more
correct. If we get the stat wrong, we'll just update it later, while if
we stat second, we can get it wrong and then not notice we need to
re-read the file.

So while we probably shouldn't say "race-free" it is a good way to do it.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjc/agACgkQJdeBCYSNAAN9HACfUuDYWcCL80dAccv8YrRgrZcC
ziAAn2Nu2PVhSqTq6jq4QtxBilmsupFK
=zmbO
-----END PGP SIGNATURE-----



More information about the bazaar mailing list