bzr too slow
John Arbash Meinel
john at arbash-meinel.com
Wed Jan 11 13:29:17 GMT 2006
Denys Duchier wrote:
> John Arbash Meinel <john at arbash-meinel.com> writes:
>
>
>>Well, hash-cache is written whether we hold a write lock or a read lock.
>>It is a cache file that cannot be invalid, just incorrect. (It gets
>>corrected if it is incorrect.)
>>The same thing is true for .bzr/basis-inventory.
>>
>>Both are atomically written, and contain information to invalidate
>>themselves if they are out of date.
>>Which is why it is okay to write them outside of the lock.
>
>
> I understand all this, but if you write the hashcache outside the protection of
> a lock (whether read or write), then there is always the chance that a write
> transaction could take place in between the time when you release the lock and
> the time when you write back the hashcache. Thus you'll overwrite correct data
> with incorrect data. It is safe to do so, but not especially clever.
But a write lock means you are modifying the branch data, not the
working directory. And the hash cache keeps track of the working
directory, *not* branch data.
So yes, if a merge into the working directory runs between the time you
let go of the read lock and the time you write the file, you could have
a problem.
But looking through the merge code, I don't see it take out an explicit
branch lock anyway. (It's writing to the working directory, which can't
be locked, since another external process might be doing something beyond
our control.)
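To make that concrete: the reason a stale write-back is merely wasteful
rather than dangerous is that every cached entry carries the stat
fingerprint it was computed from, so the next reader notices the mismatch
and recomputes. A minimal sketch, assuming a dict-backed cache with
hypothetical names (this is not the actual bzrlib HashCache code):

import os, hashlib

class ToyHashCache:
    """Hypothetical cache mapping path -> ((size, mtime), sha1)."""

    def __init__(self):
        self._entries = {}

    def get_sha1(self, path):
        st = os.stat(path)
        fingerprint = (st.st_size, int(st.st_mtime))
        cached = self._entries.get(path)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]            # hit: the file looks unchanged
        with open(path, 'rb') as f:     # miss or stale entry: recompute
            sha1 = hashlib.sha1(f.read()).hexdigest()
        self._entries[path] = (fingerprint, sha1)
        return sha1

If a merge rewrites a file after we stat it but before we write the
cache back, the worst case is one extra SHA-1 computation the next time
that file is looked up.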
>
> The case of multiple concurrent read transactions also has problems: each
> transaction may update different parts of the hashcache. Ideally when writing
> back the hashcache, we should only merge into the on-disk hashcache the entries
> that we have actually updated during the transaction. In this manner,
> "up-to-date-ness" of the on-disk hashcache would be monotonic.
>
>
>>Otherwise, if you really only want to do it when the lock is held, you
>>can do:
>>
>>if self.branch._lock_count == 1:
>>    # write hash cache
>>return self.branch.unlock()
>
>
> all this hackery to avoid doing the right thing makes me a little sad.
>
> --Denys
I think the real question is whether writing the hash-cache is actually
part of a transaction, because in one sense it really isn't.
The hash-cache is only concerned with the state of the working
directory, which is not directly linked to the state of the branch.
You write the hash-cache when you hold a read lock, which means that two
bzr instances can each grab a read lock and both try to update the
hash-cache at the same time. That would happen whether we use your
'run at cleanup' code, the 'do we have the lock' code, or the 'write
when the lock goes away' code. None of them fixes the problem that you
think you are fixing.
Since your code doesn't actually solve the problem, the question becomes
whether it is overgeneralizing too early.
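And the reason two racing updaters are only a waste of effort, not a
correctness problem, is the atomic write-back: each writer dumps its
whole view of the cache to a temporary file and renames it into place,
so the file on disk is always some writer's complete, self-validating
snapshot. A rough sketch, with an assumed record layout and function
name rather than bzr's actual code:

import os, tempfile

def write_hashcache(entries, cache_path):
    """entries: dict of path -> (size, mtime, sha1) -- assumed layout."""
    fd, tmp_name = tempfile.mkstemp(dir=os.path.dirname(cache_path) or '.')
    try:
        with os.fdopen(fd, 'w') as f:
            for path, (size, mtime, sha1) in sorted(entries.items()):
                f.write('%s %d %d %s\n' % (sha1, size, mtime, path))
        os.rename(tmp_name, cache_path)   # atomic replace on POSIX
    except Exception:
        os.unlink(tmp_name)
        raise

The last writer wins; whatever entries it recorded that are already out
of date just get corrected on the next read, as above.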
I know I brought up the idea of code which gets run only if commit() is
successful, alongside code that always gets run. Robert has some valid
concerns about that sort of code, primarily that Transaction isn't
really where it should be happening.
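For reference, the shape of that idea is roughly the following; none of
these names exist in bzrlib, it is purely an illustration of
'run-on-success' versus 'run-always' callbacks:

class ToyTransaction:
    """Illustrative only -- not a real bzrlib class."""

    def __init__(self):
        self._on_success = []   # run only if the transaction commits
        self._on_finish = []    # run on commit or abort (e.g. cache write-back)

    def add_on_success(self, callback):
        self._on_success.append(callback)

    def add_on_finish(self, callback):
        self._on_finish.append(callback)

    def commit(self):
        for callback in self._on_success:
            callback()
        self._finish()

    def abort(self):
        self._finish()

    def _finish(self):
        for callback in self._on_finish:
            callback()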
John
=:->