[MERGE] repository write groups

Tue Aug 7 07:16:19 BST 2007

On 8/1/07, Robert Collins <robertc at robertcollins.net> wrote:

> > As you say, the names are causing confusion.  "Lock" means "exclude other
> > people from writing to this": that's what it means in general, and that's
> > what the lock elements in our control files do.  My point is that we're
> > changing it so that the caller no longer directly controls that or even
> > sees when it's happening.  Instead, they say when they want to insert data
> > or to have a consistent view or whatever, and the repository decides if
> > and when it should lock.  No?
>
> "Modern" repositories have this flexability, our own older ones, and
> ones we want to interface to, do not have this flexability. I feel like
> you are saying that all repositories can do this, and thats not true.
> For repositories that use OS locks, the lock_write call is semantically
> very important, because upgradable OS locks are not pervasively
> available.

Repository.lock_write no longer seems to have a clear or purpose.  You
defined it as "make it possible for me to call this_other_method" but
that is not semantic in itself; you might as well just have it be a
side-effect of calling that other method.  If the caller has no reason
to know or care when a write lock is held, why is it calling that
method?

I hypothecated two possible meanings for it but neither of them add up
to anything.

I realize that some repositories will need to take an OS lock.  My
point is that if and when the repository is locked is now a repository
implementation detail, not something the client knows about.

I'm not saying to pull out lock_write now, because that would be
traumatic to existing code, but I am reluctant to reference it from
new apis like start_write_group.  From the caller's point of view it
is a dead chicken.

The code you outline for gc seems to do perfectly well without locks
at all -- as it has to, if we want gc to be safe for readers as well
as writers.  So gc is not a reason to have this.

The main tricky case seems to be supporting formats that have a
repository mutex implemented as an OS lock that can't be upgraded from
read to write.  However, they're still ok if their implementation
takes the physical lock when starting the write group.  You would just
be prohibited from doing that within a read_lock.

> > It seems like for new format we really just want two operations the client
> > can ask for: "start caching/discard cache" and "start writing/finish
> > writing".
>
> uhm. I guess. I don't have a strong view here; I think what we have
> works well, the patches are done modulo anything that isn't properly
> interface tested.

If the function doesn't mean anything you shouldn't perpetuate it.

> > I think the main concrete change I want is that start_write_group should
> > not insist on the caller previously taking a lock.  Then at least we can
> > move away from having the callers know about locking.  It can implicitly
> > take a write lock if the format is one which needs it.  I just don't think
> > it should be part of the start_write_group contract.
>
> I think that this implicit lock taking concept is ugly. It makes the
> owner of the lock unclear, and its why I didn't just reuse
> lock_write/unlock and add 'abort_lock'.

Locking should become an implementation detail of the repository.  Old
repositories can hold the lock for the entire duration of the write
group; others for a shorter duration or not at all.

> I don't agree that moving callers away from knowing about the functions
> called 'lock_write' and 'unlock' is a good thing. We may be back to
> method names however.

If you tell me what they mean I'd be more happy to keep them.

-- 
Martin