[MERGE] repository write groups

Tue Jul 31 05:36:36 BST 2007

On Mon, 2007-07-30 at 23:11 -0500, Martin Pool wrote:
> On 7/30/07, Robert Collins <robertc at robertcollins.net> wrote:
> > On Mon, 2007-07-30 at 22:16 -0500, Martin Pool wrote:
> > > Great, so say that!  "You are only allowed to insert data into a
> > > repository during a write group", if that's the case.
> >
> > I'll add that to the docstring for start_write_group. I do think they
> > should be fully self explanatory without needing external docs outside
> > the API docs.
> 
> Yes, but I think the overall docs should explain the main concepts, and
> "what's a write group" deserves to be answered.  If it's "a thing you must
> be inside before writing" then that's not a detail of start_write_group,
> but rather something you need to use the repository at all.

Thats true - I've added text to lock_write to help with this, which was
my point :). I'll transcribe our dialog about HACKING to
doc/developer/repository.txt.

> > > So it does sound a bit like this is really the write group, and it
> > > should implicitly take (or increment) a write lock if necessary.  "I'm
> > > starting writing... I'm done."
> >
> > Uhm not really.
> >
> > Consider commit:
> > lock_write
> > start_write_group
> > read basis data and per file graph data and so on while we
> >  - insert texts for new file versions
> >  - insert an inventory
> >  - insert a revision
> >  - insert a signature
> > ** at this point all the new non index data is in a temporary pack file.
> > commit_write_group
> >  - finalise indices
> > ** at this point all new data is in temporary files
> >  - physical lock taken
> >  - name allocated
> >  - new data renamed into place
> >  - physical lock removed
> > use new revision data for post commit actions like sending emails,
> > updating the basis inventory in the working tree
> > unlock
> >
> > If you are saying 'start_write_group should do self.lock_write()', I
> > don't think so. Maybe I'm wrong but it doesn't line up all that closely.
> 
> No, that wasn't what I meant.  What I meant was that it would implicitly
> take a lock if _and when_ necessary.  For your example, it's during
> finalization.  For another it might be the whole time, or not at all.
>
> With this change, the caller doesn't know when the repository should be
> locked.

I disagree. The caller should lock around the period they want
consistent views of data, just as currently. 

>   It is kind of an inversion or displacement of control, which is a
> good thing because it allows more room for variation by the repo.  But if
> the caller no longer controls when the repository is locked, why are they
> calling lock_write?

I think our method names may be adding confusion, which is a shame,
because its less clear. But we've got I think reasonable solid concepts
in play in the code base.

Callers call lock_write to get:
 - the ability to insert data (if you can't lock-write, you cannot
insert)
 - a consistent view on the data - in principal data cannot be *removed*
while anyone has a write lock, regardless of the repository format - and
new data inserted by other writers is ignored. In practice until we have
a gc command implemented we can't be at all sure we have this nailed. ce
la vie.

> One reason might be that they want to rely on keeping some things in
> memory so that they can more cheaply do the hooks and update the wt after
> the write group finishes.  Maybe they should do lock_read and do the write
> group inside that?

That requires upgradable locks in some formats, which will be tricky to
do for some underlying formats. The big advantage of keeping our current
interface and not shuffling things around is that it plays nice with
everything from our old all-in-one formats up to the latest stuff I'm
working on, and with svn/git/hg as well.

I think that the direction and questions you are asking are good ones,
but that it would be very easy to define how it should work for an ideal
pack based repository, yet make it much harder to thunk well for older
formats or some foreign formats. To that end I propose doing nothing in
the short term and keeping with:
- lock_read to get a cache context
- lock_write to get a cache context and the ability to call 
  start_write_group
- start_write_group et al are essentially no-ops for repositories where
  lock_write takes out a mutex, and core functionality for ones where
  actual transactional functionality can take place.

Which is to say, that callers using large scale operations like
'commit', 'merge' etc make no changes, callers doing fine grained things
like 'add_inventory' will need to be changed to use write groups, or
better yet the Builder style interface for adding commits.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070731/ae72a343/attachment.pgp