[MERGE] repository write groups

Tue Jul 31 02:31:25 BST 2007

On 7/30/07, Robert Collins <robertc at robertcollins.net> wrote:
> On Mon, 2007-07-30 at 11:02 -0400, Martin Pool wrote:
> I'm requiring all repositories to use write_groups; for some its just
> going through the motions, for others, like the pack based repository,
> its core to how it works. So I'd say 'Repositories use the concept of a
> write_group.'

ok

> >    These are created and returned by ``start_write_group``, and
> >    concluded by either ``commit_write_group`` or
> >    ``abort_write_group``.  A write lock must be held on the
> >    repository for the entire duration.
> >    At most one write group can be
> >    active on a repository at a time.
> >
> >    Write groups are different from write locks in that they explicitly
> >    control when data is committed into the repository.  Several write
> >    groups could be committed during a single lock.
>
> Perhaps "Write locks provide a cache of repository data during the
> period of the write lock, and allow write_groups to be acquired. For
> some repositories the presence of a write lock is exclusive to a single
> client, for others which are lock free or use server side locks (e.g.
> svn), the write lock simply provides the cache context. Write groups
> provide the window during which data is being inserted inside the larger
> caching context.
>
> There is no guarantee that data inserted during a write group will be
> invisible in the repository if the write group is not committed. Some
> repositories have an atomic insertion facility, and for those
> all-or-nothing will apply."

OK, but this doesn't really tell me what a write group is, only what
it's not.  For something that writes container files the
correspondence is obvious, but what should other implementations do
with it?

Is it just an entirely optional hint that some related writing is
being done?  Or is the caller making a promise to the repository, or
vice versa?

> transactions are really about caching of data in memory, write groups
> are about explicitly starting and finishing actual writes.

out of context of this review {{

Hm, so the transaction could be thought of as a read_group, in that it
gives a scope within which read operations are grouped?  (Almost - I'm
not saying that would be a good name, just that we have three similar
objects here and if though we may need all three we should at least be
clear about what they are.)

In the context of that explanation, what then is a repository write
lock?  It's not a mutex, as it is on wt and branch, and that suggests
it's maybe not the same object.  It's not a transaction that controls
caching, because that's a different class?  (At the moment lock_write
creates a transaction but if the caller actually wants a transaction
not a lock maybe they should say so?)  And now it clearly doesn't
control writing...

}}

> >       def test_commit_message(self):
> >           tree = self.make_branch_and_tree(".")
> > +        tree.lock_write()
> >           builder = tree.branch.get_commit_builder([])
> >           self.record_root(builder, tree)
> >           builder.finish_inventory()
> >           rev_id = builder.commit('foo bar blah')
> > +        tree.unlock()
> >           rev = tree.branch.repository.get_revision(rev_id)
> >           self.assertEqual('foo bar blah', rev.message)
> >
> > The unlock should probably be in a finally in all these tests, even
> > though
> > it's just a test, otherwise we may get open-file errors on Windows.  I
> > guess we can stand that if it's a failing test, but still.
>
> I deliberately don't do that because its just noise in the test - as
> long as the failure is readable, its still a failure.

I see what you mean, but I still feel uncomfortable having
intentionally incorrect code in tests.  (Incorrect in that the
contract for these is that unlock should be called from a finally
block.)  I can articulate two reasons: one is that failures tend to be
more mysterious than you expect (and it may not be _you_ debugging
this on Windows), and secondly that not putting it in a block sets a
bad example.

Not closing files (even in failure cases, iirc) caused pain for people
testing on Windows and I don't see why this is any different.

-- 
Martin