[MERGE] repository write groups

Tue Jul 31 02:54:00 BST 2007

On Mon, 2007-07-30 at 20:31 -0500, Martin Pool wrote:
> On 7/30/07, Robert Collins <robertc at robertcollins.net> wrote:
> > On Mon, 2007-07-30 at 11:02 -0400, Martin Pool wrote:
> > I'm requiring all repositories to use write_groups; for some its just
> > going through the motions, for others, like the pack based repository,
> > its core to how it works. So I'd say 'Repositories use the concept of a
> > write_group.'
> 
> ok
> 
> > >    These are created and returned by ``start_write_group``, and
> > >    concluded by either ``commit_write_group`` or
> > >    ``abort_write_group``.  A write lock must be held on the
> > >    repository for the entire duration.
> > >    At most one write group can be
> > >    active on a repository at a time.
> > >
> > >    Write groups are different from write locks in that they explicitly
> > >    control when data is committed into the repository.  Several write
> > >    groups could be committed during a single lock.
> >
> > Perhaps "Write locks provide a cache of repository data during the
> > period of the write lock, and allow write_groups to be acquired. For
> > some repositories the presence of a write lock is exclusive to a single
> > client, for others which are lock free or use server side locks (e.g.
> > svn), the write lock simply provides the cache context. Write groups
> > provide the window during which data is being inserted inside the larger
> > caching context.
> >
> > There is no guarantee that data inserted during a write group will be
> > invisible in the repository if the write group is not committed. Some
> > repositories have an atomic insertion facility, and for those
> > all-or-nothing will apply."
> 
> OK, but this doesn't really tell me what a write group is, only what
> it's not.  For something that writes container files the
> correspondence is obvious, but what should other implementations do
> with it?

Nothing. The write group is just 'the window during which data is being
inserted'.

> Is it just an entirely optional hint that some related writing is
> being done?  Or is the caller making a promise to the repository, or
> vice versa?

The caller is not able to insert data outside the write group, so the
repository is promising to allow data insertion during the group.

> > transactions are really about caching of data in memory, write groups
> > are about explicitly starting and finishing actual writes.
> 
> out of context of this review {{
> 
> Hm, so the transaction could be thought of as a read_group, in that it
> gives a scope within which read operations are grouped?  (Almost - I'm
> not saying that would be a good name, just that we have three similar
> objects here and if though we may need all three we should at least be
> clear about what they are.)
> 
> In the context of that explanation, what then is a repository write
> lock?  It's not a mutex, as it is on wt and branch, and that suggests
> it's maybe not the same object.  It's not a transaction that controls
> caching, because that's a different class?  (At the moment lock_write
> creates a transaction but if the caller actually wants a transaction
> not a lock maybe they should say so?)  And now it clearly doesn't
> control writing...
> 
> }}

On some repositories a write lock is a mutex, because of the performance
impact of doing reads after a write has occured - e.g. weaves, where
writing reorganises large amounts of data. On others, like the one I'm
working on, or on svn, the write lock can become fictional, with actual
mutex's occuring during data insertion. E.g. I plan for
commit_write_group on pack repositories to be something like:
write out indices
take out a physical lock on the repository
allocate a name for the new pack
rename the pack and indices into place
release the physical lock

> > >       def test_commit_message(self):
> > >           tree = self.make_branch_and_tree(".")
> > > +        tree.lock_write()
> > >           builder = tree.branch.get_commit_builder([])
> > >           self.record_root(builder, tree)
> > >           builder.finish_inventory()
> > >           rev_id = builder.commit('foo bar blah')
> > > +        tree.unlock()
> > >           rev = tree.branch.repository.get_revision(rev_id)
> > >           self.assertEqual('foo bar blah', rev.message)
> > >
> > > The unlock should probably be in a finally in all these tests, even
> > > though
> > > it's just a test, otherwise we may get open-file errors on Windows.  I
> > > guess we can stand that if it's a failing test, but still.
> >
> > I deliberately don't do that because its just noise in the test - as
> > long as the failure is readable, its still a failure.
> 
> I see what you mean, but I still feel uncomfortable having
> intentionally incorrect code in tests.  (Incorrect in that the
> contract for these is that unlock should be called from a finally
> block.)  I can articulate two reasons: one is that failures tend to be
> more mysterious than you expect (and it may not be _you_ debugging
> this on Windows), and secondly that not putting it in a block sets a
> bad example.
> 
> Not closing files (even in failure cases, iirc) caused pain for people
> testing on Windows and I don't see why this is any different.

not closing files can cause errors in success cases; and also can have
the file object open in the traceback stack frames in error cases. This
is different because there are no physical resources kept open to cause
operating system level errors.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070731/9dada41f/attachment.pgp