Solving the commit-editor-locks-stuff-up problem.

Sat Mar 21 02:28:15 GMT 2009

On Fri, 2009-03-20 at 21:32 -0400, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Robert Collins wrote:
> > ...the simplest
> > way forward here is to fix the root cause, not work around it - the good
> > and the perfect are in harmony.
> 
> I think there are a couple of candidates for "root cause".
> 
> One is that we acquire the commit message late in the process.  We
> introduced that behaviour because some users complained that if a commit
> failed, their carefully-typed message wasn't used.  If we acquired the
> commit message before taking out the tree lock, we could address many of
> these use cases.  And we could cache the commit message and
> auto-populate the editor with it next time.  The downside is that we
> would be unable to show status or diff information in the commit window.

And we'd be a little more dependent on having good 'here is the message
you typed before' than we are now. It also wouldn't address the general
issue of 'readers are locked out while something is writing'. But we
don't have to fix that to fix commit specifically. However, there are
other bugs that you can't do 'log FILE' while 'pull' is running and so
forth - which are all tied into the dirstate mutex. And the recurring
issues on windows are connected there too.

> If we assume we want to continue doing late prompting, another possible
> root cause is dirstate's design.  AIUI, the goal was to allow in-place
> rewriting of the dirstate, but no client has ever implemented it.  If we
> just drop the os lock, retaining the directory-based lock, will we risk
> any data integrity?

Modulo a couple of concerns, no. And in fact this is how I would
approach it (though I would add a little extra care to make windows
clients less problematic at the same time, by writing new dirstate files
and rotating a pointer to them, or something like that). Currently we do
in-place rewriting of the dirstate I think, but what we don't do is
actively use the bisection facility for 'bzr status foo'. The tendancy
for the least little thing to trigger a full inventory has made the
returns on that a little hard to quantify. Certainly write operations
currently have to write the entire thing always because its checksum
needs updating.

> Do we even need the tree lock?  We certainly don't need to read from the
> user files by that point, because the revision is almost entirely
> committed.  We do need to write, however-- we need to update the tree
> last-revision and the parent trees.  That means rewriting the dirstate.
>  Would it be enough simply to detect changes to the dirstate file and
> abort if substantive changes are made?  I imagine just reading the
> dirstate file is cheap enough.

Ignoring small trees where nearly any approach will work, with big
trees..
first commit - generating the revision is 99% of the work.
incremental commit - reading and writing the dirstate is something like
15-20% of the work, as of the last heavily tuned commit status.
Conceptually, to commit a single file in a 100 file tree, dirstate
should be about 90% of the time, but we're not there yet.

So the main problem isn't write integrity... thats easy: write the new
dirstate to a tempfile and move it over before we release the tree lock.
The problem is, and always has been reader integrity *and* reader
lock-outs. A reader on windows will prevent the commit completing until
it has closed the file. Partial parsing is written and implemented, even
though its not switched on; I don't think we need to discard that when
its very easy to write new files, and nothing needs to be keeping open a
pointer file.

-Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090321/9dabd8ad/attachment.pgp