Rev 2513: (Ian Clatworthy) Improve 'bzr commit' time by removing some lock and dirstate._validate() calls in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Wed Jun 6 15:42:03 BST 2007

At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2513
revision-id: pqm at pqm.ubuntu.com-20070606144200-rmsd3gyelimh8kal
parent: pqm at pqm.ubuntu.com-20070606083714-rt2za45t9gt5nqqh
parent: ian.clatworthy at internode.on.net-20070606061745-e5zl32uwku409wca
committer: Canonical.com Patch Queue Manager<pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2007-06-06 15:42:00 +0100
message:
  (Ian Clatworthy) Improve 'bzr commit' time by removing some lock and dirstate._validate() calls
added:
  doc/developers/performance-commit.txt performancecommit.tx-20070606061633-4y4rawskx5ejb99w-1
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/commit.py               commit.py-20050511101309-79ec1a0168e0e825
  bzrlib/dirstate.py             dirstate.py-20060728012006-d6mvoihjb3je9peu-1
  bzrlib/workingtree_4.py        workingtree_4.py-20070208044105-5fgpc5j3ljlh5q6c-1
    ------------------------------------------------------------
    revno: 2511.1.4
    merged: ian.clatworthy at internode.on.net-20070606061745-e5zl32uwku409wca
    parent: ian.clatworthy at internode.on.net-20070606061158-zbwatmjd2m1p474g
    committer: Ian Clatworthy <ian.clatworthy at internode.on.net>
    branch nick: bzr.fast-commit
    timestamp: Wed 2007-06-06 16:17:45 +1000
    message:
      updated NEWS and added commit performance notes to doc/developers
    ------------------------------------------------------------
    revno: 2511.1.3
    merged: ian.clatworthy at internode.on.net-20070606061158-zbwatmjd2m1p474g
    parent: ian.clatworthy at internode.on.net-20070606060745-xchfmk7d9vckgy3r
    committer: Ian Clatworthy <ian.clatworthy at internode.on.net>
    branch nick: bzr.fast-commit
    timestamp: Wed 2007-06-06 16:11:58 +1000
    message:
      remove implicit read locks for kind() and is_executable() in wt4
    ------------------------------------------------------------
    revno: 2511.1.2
    merged: ian.clatworthy at internode.on.net-20070606060745-xchfmk7d9vckgy3r
    parent: ian.clatworthy at internode.on.net-20070606055603-monl116zotkbqn2y
    committer: Ian Clatworthy <ian.clatworthy at internode.on.net>
    branch nick: bzr.fast-commit
    timestamp: Wed 2007-06-06 16:07:45 +1000
    message:
      remove calls to dirstate._validate() that shouldn't be on production code
    ------------------------------------------------------------
    revno: 2511.1.1
    merged: ian.clatworthy at internode.on.net-20070606055603-monl116zotkbqn2y
    parent: pqm at pqm.ubuntu.com-20070606050006-o4yiw7jnwytgf561
    committer: Ian Clatworthy <ian.clatworthy at internode.on.net>
    branch nick: bzr.fast-commit
    timestamp: Wed 2007-06-06 15:56:03 +1000
    message:
      commit.py clean-up including logging just to stderr, not bzr.log
=== added file 'doc/developers/performance-commit.txt'

--- a/doc/developers/performance-commit.txt	1970-01-01 00:00:00 +0000
+++ b/doc/developers/performance-commit.txt	2007-06-06 06:17:45 +0000
@@ -0,0 +1,262 @@
+Commit Performance Notes
+========================
+
+.. contents::
+
+Commit: The Minimum Work Required
+---------------------------------
+
+This is Martin Pool's email to the mailing list on the minimum work that
+commit needs to do. Be sure to check the mailing list archives 
+(https://lists.ubuntu.com/archives/bazaar/2007q2/025791.html)
+for follow-up comments from Robert and others ...
+
+Here is a description of the minimum work that commit must do.  We
+want to make sure that our design doesn't cost too much more than this
+minimum.  I am trying to do this without making too many assumptions
+about the underlying storage, but am assuming that the ui and basic
+architecture (wt, branch, repo) stays about the same.
+
+The basic purpose of commit is to:
+
+1. create and store a new revision based on the contents of the working tree
+2. make this the new basis revision for the working tree
+
+We can do a selected commit of only some files or subtrees.
+
+The best performance we could hope for is:
+- stat each versioned selected working file once
+- read from the workingtree and write into the repository any new file texts
+- in general, do work proportional to the size of the shape (eg
+inventory) of the old and new selected trees, and to the total size of
+the modified files
+
+In more detail:
+
+1.0 - Store new file texts: if a versioned file contains a new text
+there is no avoiding storing it.  To determine which ones have changed
+we must go over the workingtree and at least stat each file.  If the
+file is modified since it was last hashed, it must be read in.
+Ideally we would read it only once, and either notice that it has not
+changed, or store it at that point.
+
+On the other hand we want new code to be able to handle files that are
+larger than will fit in memory.  We may then need to read each file up
+to two times: once to determine if there is a new text and calculate
+its hash, and again to store it.
+
+1.1 - Store a tree-shape description (ie inventory or similar.)  This
+describes the non-file objects, and provides a reference from the
+Revision to the texts within it.
+
+1.2 - Generate and store a new revision object.
+
+1.3 - Do delta-compression on the stored objects.  (git notably does
+not do this at commit time, deferring this entirely until later.)
+This requires finding the appropriate basis for each modified file: in
+the current scheme we get the file id, last-revision from the
+dirstate, look into the knit for that text, extract that text in
+total, generate a delta, then store that into the knit.  Most delta
+operations are O(n2) to O(n3) in the size of the modified files.
+
+1.4 - Cache annotation information for the changes: at the moment this
+is done as part of the delta storage.  There are some flaws in that
+approach, such as that it is not updated when ghosts are filled, and
+the annotation can't be re-run with new diff parameters.
+
+2.1 - Make the new revision the basis for the tree, and clear the list
+of parents.  Strictly this is all that's logically necessary, unless
+the working tree format requires more work.
+
+The dirstate format does require more work, because it caches the
+parent tree data for each file within the working tree data.  In
+practice this means that every commit rewrites the entire dirstate
+file - we could try to avoid rewriting the whole file but this may be
+difficult because variable-length data (the last-changed revision id)
+is inserted into many rows.
+
+The current dirstate design then seems to mean that any commit of a
+single file imposes a cost proportional to the size of the current
+workingtree.  Maybe there are other benefits that outweigh this.
+Alternatively if it was fast enough for operations to always look at
+the original storage of the parent trees we could do without the
+cache.
+
+2.2 - Record the observed file hashes into the workingtree control
+files.  For the files that we just committed, we have the information
+to store a valid hash cache entry: we know their stat information and
+the sha1 of the file contents.  This is not strictly necessary to the
+speed of commit, but it will be useful later in avoiding reading those
+files, and the only cost of doing it now is writing it out.
+
+In fact there are some user interface niceties that complicate this:
+
+3 - Before starting the commit proper, we prompt for a commit message
+and in that commit message editor we show a list of the files that
+will be committed: basically the output of bzr status.  This is
+basically the same as the list of changes we detect while storing the
+commit, but because the user will sometimes change the tree after
+opening the commit editor and expect the final state to be committed I
+think we do have to look for changes twice.  Since it takes the user a
+while to enter a message this is not a big problem as long as both the
+status summary and the commit are individually fast.
+
+4 - As the commit proceeds (or after?) we show another status-like
+summary.  Just printing the names of modified files as they're stored
+would be easy.  Recording deleted and renamed files or directories is
+more work: this can only be done by reference to the primary parent
+tree and requires it be read in.  Worse, reporting renames requires
+searching by id across the entire parent tree.   Possibly full
+reporting should be a default-off verbose option because it does
+require more work beyond the commit itself.
+
+5 - Bazaar currently allows for missing files to be automatically
+marked as removed at the time of commit.  Leaving aside the ui
+consequences, this means that we have to update the working inventory
+to mark these files as removed.  Since as discussed above we always
+have to rewrite the dirstate on commit this is not substantial, though
+we should make sure we do this in one pass, not two.  I have
+previously proposed to make this behaviour a non-default option.
+
+We may need to run hooks or generate signatures during commit, but
+they don't seem to have substantial performance consequences.
+
+If one wanted to optimize solely for the speed of commit I think
+hash-addressed  file-per-text storage like in git (or bzr 0.1) is very
+good.  Remarkably, it does not need to read the inventory for the
+previous revision.  For each versioned file, we just need to get its
+hash, either by reading the file or validating its stat data.  If that
+hash is not already in the repository, the file is just copied in and
+compressed.  As directories are traversed, they're turned into texts
+and stored as well, and then finally the revision is too.  This does
+depend on later doing some delta compression of these texts.
+
+Variations on this are possible.  Rather than writing a single file
+into the repository for each text, we could fold them into a single
+collation or pack file.  That would create a smaller number of files
+in the repository, but looking up a single text would require looking
+into their indexes rather than just asking the filesystem.
+
+Rather than using hashes we can use file-id/rev-id pairs as at
+present, which has several consequences pro and con.
+
+
+Commit vs Status
+----------------
+
+At first glance, commit simply stores the changes status reports. In fact,
+this isn't technically correct: commit considers some files modified that
+status does not. The notes below were put together by John Arbash Meinel
+and Aaron Bentley in May 2007 to explain the finer details of commit to
+Ian Clatworthy. They are recorded here as they are likely to be useful to
+others new to Bazaar ...
+
+1) **Unknown files have a different effect.** With --no-strict (the default)
+   they have no effect and can be completely ignored. With --strict they
+   should cause the commit to abort (so you don't forget to add the two new
+   test files that you just created).
+
+2) **Multiple parents.** 'status' always compares 2 trees, typically the
+   last-committed tree and the current working tree. 'commit' will compare
+   more trees if there has been a merge.
+
+  a) The "last modified" property for files.
+     A file may be marked as changed since the last commit, but that
+     change may have come in from the merge, and the change could have
+     happened several commits back. There are several edge cases to be
+     handled here, like if both branches modified the same file, or if
+     just one branch modified it.
+     
+  b) The trickier case is when a file appears unmodified since last
+     commit, but it was modified versus one of the merged branches. I
+     believe there are a few ways this can happen, like if a merged
+     branch changes a file and then reverts it back (you still update
+     the 'last modified' field).
+     In general, if both sides disagree on the 'last-modified' flag,
+     then you need to generate a new entry pointing 'last-modified' at
+     this revision (because you are resolving the differences between
+     the 2 parents).
+
+3) **Automatic deletion of 'missing' files.** This is a point that we go
+   back and forth on. I think the basic idea is that 'bzr commit' by
+   default should abort if it finds a 'missing' file (in case that file was
+   renamed rather than deleted), but 'bzr commit --auto' can add unknown
+   files and remove missing files automatically.
+
+4) **sha1 for newly added files.** status doesn't really need this: it should
+   only care that the file is not present in base, but is present now. In
+   some ways commit doesn't care either, since it needs to read and sha the
+   file itself anyway.
+
+5) **Nested trees.** status doesn't recurse into nested trees, but commit does.
+   This is just because not all of the nested-trees work has been merged yet.
+
+   A tree-reference is considered modified if the subtree has been
+   committed since the last containing-tree commit.  But commit needs to
+   recurse into every subtree, to ensure that a commit is done if the
+   subtree has changed since its last commit.  _iter_changes only reports
+   on tree-references that are modified, so it can't be used for doing
+   subtree commits.
+
+
+Avoiding Work: Smarter Change Detection
+---------------------------------------
+
+Commit currently walks through every file building an inventory. Here is
+Aaron's brain dump on a better way ...
+
+_iter_changes won't tell us about tree references that haven't changed,
+even if those subtrees have changed.  (Unless we ask for unchanged
+files, which we don't want to do, of course.)
+
+There is an iter_references method, but using it looks just as expensive
+as calling kind().
+
+I did some work on updating commit to use iter_changes, but found for
+multi-parent trees, I had to fall back to the slow inventory comparison
+approach.
+
+Really, I think we need a call akin to iter_changes that handles
+multiple parents, and knows to emit entries when InventoryEntry.revision
+is all that's changed.
+
+
+Avoiding Work: Better Layering
+------------------------------
+
+For each file, commit is currently doing more work than it should. Here is
+John's take on a better way ...
+
+Note that "_iter_changes" *does* have to touch every path on disk, but
+it just can do it in a more efficient manner. (It doesn't have to create
+an InventoryEntry for all the ones that haven't changed).
+
+I agree with Aaron that we need something a little different than
+_iter_changes. Both because of handling multiple parents, as well as we
+don't want it to actually read the files if we have a stat-cache miss.
+
+Specifically, the commit code *has* to read the files because it is
+going to add the text to the repository, and we want it to compute the
+sha1 at *that* time, so we are guaranteed to have the valid sha (rather
+than just whatever the last cached one was). So we want the code to
+return 'None' if it doesn't have an up-to-date sha1, rather than reading
+the file and computing it, just before it returns it to the parent.
+
+The commit code (0.16) should really be restructured. It's layering is
+pretty wrong.
+
+Specifically, calling "kind()" requires a stat of the file. But we have
+to do a stat to get the size/whether the record is up-to-date, etc. So
+we really need to have a "create_an_up_to_date_inventory()" function.
+But because we are accessing every object on disk, we want to be working
+in tuples rather than Inventory objects. And because DirState already
+has the parent records next to the current working inventory, it can do
+all the work to do really fast comparison and throw-away of unimportant
+records.
+
+The way I made "bzr status" fast is by moving the 'ignore this record'
+ability as deep into the stack as I could get. Status has the property
+that you don't care about most of the records, just like commit. So the
+sooner you can stop evaluating the 99% that you don't care about, the
+less work you do.
+

=== modified file 'NEWS'
--- a/NEWS	2007-06-06 01:07:25 +0000
+++ b/NEWS	2007-06-06 06:17:45 +0000
@@ -1,5 +1,21 @@
 IN DEVELOPMENT
 
+  NOTES WHEN UPGRADING:
+
+    * The kind() and is_executable() APIs on the WorkingTree interface no
+      longer implicitly (read) locks and unlocks the tree. This *might*
+      impact some plug-ins and tools using this part of the API. If you find
+      an issue that may be caused by this change, please let us know,
+      particularly the plug-in/tool maintainer. If encountered, the API
+      fix is to surround kind() and is_executable() calls with lock_read()
+      and unlock() like so::
+
+        work_tree.lock_read()
+        try:
+            kind = work_tree.kind(...)
+        finally:
+            work_tree.unlock()
+
   INTERNALS:
     * Rework of LogFormatter API to provide beginning/end of log hooks and to
       encapsulate the details of the revision to be logged in a LogRevision
@@ -33,6 +49,11 @@
       improving speed.  3.25x speedups have been observed for construction of
       kernel-sized-trees, and checkouts are 1.28x faster.  (Aaron Bentley)
 
+    * Commit on large trees is now faster. In my environment, a commit of
+      a small change to the Mozilla tree (55k files) has dropped from
+      66 seconds to 32 seconds. For a small tree of 600 files, commit of a
+      small change is 33% faster. (Ian Clatworthy)
+
   BUGFIXES:
 
     * ``bzr push`` should only connect to the remote location one time.

=== modified file 'bzrlib/commit.py'
--- a/bzrlib/commit.py	2007-05-21 14:14:36 +0000
+++ b/bzrlib/commit.py	2007-06-06 05:56:03 +0000
@@ -106,32 +106,40 @@
 
 class ReportCommitToLog(NullCommitReporter):
 
-    # this may be more useful if 'note' was replaced by an overridable
-    # method on self, which would allow more trivial subclassing.
-    # alternative, a callable could be passed in, allowing really trivial
-    # reuse for some uis. RBC 20060511
+    def _note(self, format, *args):
+        """Output a message.
+
+        Messages are output by writing directly to stderr instead of
+        using bzrlib.trace.note(). The latter constantly updates the
+        log file as we go causing an unnecessary performance hit.
+
+        Subclasses may choose to override this method but need to be aware
+        of its potential impact on performance.
+        """
+        bzrlib.ui.ui_factory.clear_term()
+        sys.stderr.write((format + "\n") % args)
 
     def snapshot_change(self, change, path):
         if change == 'unchanged':
             return
         if change == 'added' and path == '':
             return
-        note("%s %s", change, path)
+        self._note("%s %s", change, path)
 
     def completed(self, revno, rev_id):
-        note('Committed revision %d.', revno)
+        self._note('Committed revision %d.', revno)
     
     def deleted(self, file_id):
-        note('deleted %s', file_id)
+        self._note('deleted %s', file_id)
 
     def escaped(self, escape_count, message):
-        note("replaced %d control characters in message", escape_count)
+        self._note("replaced %d control characters in message", escape_count)
 
     def missing(self, path):
-        note('missing %s', path)
+        self._note('missing %s', path)
 
     def renamed(self, change, old_path, new_path):
-        note('%s %s => %s', change, old_path, new_path)
+        self._note('%s %s => %s', change, old_path, new_path)
 
 
 class Commit(object):
@@ -153,10 +161,7 @@
             self.reporter = reporter
         else:
             self.reporter = NullCommitReporter()
-        if config is not None:
-            self.config = config
-        else:
-            self.config = None
+        self.config = config
         
     def commit(self,
                message=None,
@@ -177,26 +182,26 @@
                recursive='down'):
         """Commit working copy as a new revision.
 
-        message -- the commit message (it or message_callback is required)
-
-        timestamp -- if not None, seconds-since-epoch for a
-             postdated/predated commit.
-
-        specific_files -- If true, commit only those files.
-
-        rev_id -- If set, use this as the new revision id.
+        :param message: the commit message (it or message_callback is required)
+
+        :param timestamp: if not None, seconds-since-epoch for a
+            postdated/predated commit.
+
+        :param specific_files: If true, commit only those files.
+
+        :param rev_id: If set, use this as the new revision id.
             Useful for test or import commands that need to tightly
             control what revisions are assigned.  If you duplicate
             a revision id that exists elsewhere it is your own fault.
             If null (default), a time/random revision id is generated.
 
-        allow_pointless -- If true (default), commit even if nothing
+        :param allow_pointless: If true (default), commit even if nothing
             has changed and no merges are recorded.
 
-        strict -- If true, don't allow a commit if the working tree
+        :param strict: If true, don't allow a commit if the working tree
             contains unknown files.
 
-        revprops -- Properties for new revision
+        :param revprops: Properties for new revision
         :param local: Perform a local only commit.
         :param recursive: If set to 'down', commit in any subtrees that have
             pending changes of any sort during this commit.
@@ -235,7 +240,6 @@
         self.committer = committer
         self.strict = strict
         self.verbose = verbose
-        self.local = local
 
         if reporter is None and self.reporter is None:
             self.reporter = NullCommitReporter()
@@ -248,30 +252,15 @@
         self.basis_tree.lock_read()
         try:
             # Cannot commit with conflicts present.
-            if len(self.work_tree.conflicts())>0:
+            if len(self.work_tree.conflicts()) > 0:
                 raise ConflictsInTree
 
-            # setup the bound branch variables as needed.
+            # Setup the bound branch variables as needed.
             self._check_bound_branch()
 
-            # check for out of date working trees
-            try:
-                first_tree_parent = self.work_tree.get_parent_ids()[0]
-            except IndexError:
-                # if there are no parents, treat our parent as 'None'
-                # this is so that we still consier the master branch
-                # - in a checkout scenario the tree may have no
-                # parents but the branch may do.
-                first_tree_parent = bzrlib.revision.NULL_REVISION
-            old_revno, master_last = self.master_branch.last_revision_info()
-            if master_last != first_tree_parent:
-                if master_last != bzrlib.revision.NULL_REVISION:
-                    raise errors.OutOfDateTree(self.work_tree)
-            if self.branch.repository.has_revision(first_tree_parent):
-                new_revno = old_revno + 1
-            else:
-                # ghost parents never appear in revision history.
-                new_revno = 1
+            # Check that the working tree is up to date
+            old_revno,new_revno = self._check_out_of_date_tree()
+
             if strict:
                 # raise an exception as soon as we find a single unknown.
                 for unknown in self.work_tree.unknowns():
@@ -289,6 +278,8 @@
                 # cheaply?
                 tree.find_ids_across_trees(specific_files,
                                            [self.basis_tree, self.work_tree])
+
+            # Setup the progress bar ...
             # one to finish, one for rev and inventory, and one for each
             # inventory entry, and the same for the new inventory.
             # note that this estimate is too long when we do a partial tree
@@ -301,16 +292,15 @@
             if len(self.parents) > 1 and self.specific_files:
                 raise errors.CannotCommitSelectedFileMerge(self.specific_files)
             
+            # Build the new inventory
             self.builder = self.branch.get_commit_builder(self.parents,
                 self.config, timestamp, timezone, committer, revprops, rev_id)
-            
             self._remove_deleted()
             self._populate_new_inv()
             self._report_deletes()
-
             self._check_pointless()
+            self._emit_progress_update()
 
-            self._emit_progress_update()
             # TODO: Now the new inventory is known, check for conflicts and
             # prompt the user for a commit message.
             # ADHB 2006-08-08: If this is done, populate_new_inv should not add
@@ -323,9 +313,9 @@
             self.message = message
             self._escape_commit_message()
 
+            # Add revision data to the local branch
             self.rev_id = self.builder.commit(self.message)
             self._emit_progress_update()
-            # revision data is in the local branch now.
             
             # upload revision data to the master.
             # this will propagate merged revisions too if needed.
@@ -333,46 +323,21 @@
                 self.master_branch.repository.fetch(self.branch.repository,
                                                     revision_id=self.rev_id)
                 # now the master has the revision data
-                # 'commit' to the master first so a timeout here causes the local
-                # branch to be out of date
+                # 'commit' to the master first so a timeout here causes the
+                # local branch to be out of date
                 self.master_branch.set_last_revision_info(new_revno,
                                                           self.rev_id)
 
             # and now do the commit locally.
             self.branch.set_last_revision_info(new_revno, self.rev_id)
 
+            # Make the working tree up to date with the branch
             rev_tree = self.builder.revision_tree()
             self.work_tree.set_parent_trees([(self.rev_id, rev_tree)])
-            # now the work tree is up to date with the branch
-            
             self.reporter.completed(new_revno, self.rev_id)
-            # old style commit hooks - should be deprecated ? (obsoleted in
-            # 0.15)
-            if self.config.post_commit() is not None:
-                hooks = self.config.post_commit().split(' ')
-                # this would be nicer with twisted.python.reflect.namedAny
-                for hook in hooks:
-                    result = eval(hook + '(branch, rev_id)',
-                                  {'branch':self.branch,
-                                   'bzrlib':bzrlib,
-                                   'rev_id':self.rev_id})
-            # new style commit hooks:
-            if not self.bound_branch:
-                hook_master = self.branch
-                hook_local = None
-            else:
-                hook_master = self.master_branch
-                hook_local = self.branch
-            # With bound branches, when the master is behind the local branch,
-            # the 'old_revno' and old_revid values here are incorrect.
-            # XXX: FIXME ^. RBC 20060206
-            if self.parents:
-                old_revid = self.parents[0]
-            else:
-                old_revid = bzrlib.revision.NULL_REVISION
-            for hook in Branch.hooks['post_commit']:
-                hook(hook_local, hook_master, old_revno, old_revid, new_revno,
-                    self.rev_id)
+
+            # Process the post commit hooks, if any
+            self._process_hooks(old_revno, new_revno)
             self._emit_progress_update()
         finally:
             self._cleanup()
@@ -475,6 +440,60 @@
         self.master_branch.lock_write()
         self.master_locked = True
 
+    def _check_out_of_date_tree(self):
+        """Check that the working tree is up to date.
+
+        :return: old_revision_number,new_revision_number tuple
+        """
+        try:
+            first_tree_parent = self.work_tree.get_parent_ids()[0]
+        except IndexError:
+            # if there are no parents, treat our parent as 'None'
+            # this is so that we still consider the master branch
+            # - in a checkout scenario the tree may have no
+            # parents but the branch may do.
+            first_tree_parent = bzrlib.revision.NULL_REVISION
+        old_revno, master_last = self.master_branch.last_revision_info()
+        if master_last != first_tree_parent:
+            if master_last != bzrlib.revision.NULL_REVISION:
+                raise errors.OutOfDateTree(self.work_tree)
+        if self.branch.repository.has_revision(first_tree_parent):
+            new_revno = old_revno + 1
+        else:
+            # ghost parents never appear in revision history.
+            new_revno = 1
+        return old_revno,new_revno
+
+    def _process_hooks(self, old_revno, new_revno):
+        """Process any registered commit hooks."""
+        # old style commit hooks - should be deprecated ? (obsoleted in
+        # 0.15)
+        if self.config.post_commit() is not None:
+            hooks = self.config.post_commit().split(' ')
+            # this would be nicer with twisted.python.reflect.namedAny
+            for hook in hooks:
+                result = eval(hook + '(branch, rev_id)',
+                              {'branch':self.branch,
+                               'bzrlib':bzrlib,
+                               'rev_id':self.rev_id})
+        # new style commit hooks:
+        if not self.bound_branch:
+            hook_master = self.branch
+            hook_local = None
+        else:
+            hook_master = self.master_branch
+            hook_local = self.branch
+        # With bound branches, when the master is behind the local branch,
+        # the 'old_revno' and old_revid values here are incorrect.
+        # XXX: FIXME ^. RBC 20060206
+        if self.parents:
+            old_revid = self.parents[0]
+        else:
+            old_revid = bzrlib.revision.NULL_REVISION
+        for hook in Branch.hooks['post_commit']:
+            hook(hook_local, hook_master, old_revno, old_revid, new_revno,
+                self.rev_id)
+
     def _cleanup(self):
         """Cleanup any open locks, progress bars etc."""
         cleanups = [self._cleanup_bound_branch,

=== modified file 'bzrlib/dirstate.py'
--- a/bzrlib/dirstate.py	2007-05-31 20:29:04 +0000
+++ b/bzrlib/dirstate.py	2007-06-06 06:07:45 +0000
@@ -1792,7 +1792,6 @@
         :param ghosts: A list of the revision_ids that are ghosts at the time
             of setting.
         """ 
-        self._validate()
         # TODO: generate a list of parent indexes to preserve to save 
         # processing specific parent trees. In the common case one tree will
         # be preserved - the left most parent.
@@ -1923,7 +1922,6 @@
         self._header_state = DirState.IN_MEMORY_MODIFIED
         self._dirblock_state = DirState.IN_MEMORY_MODIFIED
         self._id_index = id_index
-        self._validate()
 
     def _sort_entries(self, entry_list):
         """Given a list of entries, sort them into the right order.

=== modified file 'bzrlib/workingtree_4.py'
--- a/bzrlib/workingtree_4.py	2007-06-01 13:42:02 +0000
+++ b/bzrlib/workingtree_4.py	2007-06-06 06:11:58 +0000
@@ -472,6 +472,7 @@
 
     @needs_read_lock
     def id2path(self, file_id):
+        "Convert a file-id to a path."
         file_id = osutils.safe_file_id(file_id)
         state = self.current_dirstate()
         entry = self._get_entry(file_id=file_id)
@@ -481,16 +482,22 @@
         return path_utf8.decode('utf8')
 
     if not osutils.supports_executable():
-        @needs_read_lock
         def is_executable(self, file_id, path=None):
+            """Test if a file is executable or not.
+
+            Note: The caller is expected to take a read-lock before calling this.
+            """
             file_id = osutils.safe_file_id(file_id)
             entry = self._get_entry(file_id=file_id, path=path)
             if entry == (None, None):
                 return False
             return entry[1][0][3]
     else:
-        @needs_read_lock
         def is_executable(self, file_id, path=None):
+            """Test if a file is executable or not.
+
+            Note: The caller is expected to take a read-lock before calling this.
+            """
             if not path:
                 file_id = osutils.safe_file_id(file_id)
                 path = self.id2path(file_id)
@@ -530,12 +537,13 @@
                 # path is missing on disk.
                 continue
 
-    @needs_read_lock
     def kind(self, file_id):
         """Return the kind of a file.
 
         This is always the actual kind that's on disk, regardless of what it
         was added as.
+
+        Note: The caller is expected to take a read-lock before calling this.
         """
         relpath = self.id2path(file_id)
         assert relpath != None, \
@@ -1060,10 +1068,8 @@
             except (errors.NoSuchRevision, errors.RevisionNotPresent):
                 revtree = None
             trees.append((revision_id, revtree))
-        self.current_dirstate()._validate()
         self.set_parent_trees(trees,
             allow_leftmost_as_ghost=allow_leftmost_as_ghost)
-        self.current_dirstate()._validate()
 
     @needs_tree_write_lock
     def set_parent_trees(self, parents_list, allow_leftmost_as_ghost=False):
@@ -1074,7 +1080,6 @@
             parent tree - i.e. a ghost.
         """
         dirstate = self.current_dirstate()
-        dirstate._validate()
         if len(parents_list) > 0:
             if not allow_leftmost_as_ghost and parents_list[0][1] is None:
                 raise errors.GhostRevisionUnusableHere(parents_list[0][0])
@@ -1090,11 +1095,8 @@
                 real_trees.append((rev_id,
                     self.branch.repository.revision_tree(None)))
                 ghosts.append(rev_id)
-        dirstate._validate()
         dirstate.set_parent_trees(real_trees, ghosts=ghosts)
-        dirstate._validate()
         self._make_dirty(reset_inventory=False)
-        dirstate._validate()
 
     def _set_root_id(self, file_id):
         """See WorkingTree.set_root_id."""
@@ -1283,7 +1285,6 @@
                          _control_files=control_files)
         wt._new_tree()
         wt.lock_tree_write()
-        wt.current_dirstate()._validate()
         try:
             if revision_id in (None, NULL_REVISION):
                 if branch.repository.supports_rich_root():
@@ -1291,7 +1292,6 @@
                 else:
                     wt._set_root_id(ROOT_ID)
                 wt.flush()
-                wt.current_dirstate()._validate()
             wt.set_last_revision(revision_id)
             wt.flush()
             basis = wt.basis_tree()