[RFC] Faster commit - more progress and call for input

Fri Jun 1 17:26:18 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...

> We could:
> 
> a) Require callers to lock things rather than having convenience locking
> at all. It is a bit of a pain, but really only the caller knows how long
> it is going to be using this tree before the lock should be released.
> Users *should* have been doing this already.
> 
> b) Have obviously high-level commands that auto-lock, and make it clear
> what commands are low-level and should be locked by the user. This might
> be the best place to do it.
> 
> c) Have alternative "_no_lock()" forms which can be used by internal
> code, rather than the simple calls for the rest of the api.
> 

Just to mention something here. We have a lot of tests that look like this:

    def test_add_as_id(self):
       tree = self.make_branch_and_tree('.')
       self.build_tree(['foo'])
       tree.add('foo', 'foo-id')
       tree.commit('foo')
       self.assertIs('foo-id', tree.path2id('foo'))

I'm not trying to say a specific test, but I'm trying to give an example.

To me, that looks like a fairly reasonable test, however, when you think about
it closely we are locking and unlocking that tree (causing files on disk, and
directories to be created, moved, read, moved, read, and deleted) 4 times.

Not to mention that DirState/Inventory is read multiple times during this test.

add/commit/path2id all have to take out a new lock (write or read as
appropriate) which means parsing the inventory/dirstate file. And write locks
have to create a temp dir, put a file in it with appropriate info, rename it
into place, then read back the file to make sure they succeeded in acquiring
the lock.
And for deleting, they move it out of the way, and read the file again, to make
sure someone didn't break their lock, etc.

Because we aren't testing locking, a "correct" way to write it is:
    def test_add_as_id(self):
       tree = self.make_branch_and_tree('.')
       tree.lock_write()
       self.addCleanup(tree.unlock)
       self.build_tree(['foo'])
       tree.add('foo', 'foo-id')
       tree.commit('foo')
       self.assertIs('foo-id', tree.path2id('foo'))

Or you could do:
    def test_add_as_id(self):
       tree = self.make_branch_and_tree('.')
       self.build_tree(['foo'])
       tree.lock_write()
       try:
         tree.add('foo', 'foo-id')
         tree.commit('foo')
         self.assertIs('foo-id', tree.path2id('foo'))
       finally:
         tree.unlock()

I do believe holding the lock makes the tests significantly faster. On my Mac
laptop on battery power (not the fastest place in the world) it is about 10%
overhead to not hold a lock. (WT2 seems especially bad at almost 50% overhead).

What concerns me even more, though, is that all of the tests are taking 150+ms
to complete. Which means it is taking a long time to just initialize a new
working tree.

test_make_branch_and_tree(WorkingTreeFormat4)   OK                 170ms
test_make_branch_and_tree(WorkingTreeFormat3)   OK                 157ms
test_make_branch_and_tree(WorkingTreeFormat2)   OK                 156ms
test_with_lock(WorkingTreeFormat4)   OK                 267ms
test_with_lock(WorkingTreeFormat3)   OK                 249ms
test_with_lock(WorkingTreeFormat2)   OK                 226ms
test_without_lock(WorkingTreeFormat4)   OK                 308ms
test_without_lock(WorkingTreeFormat3)   OK                 255ms
test_without_lock(WorkingTreeFormat2)   OK                 256ms

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGYEiqJdeBCYSNAAMRAnw1AJsG1qpTRgAVGLCcFMXETXceFrIF6QCfddcZ
QSJpRO3sz4klJAbmnIvscEE=
=Mvbh
-----END PGP SIGNATURE-----