Initial push slows down because of extra locking

Mon Jun 4 23:24:45 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I was trying to debug pushing to an FTP server today with someone, and it
revealed some interesting information about our initial push code and some of
the reasons why it is slow.

First, we end up locking and unlocking the target directories several times. I
think that is because when we create a repository or branch we lock it while we
are creating it, and then unlock it, only to lock it again to push data into it.

Probably we should consider that "BranchFormat.create()" should return a
write-locked object. It also seems to be caused because
RepositoryFormat._upload_blank_content() is taking out a write lock, and copies
up some basic files, and then *unlocks* right before we start doing real work.

The overall log is fairly interesting to watch:

FTP get: /ftptest4/.bzr/branch-format
FTP mkd: /ftptest4
FTP mkd: /ftptest4
unable to understand error for path: /ftptest4: 550 /ftptest4: Failed to create.

^-- This is because our code actually tries to mkdir() 2 times. Because
BzrDir.create() has a ._make_tail() call, as well as cmd_push thinking it needs
to create the tail directory.

FTP stat: /ftptest4
FTP mkd: /ftptest4/.bzr

^- I don't know why we need to stat the directory....

created control directory in ftp://zippedmartin.dyndns.org/ftptest4/
FTP clone
FTP stat: /ftptest4/.bzr
FTP mkd: /ftptest4/.bzr/branch-lock
FTP mkd: /ftptest4/.bzr/branch-lock/pending.ke8jtci30fxh8y5aumjw.tmp
FTP put: /ftptest4/.bzr/branch-lock/pending.ke8jtci30fxh8y5aumjw.tmp/info
FTP rename: /ftptest4/.bzr/branch-lock/pending.ke8jtci30fxh8y5aumjw.tmp =>
/ftptest4/.bzr/branch-lock/held
FTP get: /ftptest4/.bzr/branch-lock/held/info
FTP get: /ftptest4/.bzr/branch-lock/held/info

^- I don't know why we need to get info 2 times. We need to do it 1 time to
make sure that *our* rename succeeded but the second time seems like a bug.

FTP put: /ftptest4/.bzr/README
FTP put: /ftptest4/.bzr/branch-format
FTP get: /ftptest4/.bzr/branch-lock/held/info
FTP rename: /ftptest4/.bzr/branch-lock/held =>
/ftptest4/.bzr/branch-lock/releasing.1tl8r2j89haf91033i9a.tmp
FTP rm: /ftptest4/.bzr/branch-lock/releasing.1tl8r2j89haf91033i9a.tmp/info
FTP rmd: /ftptest4/.bzr/branch-lock/releasing.1tl8r2j89haf91033i9a.tmp

^- This is checking that we still held the lock and then we released it. I'm a
little surprised that we didn't check the releasing.*/info file. Because
otherwise we might break someone elses lock if they broke in at just the right
time. But probably it isn't something we *need* to do. But notice that we just
created a ".bzr/*" directory, locked it, and then released the lock. I'm not
sure why we are locking for .bzr/*

FTP clone
FTP clone
FTP get: /ftptest4/.bzr/repository/format
FTP clone
FTP clone
FTP get: /.bzr/branch-format
FTP clone
FTP clone

^- This seems like we forget about the fact we just created the BzrDir, and
check to see if there is a repository there. This probably isn't a big
overhead, though.

creating repository in ftp://zippedmartin.dyndns.org/ftptest4/.bzr/.
FTP stat: /ftptest4/.bzr
FTP mkd: /ftptest4/.bzr/repository
FTP clone
FTP stat: /ftptest4/.bzr/repository
FTP mkd: /ftptest4/.bzr/repository/lock
FTP mkd: /ftptest4/.bzr/repository/lock/pending.vki8fabq5pxipch31i4c.tmp
FTP put: /ftptest4/.bzr/repository/lock/pending.vki8fabq5pxipch31i4c.tmp/info
FTP rename: /ftptest4/.bzr/repository/lock/pending.vki8fabq5pxipch31i4c.tmp =>
/ftptest4/.bzr/repository/lock/held
FTP get: /ftptest4/.bzr/repository/lock/held/info
FTP get: /ftptest4/.bzr/repository/lock/held/info

^- Again we seem to be checking that we acquired the lock 2 times.

FTP mkd: /ftptest4/.bzr/repository/revision-store
FTP mkd: /ftptest4/.bzr/repository/knits
FTP put: /ftptest4/.bzr/repository/format
FTP get: /ftptest4/.bzr/repository/lock/held/info
FTP rename: /ftptest4/.bzr/repository/lock/held =>
/ftptest4/.bzr/repository/lock/releasing.qewzuluyq5fxo0l92rx0.tmp
FTP rm: /ftptest4/.bzr/repository/lock/releasing.qewzuluyq5fxo0l92rx0.tmp/info
FTP rmd: /ftptest4/.bzr/repository/lock/releasing.qewzuluyq5fxo0l92rx0.tmp

^- And now that we have created the repository, we release the lock.

v- What is weirder, though, is that after this we start creating files. It sure
looks like we are doing this without having a repository lock.

FTP clone
FTP stat: /ftptest4/.bzr/repository
FTP get: /ftptest4/.bzr/repository/inventory.kndx
FTP get: /ftptest4/.bzr/repository/inventory.kndx
FTP put: /ftptest4/.bzr/repository/inventory.kndx
FTP put: /ftptest4/.bzr/repository/inventory.knit
FTP get: /ftptest4/.bzr/repository/revisions.kndx
FTP get: /ftptest4/.bzr/repository/revisions.kndx
FTP put: /ftptest4/.bzr/repository/revisions.kndx
FTP put: /ftptest4/.bzr/repository/revisions.knit
FTP get: /ftptest4/.bzr/repository/signatures.kndx
FTP get: /ftptest4/.bzr/repository/signatures.kndx
FTP put: /ftptest4/.bzr/repository/signatures.kndx
FTP put: /ftptest4/.bzr/repository/signatures.knit

^- Also note that we are trying to get the .kndx file 2 times, and then we put
.kndx and *then* .knit. That is certainly a no-no. Probably not a huge one (it
is because we are doing _KnitIndex(create=True) before we grab the
_KnitData(create=True)).

FTP clone
FTP stat: /ftptest4/.bzr/repository
FTP clone
FTP mkd: /ftptest4/.bzr/repository/lock/pending.0g1gzaoig6l8r337o3gf.tmp
FTP put: /ftptest4/.bzr/repository/lock/pending.0g1gzaoig6l8r337o3gf.tmp/info
FTP rename: /ftptest4/.bzr/repository/lock/pending.0g1gzaoig6l8r337o3gf.tmp =>
/ftptest4/.bzr/repository/lock/held
FTP get: /ftptest4/.bzr/repository/lock/held/info
FTP get: /ftptest4/.bzr/repository/lock/held/info
FTP rm: /ftptest4/.bzr/repository/no-working-trees
FTP get: /ftptest4/.bzr/repository/revisions.kndx

^- Now we lock it again so that we can Fetch() revisions across.

Using fetch logic to copy between
KnitRepository('file:///Users/jameinel/dev/%2Ctmp/ftptest/.bzr/')(<RepositoryFormatKnit1>)
and
KnitRepository('ftp://zippedmartin.dyndns.org/ftptest4/.bzr/')(<RepositoryFormatKnit1>)
fetch up to rev {john at arbash-meinel.com-20070604210519-bjzgkez08035gahi}
FTP get: /ftptest4/.bzr/repository/inventory.kndx
FTP get:
/ftptest4/.bzr/repository/knits/9a/a-20070604210516-34ttvbv3l4esh7jm-1.kndx
FTP get:
/ftptest4/.bzr/repository/knits/9a/a-20070604210516-34ttvbv3l4esh7jm-1.kndx
FTP put:
/ftptest4/.bzr/repository/knits/9a/a-20070604210516-34ttvbv3l4esh7jm-1.knit
unable to understand error for path:
/ftptest4/.bzr/repository/knits/9a/a-20070604210516-34ttvbv3l4esh7jm-1.knit:
550
a-20070604210516-34ttvbv3l4esh7jm-1.knit.tmp.1180994944.171545029.2240.2137349411:
Cannot create file.
FTP mkd: /ftptest4/.bzr/repository/knits/9a
FTP put:
/ftptest4/.bzr/repository/knits/9a/a-20070604210516-34ttvbv3l4esh7jm-1.knit
FTP put:
/ftptest4/.bzr/repository/knits/9a/a-20070604210516-34ttvbv3l4esh7jm-1.kndx

^- this is all the work we need to do to create a new .knit file. It suffers a
bit because we are using "get_weave_or_empty()" which tries to get an existing
one (fails) then tries to open a new one, which for some reason does a get()
first (fails), and then it tries to create it (which fails because the parent
dir doesn't exist), so then we mkdir() and then create it again (finally success)

FTP has check: inventory.knit => /ftptest4/.bzr/repository/inventory.knit
FTP has: /ftptest4/.bzr/repository/inventory.knit
FTP appe to /ftptest4/.bzr/repository/inventory.knit
FTP appe (try 0) to /ftptest4/.bzr/repository/inventory.knit
FTP has check: inventory.kndx => /ftptest4/.bzr/repository/inventory.kndx
FTP has: /ftptest4/.bzr/repository/inventory.kndx
FTP appe to /ftptest4/.bzr/repository/inventory.kndx
FTP appe (try 0) to /ftptest4/.bzr/repository/inventory.kndx
FTP get: /ftptest4/.bzr/repository/signatures.kndx
FTP has check: revisions.knit => /ftptest4/.bzr/repository/revisions.knit
FTP has: /ftptest4/.bzr/repository/revisions.knit
FTP appe to /ftptest4/.bzr/repository/revisions.knit
FTP appe (try 0) to /ftptest4/.bzr/repository/revisions.knit
FTP has check: revisions.kndx => /ftptest4/.bzr/repository/revisions.kndx
FTP has: /ftptest4/.bzr/repository/revisions.kndx
FTP appe to /ftptest4/.bzr/repository/revisions.kndx
FTP appe (try 0) to /ftptest4/.bzr/repository/revisions.kndx
FTP get: /ftptest4/.bzr/repository/lock/held/info

^- I think the two 'appe' calls are just it being logged 2 times. But I *don't*
know why we are calling "has()" before we start appending.

v- And now that we have finished fetching revisions, we unlock.

FTP rename: /ftptest4/.bzr/repository/lock/held =>
/ftptest4/.bzr/repository/lock/releasing.7baleykbygckg66g1mfj.tmp
FTP rm: /ftptest4/.bzr/repository/lock/releasing.7baleykbygckg66g1mfj.tmp/info
FTP rmd: /ftptest4/.bzr/repository/lock/releasing.7baleykbygckg66g1mfj.tmp

v- Only to lock again right away because we want to delete the
"no-working-trees" flag.

FTP mkd: /ftptest4/.bzr/repository/lock/pending.73on4les7b222dlj7w9z.tmp
FTP put: /ftptest4/.bzr/repository/lock/pending.73on4les7b222dlj7w9z.tmp/info
FTP rename: /ftptest4/.bzr/repository/lock/pending.73on4les7b222dlj7w9z.tmp =>
/ftptest4/.bzr/repository/lock/held
FTP get: /ftptest4/.bzr/repository/lock/held/info
FTP get: /ftptest4/.bzr/repository/lock/held/info
FTP rm: /ftptest4/.bzr/repository/no-working-trees
FTP get: /ftptest4/.bzr/repository/lock/held/info
FTP rename: /ftptest4/.bzr/repository/lock/held =>
/ftptest4/.bzr/repository/lock/releasing.dczk3u6ed0h5krp0c9jt.tmp
FTP rm: /ftptest4/.bzr/repository/lock/releasing.dczk3u6ed0h5krp0c9jt.tmp/info
FTP rmd: /ftptest4/.bzr/repository/lock/releasing.dczk3u6ed0h5krp0c9jt.tmp

^- And now that we have that, we unlock it one more time.

v- And now we start to create the branch.

creating branch <bzrlib.branch.BzrBranchFormat6 object at 0x777990> in
ftp://zippedmartin.dyndns.org/ftptest4/.bzr/
FTP stat: /ftptest4/.bzr
FTP mkd: /ftptest4/.bzr/branch
FTP clone
FTP stat: /ftptest4/.bzr/branch
FTP mkd: /ftptest4/.bzr/branch/lock
FTP mkd: /ftptest4/.bzr/branch/lock/pending.3pckv9gxd03se8es95tl.tmp
FTP put: /ftptest4/.bzr/branch/lock/pending.3pckv9gxd03se8es95tl.tmp/info
FTP rename: /ftptest4/.bzr/branch/lock/pending.3pckv9gxd03se8es95tl.tmp =>
/ftptest4/.bzr/branch/lock/held
FTP get: /ftptest4/.bzr/branch/lock/held/info
FTP get: /ftptest4/.bzr/branch/lock/held/info
FTP put: /ftptest4/.bzr/branch/format
FTP put: /ftptest4/.bzr/branch/last-revision
FTP put: /ftptest4/.bzr/branch/branch-name
FTP put: /ftptest4/.bzr/branch/branch.conf
FTP put: /ftptest4/.bzr/branch/tags
FTP get: /ftptest4/.bzr/branch/lock/held/info
FTP rename: /ftptest4/.bzr/branch/lock/held =>
/ftptest4/.bzr/branch/lock/releasing.yua003fsv1hvpaq1ocya.tmp
FTP rm: /ftptest4/.bzr/branch/lock/releasing.yua003fsv1hvpaq1ocya.tmp/info
FTP rmd: /ftptest4/.bzr/branch/lock/releasing.yua003fsv1hvpaq1ocya.tmp

^- we create the branch, populate it with the basic files, and then unlock it.

FTP clone
FTP stat: /ftptest4/.bzr/branch
FTP clone
FTP get: /ftptest4/.bzr/repository/format
FTP clone
FTP stat: /ftptest4/.bzr/repository
FTP clone
FTP clone

v- and then we lock the branch which causes us to lock the repository again

FTP mkd: /ftptest4/.bzr/repository/lock/pending.aixqtc0nbzpo2jqvy23v.tmp
FTP put: /ftptest4/.bzr/repository/lock/pending.aixqtc0nbzpo2jqvy23v.tmp/info
FTP rename: /ftptest4/.bzr/repository/lock/pending.aixqtc0nbzpo2jqvy23v.tmp =>
/ftptest4/.bzr/repository/lock/held
FTP get: /ftptest4/.bzr/repository/lock/held/info
FTP get: /ftptest4/.bzr/repository/lock/held/info
FTP mkd: /ftptest4/.bzr/branch/lock/pending.xepym96z2svnoart3l0w.tmp
FTP put: /ftptest4/.bzr/branch/lock/pending.xepym96z2svnoart3l0w.tmp/info
FTP rename: /ftptest4/.bzr/branch/lock/pending.xepym96z2svnoart3l0w.tmp =>
/ftptest4/.bzr/branch/lock/held
FTP get: /ftptest4/.bzr/branch/lock/held/info
FTP get: /ftptest4/.bzr/branch/lock/held/info
FTP get: /ftptest4/.bzr/branch/branch.conf
FTP put: /ftptest4/.bzr/branch/last-revision

^- and we copy our information into the branch

v- And unlock it

FTP rm:
/ftptest4/.bzr/branch/tmp.last-revision.1180994996.975116014.2240.aamae28rc0
FTP get: /ftptest4/.bzr/branch/lock/held/info
FTP rename: /ftptest4/.bzr/branch/lock/held =>
/ftptest4/.bzr/branch/lock/releasing.dobagrk9pqlsegknr0kl.tmp
FTP rm: /ftptest4/.bzr/branch/lock/releasing.dobagrk9pqlsegknr0kl.tmp/info
FTP rmd: /ftptest4/.bzr/branch/lock/releasing.dobagrk9pqlsegknr0kl.tmp
FTP get: /ftptest4/.bzr/repository/lock/held/info
FTP rename: /ftptest4/.bzr/repository/lock/held =>
/ftptest4/.bzr/repository/lock/releasing.gls3dgmo1kvkrbbvdsq8.tmp
FTP rm: /ftptest4/.bzr/repository/lock/releasing.gls3dgmo1kvkrbbvdsq8.tmp/info
FTP rmd: /ftptest4/.bzr/repository/lock/releasing.gls3dgmo1kvkrbbvdsq8.tmp
opening working tree '/Users/jameinel/dev/,tmp/ftptest'
FTP clone
FTP get: /ftptest4/.bzr/branch/format
FTP clone
FTP stat: /ftptest4/.bzr/branch
FTP clone
FTP get: /ftptest4/.bzr/repository/format
FTP clone
FTP stat: /ftptest4/.bzr/repository
FTP clone
FTP clone
[ 2240] Mon 17:10:25.435 INFO: Created new branch.
return code 0

^- And I don't know what the extra get and stat calls are here.

Anyway, overall we seem to be locking the repository 4 or 5 times rather than
just 1 time. And the branch 2-3 times. There are a couple obvious things we
could do to fix this, though I'm not sure how to make our apis do exactly the
right thing. (Format.initialize() should return a locked object, and I still
think we should create the branch and repository at the same time, so we could
maintain a single lock across the whole fetch + create last-revision
information for the branch). But still we could just do it by passing a locked
Repository to the Branch so that it doesn't have to be locked yet again.

This is actually one of the problems about making our API "nice" and auto-lock
for us. We don't realize when we are locking and unlocking more often than we
really should.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGZJEsJdeBCYSNAAMRAi3WAJ9yyjjtRlOOqbe6dIb0Ya8rJ0IldwCg1PsG
aFI8aBYilcMAUR3z4kyUK+o=
=h3sd
-----END PGP SIGNATURE-----