[MERGE/RFC] Odd processing during BzrDir.sprout()

Thu Sep 4 04:22:50 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I was just poking around with some "bzr branch" timings, and I was quite
surprised to see:

114.969  SFTP.readv(...705d.pack) 8107 offsets => 76 coalesced => 158 requests
120.421  creating branch <bzrlib.branch.BzrBranchFormat6 object at 0x8578a2c> in file:///home/jameinel/dev/%2Ctmp/branchtim/xxx/yyy/.bzr/
126.118  created new branch BzrBranch6('file:///home/jameinel/dev/%2Ctmp/branchtim/xxx/yyy/')
137.845  SFTP.readv(...705d.iix) 1 offsets => 1 coalesced => 1 requests
138.665  SFTP.readv(...705d.iix) 1 offsets => 1 coalesced => 1 requests
139.487  SFTP.readv(...705d.iix) 1 offsets => 1 coalesced => 1 requests

So, for some reason we are creating the target branch, and then going back and
reading the source inventory index (and pack file).

Even worse, it seems like we don't have the index information cached, which
means somewhere we let go of the repository lock. That is surprising,
considering we just did a fetch and should certainly have all of the inventory
index cached.

For the most part, I tracked it down to the "subtree" code, which is really a
shame considering 99% of all branches out there don't support subtrees anyway.
With 400ms ping time over the loopback, going into a treeless repository, this
patch drops "bzr branch" times by about 20s (out of 140s), because it no longer
goes back to read the inventory from the source repository, which meant probing
for the inventory info all over again.

The main reason for the RFC is that I'm wondering if a better fix would just be:
        if recurse == 'down' and repository.supports_tree_reference():

so we just disable this extra lookup if we know in advance that we won't *do*
anything with it.
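
To make the intent of that guard concrete, here is a minimal sketch using toy
stand-ins. ToyRepository, read_inventory() and subtrees_to_sprout() are
hypothetical names for illustration, not bzrlib code; only the shape of the
"if" matches the line above.

```python
# Toy stand-ins only; none of these classes are bzrlib's real ones.
class ToyRepository:
    def __init__(self, supports_subtrees):
        self._supports_subtrees = supports_subtrees
        self.inventory_reads = 0  # counts simulated round trips to the source

    def supports_tree_reference(self):
        return self._supports_subtrees

    def read_inventory(self):
        # Stands in for the expensive index/pack reads from the source.
        self.inventory_reads += 1
        return []  # this toy inventory never contains subtrees


def subtrees_to_sprout(repository, recurse):
    # The proposed guard: skip the lookup entirely when the repository
    # format cannot hold tree references in the first place.
    if recurse == 'down' and repository.supports_tree_reference():
        return repository.read_inventory()
    return []


plain = ToyRepository(supports_subtrees=False)
nested = ToyRepository(supports_subtrees=True)
subtrees_to_sprout(plain, 'down')
subtrees_to_sprout(nested, 'down')
print(plain.inventory_reads, nested.inventory_reads)
```

In this model the repository without subtree support is never asked for its
inventory, which is the saving being discussed.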

I came across this while working on my sftp tests, because it would seem to
finish the transfer and then just sit around for a while, thinking, before it
actually finished the branch.

Looking more closely, I think we need to address some stuff in BzrDir.sprout().

I see it doing:
            source_branch = self.open_branch()
            source_repository = source_branch.repository

But I never see it *locking* those objects, which means it isn't caching any
information between calls.
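
As a rough model of why the missing lock matters, assume index data is only
kept in memory while a read lock is held. The class below is a toy written
under that assumption (remote_reads and get_index() are made up for the
illustration), not the real repository code.

```python
class ToyRepository:
    """Hypothetical model: the index is cached only while a lock is held."""

    def __init__(self):
        self.remote_reads = 0   # simulated round trips
        self._lock_count = 0
        self._index_cache = None

    def lock_read(self):
        self._lock_count += 1

    def unlock(self):
        self._lock_count -= 1
        if self._lock_count == 0:
            # Cached data is dropped when the last lock is released.
            self._index_cache = None

    def get_index(self):
        if self._lock_count and self._index_cache is not None:
            return self._index_cache
        self.remote_reads += 1  # had to go back to the transport
        index = {'inventory': 'index-data'}
        if self._lock_count:
            self._index_cache = index
        return index


# Without a lock, every call pays for a fresh read.
unlocked = ToyRepository()
for _ in range(3):
    unlocked.get_index()

# With a lock held across the calls, only the first one reads.
locked = ToyRepository()
locked.lock_read()
try:
    for _ in range(3):
        locked.get_index()
finally:
    locked.unlock()

print(unlocked.remote_reads, locked.remote_reads)
```

Three unlocked calls cost three reads in this model, while the locked
sequence costs one, which matches the repeated single-offset .iix reads in
the log above.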

There is also something inherently wrong (IMO) about having "cmd_branch" do:
  br_from.bzrdir.sprout(target_url...)

and then having sprout do:
  br_from = self.open_branch()

which creates an entirely new Branch instance (one that isn't locked and
doesn't share *any* state with the branch we just used to probe the ancestry,
possibly resolve -r XXX information, etc.).
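
One possible shape for avoiding the second open, sketched with toy classes:
let sprout() accept the branch the caller already has. The source_branch
parameter and everything else here are hypothetical, shown only to illustrate
the state-sharing problem, not the actual sprout() signature.

```python
class ToyBranch:
    def __init__(self):
        self.cached_ancestry = None  # state built up by the caller


class ToyBzrDir:
    def __init__(self):
        self.opens = 0  # how many Branch objects we have created

    def open_branch(self):
        self.opens += 1
        return ToyBranch()

    def sprout(self, target_url, source_branch=None):
        # Hypothetical parameter: reuse the caller's branch when given,
        # and only fall back to opening a fresh one otherwise.
        if source_branch is None:
            source_branch = self.open_branch()
        return source_branch  # the toy just reports which branch it used


bzrdir = ToyBzrDir()
br_from = bzrdir.open_branch()
br_from.cached_ancestry = ['rev-1', 'rev-2']

reopened = bzrdir.sprout('target/')                      # today's behaviour
shared = bzrdir.sprout('target/', source_branch=br_from)

print(reopened is br_from, reopened.cached_ancestry)
print(shared is br_from, shared.cached_ancestry)
```

The reopened branch starts with none of the caller's state, while the
passed-in one keeps whatever was already looked up.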

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIv1SKJdeBCYSNAAMRAsqXAJ90td9bj4mGP5c1JflJsRjlMzNFJACggaz1
EK+Xd4y8o/+DIVnWwKYitYI=
=iuFQ
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: references_from_target.diff
Type: text/x-diff
Size: 599 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080903/e35ae818/attachment.bin