[MERGE/RFC] "bzr branch" opens the source branch twice

John Arbash Meinel john at arbash-meinel.com
Fri Nov 7 23:17:45 GMT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> John Arbash Meinel wrote:
>> While tracking down the time spent on index work, I ran:
> 
>> bzr -Dhttp -Dindex branch http://....
> 
>> What I found rather enlightening is that we seem to read all of the
>> -format files 2 times.
> 
>> This, along with accidentally reading the pack-names file 2x each time
>> we lock the repository, means it actually takes approx 10s just to get
>> to the point where we have opened the remote HTTP repo, and have the
>> repository locked, in preparation for the next step.
> 
> 
> I just checked, and we read the http://...pack-names file 12 times
> during "bzr branch". That seems a bit excessive to me.
> 
> John
> =:->
> 

So, looking into this, the first cause is that we do:

  accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch(
      from_location)
  br_from.lock_read() # so far so good
  ...

    dir = br_from.bzrdir.sprout(to_transport.base, ...)


And the very specific problem is that "br_from.bzrdir.sprout" is
sprouting from the BzrDir object, and not the Branch object. Because of
that, we don't have access to the *branch* or the *repository* that we
just opened. And inside sprout() it then calls "self.open_branch()"
which re-opens everything that we just opened.

Now, I think we ended up this way because BzrDir.sprout() is where all
of the logic for "repository_policy" etc. is located. It is also where
the logic for copying nested trees resides.

Attached is a hack-around, which allows the caller to pass in the branch
it has already opened. Not only that, but because we already lock
br_from for the lifetime of the action, it also keeps the repository
locked during that whole time.
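
To make the shape of that change concrete, here is a minimal toy sketch.
It is not bzrlib code and not the attached patch: ToyBranch, ToyDir and
count_opens are hypothetical stand-ins, and only the optional
source_branch argument mirrors what is described above. Each ToyBranch
construction stands for one round of remote opens.

```python
class ToyBranch(object):
    """Hypothetical stand-in for a branch; counts how often it is opened."""

    open_count = 0

    def __init__(self, location):
        # Each construction models one (expensive) remote open.
        ToyBranch.open_count += 1
        self.location = location


class ToyDir(object):
    """Hypothetical stand-in for a control directory."""

    def __init__(self, location):
        self.location = location

    def open_branch(self):
        return ToyBranch(self.location)

    def sprout(self, target, source_branch=None):
        # Reuse the caller's branch when one is supplied; otherwise fall
        # back to re-opening it, which is the behaviour being avoided.
        if source_branch is None:
            source_branch = self.open_branch()
        return (target, source_branch)


def count_opens(pass_branch):
    """Return how many branch opens one toy 'branch' operation performs."""
    ToyBranch.open_count = 0
    source_dir = ToyDir("http://example.invalid/src")
    br_from = source_dir.open_branch()
    if pass_branch:
        source_dir.sprout("target", source_branch=br_from)
    else:
        source_dir.sprout("target")
    return ToyBranch.open_count
```

In this toy, count_opens(False) performs two opens and count_opens(True)
performs one, which is the saving the extra argument is meant to buy.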

In my testing, if I do "bzr branch http://" where the local repo already
has all the revisions, this changes the time from 20s down to 10s. If I
do "bzr branch" with data to copy, it changes from 44s down to 29s
(provided the source format is in btree :).

I don't really like having to do it this way, as it seems better to use
Branch.sprout() directly, but I don't have a good feel for which logic
needs to live where. Obviously this isn't ready to be merged as is,
considering there are no tests for BzrDir.sprout(source_branch=XXX).

I also think we need some sort of effort test, to make sure we don't
re-open the source branch multiple times.
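
One possible shape for such an effort test, sketched with hypothetical
names only: CountingTransport, open_source_once and repeated_reads are
made up for illustration and are not bzrlib APIs, and the two file names
are just example keys. The idea is to record every read the operation
issues and fail if any path is fetched more than once.

```python
import collections
import unittest


class CountingTransport(object):
    """Hypothetical wrapper that records how often each path is read."""

    def __init__(self, files):
        self._files = files
        self.reads = collections.Counter()

    def get_bytes(self, relpath):
        self.reads[relpath] += 1
        return self._files[relpath]


def open_source_once(transport):
    """Toy operation that reads each control file a single time."""
    fmt = transport.get_bytes("branch-format")
    names = transport.get_bytes("pack-names")
    return fmt, names


def repeated_reads(transport):
    """Return {path: count} for every path read more than once."""
    return dict((p, n) for p, n in transport.reads.items() if n > 1)


class TestEffort(unittest.TestCase):

    def test_each_file_read_once(self):
        transport = CountingTransport(
            {"branch-format": b"format\n", "pack-names": b"names\n"})
        open_source_once(transport)
        # Any entry here means the operation re-read a file it already had.
        self.assertEqual({}, repeated_reads(transport))
```

A real test would wrap the actual source transport and drive the real
branch operation; the assertion on repeated reads is the part that
carries over.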

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkkUzJkACgkQJdeBCYSNAAPczACeIRlx/leMQF4jsJNdmpR9G2FU
/o8An3zn8gcgW5tMUc//vDrVSzKixDXt
=WuFX
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bzrdir_sprout_branch.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20081107/9deeff1d/attachment-0001.diff 

