why `bzr pull bundle` so slow?

John Arbash Meinel john at arbash-meinel.com
Thu May 3 17:13:31 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
...

> We also have benchmarks for this.
> 
> bzr selftest --benchmark --lsprof-timed \
>   few_files_small_tree_100_revision
> 
> Might be worth looking at.


What I'm seeing is:

         102            0     95.2405      0.5558 bzrlib.bundle.bundle_data:196(_validate_references_from_repository)
      +25959            0     19.7612      0.5324 +bzrlib.decorators:124(read_locked)
       +5151            0     33.3260      0.1413 +bzrlib.testament:92(from_revision)
       +5151            0     41.5084      0.1075 +bzrlib.bundle.serializer.v09:62(_testament_sha1_from_revision)

Which tells me:

1) We seem to be calling read_locked() a bit too often (26,000 calls
seems excessive). Looking at the code, I'm pretty sure we are holding
the repository lock the whole time: 'install_bundle()' takes out a
repository.lock_write() for the duration.
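
To illustrate why the call count matters even though the lock is already
held, here is a minimal sketch of a read-locking decorator. This is not
the bzrlib.decorators code: FakeRepo, get_thing and this version of
needs_read_lock are stand-ins made up for the example. The assumption is
that every decorated call does its own lock_read()/unlock() pair, which
is cheap for a reentrant lock but still adds a wrapper call and two
method calls each time.

```python
import functools


def needs_read_lock(method):
    """Toy decorator: take a read lock around every call to `method`."""
    @functools.wraps(method)
    def read_locked(self, *args, **kwargs):
        self.lock_read()
        try:
            return method(self, *args, **kwargs)
        finally:
            self.unlock()
    return read_locked


class FakeRepo:
    """Hypothetical stand-in for a repository with a counted, reentrant lock."""

    def __init__(self):
        self.lock_depth = 0
        self.lock_calls = 0

    def lock_read(self):
        self.lock_depth += 1
        self.lock_calls += 1

    lock_write = lock_read  # toy: no read/write distinction

    def unlock(self):
        self.lock_depth -= 1

    @needs_read_lock
    def get_thing(self, key):
        return key


repo = FakeRepo()
repo.lock_write()            # outer lock held for the whole operation
try:
    for i in range(25959):   # call count taken from the profile above
        repo.get_thing(i)
finally:
    repo.unlock()

wrapper_lock_calls = repo.lock_calls - 1  # exclude the outer lock_write()
```

In this toy, every one of the 25959 calls re-enters a lock that the
caller already owns, so the locking work is pure overhead.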



AHA!!! I think I found it. As near as I can tell,
BundleInfo.revision_tree() validates *all* revisions, and
'install_bundle' calls it once for *each* revision.
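
The call counts in the profile are consistent with that reading. As a
rough check (a toy model, not bzrlib code), assume the k-th call to the
validation routine re-checks the k revisions handled so far, with k
running from 0 to 101. The function name count_validations is
hypothetical, and the "one testament per re-checked revision" behaviour
is an assumption.

```python
def count_validations(num_calls):
    """Toy model: the k-th call (k = 0, 1, ...) re-validates k revisions."""
    total = 0
    for already_handled in range(num_calls):
        # Assumed behaviour: each call re-checks every revision seen so far.
        total += already_handled
    return total


# 102 is the _validate_references_from_repository call count in the profile.
calls = 102
total_testaments = count_validations(calls)
```

Under this model the total is 0 + 1 + ... + 101 = 101 * 102 / 2 = 5151,
which is the number of from_revision calls in the profile, so the
validation work grows quadratically with the number of revisions.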

So I *think* what we can do is:

=== modified file 'bzrlib/bundle/apply_bundle.py'
--- bzrlib/bundle/apply_bundle.py       2007-02-09 15:56:49 +0000
+++ bzrlib/bundle/apply_bundle.py       2007-05-03 16:10:47 +0000
@@ -29,13 +29,9 @@
     repository.lock_write()
     try:
         real_revisions = bundle_reader.real_revisions
-        for i, revision in enumerate(reversed(real_revisions)):
-            pb.update("Install revisions",i, len(real_revisions))
-            if repository.has_revision(revision.revision_id):
-                continue
-            cset_tree = bundle_reader.revision_tree(repository,
-                                                    revision.revision_id)
-            install_revision(repository, revision, cset_tree)
+        last_revision = real_revisions[-1]
+        cset_tree = bundle_reader.revision_tree(repository,
+                                                last_revision.revision_id)
     finally:
         repository.unlock()
         pb.finished()


This doesn't seem to pass all tests, so I need to look at it a bit more.
But something like that should actually make it *much* faster.

Either that or change _validate_revisions to only validate the current
revision....
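
A minimal sketch of that second option, validating only the revision
being asked for. ToyBundleReader and _validate_one are hypothetical
stand-ins, not the bzrlib classes or methods, and the count of 101
revisions is only an assumed figure for illustration.

```python
class ToyBundleReader:
    """Hypothetical stand-in for the bundle reader; not the bzrlib class."""

    def __init__(self, revision_ids):
        self.revision_ids = list(revision_ids)
        self.testaments_computed = 0

    def _validate_one(self, revision_id):
        # Stand-in for computing and comparing one testament sha1.
        self.testaments_computed += 1

    def revision_tree(self, revision_id):
        # Only the requested revision is validated, not the whole bundle.
        self._validate_one(revision_id)
        return "tree-for-%s" % revision_id


reader = ToyBundleReader("rev-%d" % i for i in range(101))
for rev_id in reader.revision_ids:
    reader.revision_tree(rev_id)
```

With per-revision validation the toy reader does one check per
revision, so the cost grows linearly with the size of the bundle.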

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGOgorJdeBCYSNAAMRAoWxAKCA6h0yAv19y9GO8dfnNccxT0iRWQCePQeT
ZVPCftmV9PaTWxtn1yfiVSo=
=j3H2
-----END PGP SIGNATURE-----


