Rev 6034: Start cleanup of all the noise. in http://bazaar.launchpad.net/~jameinel/bzr/2.4-too-much-walking-388269

John Arbash Meinel john at arbash-meinel.com
Wed Aug 17 10:30:29 UTC 2011


At http://bazaar.launchpad.net/~jameinel/bzr/2.4-too-much-walking-388269

------------------------------------------------------------
revno: 6034
revision-id: john at arbash-meinel.com-20110817102942-zek5sha73aihbp8m
parent: john at arbash-meinel.com-20110815141118-ka5ur3w8lozpc1xg
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 2.4-too-much-walking-388269
timestamp: Wed 2011-08-17 12:29:42 +0200
message:
  Start cleanup of all the noise.
  
  In the end, I ran repeated tests of the settings against various projects.
  Interestingly, lp:mysql is consistently 'slow' for its number of
  revisions compared with gcc-linaro; however, it uses revision-ids with
  a lot more entropy, and has a much wider (but shorter) history.
  
  More discussion about depth:
  1- clearly out, as it is universally more data and slower
  0- works ok on bzr and mysql, transmits less data overall, and in the case of gcc
     significantly so (995 vs ~1400 kB). However, you can see the quadratic
     effect clearly on the gcc-linaro history. So also clearly out.
  10- bzr has a clear minimum at this point; however, it is a local tick up for
      gcc-linaro and mysql.
  10-100 - There is some variation, possibly within the noise (even with 5 runs);
      however, they are all well bounded away from the quadratic growth, and the
      variation is probably very specific to the graph. (Having 11 might include a
      subset of the graph on one query that makes it more expensive, but having
      12 would include a different subset that avoids walking extra data, etc.)
  1000- You can see that gcc-linaro is trending up significantly at this point.
        I'm guessing that the *local* re-walking of the data is probably starting
        to become an issue.
  10000- still better than 0 for gcc-linaro, but clearly high on the curve.
  
  Overall, because of *slight* improvements to bandwidth, and marginal differences
  in time, I'm still thinking 100 is the right point on the curve.
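
To make the "quadratic effect" in the depth discussion above concrete, here is a toy cost model. It is only a sketch under stated assumptions and is not bzrlib code: `server_steps`, `history_len`, and `batch` are hypothetical names, the history is assumed to be strictly linear, and a depth of 0 is assumed to mean "no limit", so each request replays everything already seen. It does not model the local re-walking cost that seems to hurt at depth 1000 and above.

```python
def server_steps(history_len, batch, depth):
    """Count revisions a hypothetical server walks for a linear history.

    The client is assumed to learn `batch` new revisions per request.
    Assumption: depth == 0 means "no limit", so each request replays all
    revisions seen so far; depth > 0 caps the replay at `depth` revisions.
    """
    seen = 0
    total = 0
    while seen < history_len:
        # Revisions the server must re-walk before reaching new ground.
        replay = seen if depth == 0 else min(seen, depth)
        new = min(batch, history_len - seen)
        total += replay + new
        seen += new
    return total


if __name__ == "__main__":
    # Compare unlimited replay (0) against a few bounded depths.
    for depth in (0, 10, 100, 1000):
        print(depth, server_steps(10000, 50, depth))
```

In this model the unlimited case grows with the square of the history length, while any fixed depth grows linearly, which would be consistent with gcc-linaro (the longest history tested) showing the effect most clearly.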
-------------- next part --------------
=== modified file 'bzrlib/remote.py'
--- a/bzrlib/remote.py	2011-08-12 15:37:48 +0000
+++ b/bzrlib/remote.py	2011-08-17 10:29:42 +0000
@@ -48,7 +48,7 @@
 from bzrlib.trace import mutter, note, warning
 
 
-_DEFAULT_SEARCH_DEPTH = 100
+_DEFAULT_SEARCH_DEPTH = 0
 
 
 class _RpcHelper(object):

=== modified file 'bzrlib/vf_repository.py'
--- a/bzrlib/vf_repository.py	2011-08-12 15:36:54 +0000
+++ b/bzrlib/vf_repository.py	2011-08-17 10:29:42 +0000
@@ -2504,7 +2504,8 @@
         :param revision_ids: The start point for the search.
         :return: A set of revision ids.
         """
-        note('starting')
+        from bzrlib import remote
+        note('starting with %d' % (remote._DEFAULT_SEARCH_DEPTH,))
         t = time.time()
         ret = self._do_walk_to_common_revisions(revision_ids,
             if_present_ids=if_present_ids)


