Rev 6034: Start cleanup of all the noise. in http://bazaar.launchpad.net/~jameinel/bzr/2.4-too-much-walking-388269

John Arbash Meinel john at arbash-meinel.com
Wed Aug 17 10:30:55 UTC 2011


At http://bazaar.launchpad.net/~jameinel/bzr/2.4-too-much-walking-388269

------------------------------------------------------------
revno: 6034
revision-id: john at arbash-meinel.com-20110817103008-3p47hg0ox21chh81
parent: john at arbash-meinel.com-20110815141118-ka5ur3w8lozpc1xg
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 2.4-too-much-walking-388269
timestamp: Wed 2011-08-17 12:30:08 +0200
message:
  Start cleanup of all the noise.
  
  In the end, I did repeat testing of the settings vs various projects.
  Interestingly, lp:mysql is consistently 'slow' given the number of
  revisions vs gcc-linaro, however it is using revision-ids with
  a lot more entropy, and much wider (but shorter) history.
  
  More discussion about depth:
  1- clearly out, as it is universally more data and slower
  0- works ok on bzr and mysql, transmits less data overall, and in the case of gcc
     significantly so (995 vs ~1400 kB). However, you can see the quadratic
     effect clearly on the gcc-linaro history. So also clearly out.
  10- bzr has a clear minimum at this point, however, it is a local tick up for
      gcc-linaro and mysql.
  10-100 - There is some variation, possibly within the noise (even with 5 runs),
      however, they are all well bounded away from the quadratic growth, and the
      variation is probably very specific to the graph. (Having 11 might include a
      subset of the graph on one query that makes it more expensive, but having
      12 would include a different subset that avoids walking extra data, etc.)
  1000- You can see that gcc-linaro is trending up significantly at this point.
        I'm guessing that the *local* re-walking of the data is probably starting
        to become an issue.
  10000- still better than 0 for gcc-linaro, but clearly high on the curve.
  
  Overall, because of *slight* improvements to bandwidth, and marginal differences
  in time, I'm still thinking 100 is the right point on the curve.
-------------- next part --------------
=== modified file 'bzrlib/vf_repository.py'
--- a/bzrlib/vf_repository.py	2011-08-12 15:36:54 +0000
+++ b/bzrlib/vf_repository.py	2011-08-17 10:30:08 +0000
@@ -2504,18 +2504,6 @@
         :param revision_ids: The start point for the search.
         :return: A set of revision ids.
         """
-        note('starting')
-        t = time.time()
-        ret = self._do_walk_to_common_revisions(revision_ids,
-            if_present_ids=if_present_ids)
-        # ret = commands.apply_lsprofiled(',,profile.txt',
-        #     self._do_walk_to_common_revisions, revision_ids,
-        #     if_present_ids=if_present_ids)
-        note('Walking took %.3fs' % (time.time() - t))
-        sys.exit(1)
-        return ret
-
-    def _do_walk_to_common_revisions(self, revision_ids, if_present_ids=None):
         target_graph = self.target.get_graph()
         revision_ids = frozenset(revision_ids)
         if if_present_ids:
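
The instrumentation removed above timed a single walk and then exited. A repeatable version of that kind of measurement, in the spirit of the 5-run comparison across depth settings described in the commit message, might look roughly like the sketch below. This is only an illustration: `time_runs`, `compare_depths`, and the `walk` callable are hypothetical names, not part of bzrlib, and `walk(depth)` stands in for whatever actually performs the graph walk with a given depth setting.

```python
import time
from statistics import mean


def time_runs(func, runs=5, timer=time.perf_counter):
    """Call func() `runs` times and return the elapsed seconds of each call."""
    durations = []
    for _ in range(runs):
        start = timer()
        func()
        durations.append(timer() - start)
    return durations


def compare_depths(walk, depths, runs=5):
    """Time walk(depth) for each depth.

    `walk` is a hypothetical stand-in for the operation being measured.
    Returns a dict mapping depth -> (fastest run, mean of runs), in seconds.
    """
    results = {}
    for depth in depths:
        samples = time_runs(lambda: walk(depth), runs=runs)
        results[depth] = (min(samples), mean(samples))
    return results


if __name__ == "__main__":
    # Dummy workload so the sketch runs on its own; replace with a real walk.
    def fake_walk(depth):
        sum(range(1000 * (depth + 1)))

    for depth, (best, avg) in compare_depths(fake_walk, [0, 10, 100]).items():
        print("depth %5d: best %.6fs, mean %.6fs" % (depth, best, avg))
```

Reporting both the fastest run and the mean makes it easier to judge whether a difference between two depths is real or within the noise.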


