Thoughts on push performance

John Arbash Meinel john at arbash-meinel.com
Wed May 21 22:22:20 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
| I was analyzing push on diverged branches, and I think I have some ideas
| to look
| into. (I was going to, but ran out of time today)
|
|
| 1) _basic_push ends up calling into update_revisions(). update_revisions
| has no
| concept of local versus remote, though. So it just does "graph =
| self.repository.get_graph()". When what you really want is "graph =
| other.repo.get_graph(self.repo)" for push, and the converse for pull. I was
| thinking to factor out the code a bit and pass in a 'graph' object.
|
| 2) update_revisions is smart enough to skip the graph.heads() call if
| 'overwrite=True' is set. However, if you are pushing anything other than
| the
| last revision you run into:
|
| ~            if other_last_revision == stop_revision:
| ~                self.set_last_revision_info(other_last_revno,
| ~                                            other_last_revision)
| ~            else:
| ~                # TODO: jam 2007-11-29 Is there a way to determine the
| ~                #       revno without searching all of history??
| ~                if overwrite:
| ~                    self.generate_revision_history(stop_revision)
| ~                else:
| ~                    self.generate_revision_history(stop_revision,
| ~                        last_rev=last_rev, other_branch=other)
|
| Which is going to iterate the whole ancestry, and do it *over remote
| get_parent_map() requests* because these functions also don't have much
| of a
| clue about another branch, and when they *do* they don't know which is
| local and
| which is remote.
|
|
| 3) I have to look closer to see if Graph.heads() is really the culprit.
| If it
| is, then there is probably a bug in heads(). For the branches I was
| using it
| should terminate rather quickly. However, it still should be using a Graph
| object that can know which repository can give it the revisions
| *faster*. (See
| point (1) about push versus pull).
|
|
| 4) I'm trying to figure out why we have this clause:
| ~        try:
| ~            target.update_revisions(self, stop_revision)
| ~        except errors.DivergedBranches:
| ~            if not overwrite:
| ~                raise
| ~        if overwrite:
| ~            target.set_revision_history(self.revision_history())
|
| ~  a) Why is the last step "set_revision_history()" rather than
| ~     "set_last_revision_info()"
|
| ~  b) Further, if you look at update_revisions we have:
| ~            if not overwrite:
| ~                heads = self.repository.get_graph().heads([stop_revision,
| ~                                                           last_rev])
| ~                if heads == set([last_rev]):
| ~                    # The current revision is a decendent of the target,
| ~                    # nothing to do
| ~                    return
| ~                elif heads == set([stop_revision, last_rev]):
| ~                    # These branches have diverged
| ~                    raise errors.DivergedBranches(self, other)
| ~                elif heads != set([stop_revision]):
| ~                    raise AssertionError("invalid heads: %r" % heads)
| ~            if other_last_revision == stop_revision:
| ~                self.set_last_revision_info(other_last_revno,
| ~                                            other_last_revision)
| ~            else:
| ~                # TODO: jam 2007-11-29 Is there a way to determine the
| ~                #       revno without searching all of history??
| ~                if overwrite:
| ~                    self.generate_revision_history(stop_revision)
| ~                else:
| ~                    self.generate_revision_history(stop_revision,
| ~                        last_rev=last_rev, other_branch=other)
|
| ~     So I *think* what is happening is that if you supply --overwrite
| it might
| ~     be generating the revision history 2 times.
|
|
| That is as far as I got for now. At a minumum, that last
| set_revision_history()
| is going to be killing the performance rather than
| set_last_revision_info().
| Though it does use 'self.revision_history()' which at least should be
| computed
| on the local side.
|
| John
| =:->

There is one more thing I ran into. If 'append_revision_only' is set then it does:

if cur_last_revision not in self._lefthand_history(target_revision):

Which has to build up the whole left-hand history.

It should instead use an iterator that can stop once it encounters
'cur_last_revision'. It could also use something like my find_revno code and
stop once the two sides have converged, etc.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkg0kowACgkQJdeBCYSNAAPUeQCgpSjtn2+RXmXd4gwGH6Ej8H7S
njUAnRje/iDisSIEZtDuskWzRb7ltd8k
=K7TJ
-----END PGP SIGNATURE-----



More information about the bazaar mailing list