Thoughts on push performance
John Arbash Meinel
john at arbash-meinel.com
Wed May 21 21:47:04 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I was analyzing push on diverged branches, and I think I have some ideas to look
into. (I was going to, but ran out of time today)
1) _basic_push ends up calling into update_revisions(). update_revisions has no
concept of local versus remote, though. So it just does "graph =
self.repository.get_graph()". When what you really want is "graph =
other.repo.get_graph(self.repo)" for push, and the converse for pull. I was
thinking to factor out the code a bit and pass in a 'graph' object.
2) update_revisions is smart enough to skip the graph.heads() call if
'overwrite=True' is set. However, if you are pushing anything other than the
last revision you run into:
~ if other_last_revision == stop_revision:
~ self.set_last_revision_info(other_last_revno,
~ other_last_revision)
~ else:
~ # TODO: jam 2007-11-29 Is there a way to determine the
~ # revno without searching all of history??
~ if overwrite:
~ self.generate_revision_history(stop_revision)
~ else:
~ self.generate_revision_history(stop_revision,
~ last_rev=last_rev, other_branch=other)
Which is going to iterate the whole ancestry, and do it *over remote
get_parent_map() requests* because these functions also don't have much of a
clue about another branch, and when they *do* they don't know which is local and
which is remote.
3) I have to look closer to see if Graph.heads() is really the culprit. If it
is, then there is probably a bug in heads(). For the branches I was using it
should terminate rather quickly. However, it still should be using a Graph
object that can know which repository can give it the revisions *faster*. (See
point (1) about push versus pull).
4) I'm trying to figure out why we have this clause:
~ try:
~ target.update_revisions(self, stop_revision)
~ except errors.DivergedBranches:
~ if not overwrite:
~ raise
~ if overwrite:
~ target.set_revision_history(self.revision_history())
~ a) Why is the last step "set_revision_history()" rather than
~ "set_last_revision_info()"
~ b) Further, if you look at update_revisions we have:
~ if not overwrite:
~ heads = self.repository.get_graph().heads([stop_revision,
~ last_rev])
~ if heads == set([last_rev]):
~ # The current revision is a decendent of the target,
~ # nothing to do
~ return
~ elif heads == set([stop_revision, last_rev]):
~ # These branches have diverged
~ raise errors.DivergedBranches(self, other)
~ elif heads != set([stop_revision]):
~ raise AssertionError("invalid heads: %r" % heads)
~ if other_last_revision == stop_revision:
~ self.set_last_revision_info(other_last_revno,
~ other_last_revision)
~ else:
~ # TODO: jam 2007-11-29 Is there a way to determine the
~ # revno without searching all of history??
~ if overwrite:
~ self.generate_revision_history(stop_revision)
~ else:
~ self.generate_revision_history(stop_revision,
~ last_rev=last_rev, other_branch=other)
~ So I *think* what is happening is that if you supply --overwrite it might
~ be generating the revision history 2 times.
That is as far as I got for now. At a minumum, that last set_revision_history()
is going to be killing the performance rather than set_last_revision_info().
Though it does use 'self.revision_history()' which at least should be computed
on the local side.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkg0ikgACgkQJdeBCYSNAANHpgCdH1lOj3KfU4u2haGH6mBhX1rz
RrkAni82+qu8JhQatL0u6+g6oWjn+40J
=NASC
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list