Thoughts on push performance

John Arbash Meinel john at arbash-meinel.com
Wed May 21 21:47:04 BST 2008


I was analyzing push on diverged branches, and I think I have some ideas to look
into. (I was going to look into them myself, but ran out of time today.)


1) _basic_push ends up calling into update_revisions(). update_revisions has no
concept of local versus remote, though, so it just does "graph =
self.repository.get_graph()". What you really want is "graph =
other.repo.get_graph(self.repo)" for push, and the converse for pull. I was
thinking of factoring the code out a bit and passing in a 'graph' object.
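
A minimal sketch of what (1) is getting at. Everything here is a toy stand-in
made up for illustration (ToyRepo, ToyGraph, and the 'graph' keyword on this
update_revisions are assumptions, not bzrlib's API). The point is only that the
caller knows which side is local, so the caller should decide which repository
gets asked first:

```python
class ToyRepo(object):
    """Hypothetical stand-in for a repository; not bzrlib's Repository."""

    def __init__(self, name, parent_map):
        self.name = name
        self._parent_map = parent_map  # revision id -> tuple of parent ids
        self.requests = 0              # get_parent_map() calls served so far

    def get_parent_map(self, keys):
        self.requests += 1
        return dict((k, self._parent_map[k]) for k in keys
                    if k in self._parent_map)


class ToyGraph(object):
    """Asks each repository in order, so the first one listed is preferred."""

    def __init__(self, repos):
        self._repos = repos

    def get_parent_map(self, keys):
        found = {}
        remaining = set(keys)
        for repo in self._repos:
            if not remaining:
                break
            result = repo.get_parent_map(remaining)
            found.update(result)
            remaining.difference_update(result)
        return found


def update_revisions(target_repo, stop_revision, graph=None):
    """Toy update_revisions(): the caller may supply the graph to use."""
    if graph is None:
        # The behaviour described above: always ask the target's own repo.
        graph = ToyGraph([target_repo])
    return graph.get_parent_map([stop_revision])


history = {'rev-1': (), 'rev-2': ('rev-1',)}

# Without a graph argument, a remote target answers the query itself.
remote_default = ToyRepo('remote', history)
update_revisions(remote_default, 'rev-2')

# Push with an explicit graph: the local source repository is asked first,
# and the remote target is never contacted for revisions the source has.
local = ToyRepo('local', history)
remote = ToyRepo('remote', history)
update_revisions(remote, 'rev-2', graph=ToyGraph([local, remote]))
print(remote_default.requests, local.requests, remote.requests)
```

For pull the same function would be handed a graph with the repositories in
the other role, which is why passing the graph in seems simpler than teaching
update_revisions about local versus remote.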

2) update_revisions is smart enough to skip the graph.heads() call if
'overwrite=True' is set. However, if you are pushing anything other than the
last revision you run into:

    if other_last_revision == stop_revision:
        self.set_last_revision_info(other_last_revno,
                                    other_last_revision)
    else:
        # TODO: jam 2007-11-29 Is there a way to determine the
        #       revno without searching all of history??
        if overwrite:
            self.generate_revision_history(stop_revision)
        else:
            self.generate_revision_history(stop_revision,
                last_rev=last_rev, other_branch=other)

That is going to iterate the whole ancestry, and do it *over remote
get_parent_map() requests*, because these functions don't have much of a clue
about another branch either, and when they *do* they don't know which is local
and which is remote.
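
To put a rough number on that, here is a toy model (not bzrlib code) of
finding a revno by walking left-hand parents back to the origin. It assumes
the worst case, where every step is its own get_parent_map() call; the helper
names and the 1000-revision history are made up for illustration:

```python
def count_revno(get_parent_map, tip):
    """Walk left-hand parents from tip to the origin, one request per step."""
    revno = 0
    requests = 0
    current = tip
    while current is not None:
        parents = get_parent_map([current])[current]
        requests += 1
        revno += 1
        current = parents[0] if parents else None
    return revno, requests


# Hypothetical linear history: rev-1 <- rev-2 <- ... <- rev-1000.
parent_map = {'rev-1': ()}
for i in range(2, 1001):
    parent_map['rev-%d' % i] = ('rev-%d' % (i - 1),)


def get_parent_map(keys):
    """Stand-in for a repository lookup; imagine a network round trip here."""
    return dict((k, parent_map[k]) for k in keys)


revno, requests = count_revno(get_parent_map, 'rev-1000')
print(revno, requests)
```

In this model the number of requests equals the length of the history, which
is harmless against a local repository and painful if each one crosses the
network.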


3) I have to look closer to see if Graph.heads() is really the culprit. If it
is, then there is probably a bug in heads(). For the branches I was using it
should terminate rather quickly. Either way, it should still be using a Graph
object that knows which repository can give it the revisions *faster*. (See
point (1) about push versus pull.)
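
For reference, a naive sketch of what heads() is supposed to compute: the
subset of the given keys that are not an ancestor of any other given key. This
is a toy written for illustration, not bzrlib's Graph.heads(); it walks each
key's full ancestry, which is exactly the cost a real implementation should
avoid by stopping early. The revision ids are made up:

```python
def ancestors(parent_map, rev):
    """Return every ancestor of rev, not including rev itself."""
    seen = set()
    pending = list(parent_map.get(rev, ()))
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        pending.extend(parent_map.get(current, ()))
    return seen


def heads(parent_map, keys):
    """Return the keys that are not an ancestor of any other key."""
    keys = set(keys)
    result = set(keys)
    for key in keys:
        result -= ancestors(parent_map, key)
    return result


# Hypothetical graph: two lines of development forking from 'base'.
parent_map = {
    'base': (),
    'local-1': ('base',),
    'local-2': ('local-1',),
    'remote-1': ('base',),
}

diverged = heads(parent_map, ['local-2', 'remote-1'])
fast_forward = heads(parent_map, ['local-2', 'local-1'])
print(sorted(diverged), sorted(fast_forward))
```

Two heads coming back means the branches have diverged; a single head means
one revision is an ancestor of the other.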


4) I'm trying to figure out why we have this clause:
    try:
        target.update_revisions(self, stop_revision)
    except errors.DivergedBranches:
        if not overwrite:
            raise
    if overwrite:
        target.set_revision_history(self.revision_history())

   a) Why is the last step "set_revision_history()" rather than
      "set_last_revision_info()"?

   b) Further, if you look at update_revisions we have:
        if not overwrite:
            heads = self.repository.get_graph().heads([stop_revision,
                                                       last_rev])
            if heads == set([last_rev]):
                # The current revision is a descendant of the target,
                # nothing to do
                return
            elif heads == set([stop_revision, last_rev]):
                # These branches have diverged
                raise errors.DivergedBranches(self, other)
            elif heads != set([stop_revision]):
                raise AssertionError("invalid heads: %r" % heads)
        if other_last_revision == stop_revision:
            self.set_last_revision_info(other_last_revno,
                                        other_last_revision)
        else:
            # TODO: jam 2007-11-29 Is there a way to determine the
            #       revno without searching all of history??
            if overwrite:
                self.generate_revision_history(stop_revision)
            else:
                self.generate_revision_history(stop_revision,
                    last_rev=last_rev, other_branch=other)

      So I *think* what is happening is that if you supply --overwrite it might
      be generating the revision history twice.
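
A toy trace of that suspicion, with the same shape as the try/except clause in
(4). It is a sketch, not bzrlib: ToyTarget and push_clause are made-up names,
and it assumes stop_revision is not the source tip, so update_revisions lands
in the generate_revision_history() branch whenever the branches have not
diverged:

```python
class DivergedBranches(Exception):
    """Toy stand-in for errors.DivergedBranches."""


class ToyTarget(object):
    """Records which history-writing calls the push clause triggers."""

    def __init__(self, diverged):
        self.diverged = diverged
        self.calls = []

    def update_revisions(self, source_history, stop_revision):
        # Called without overwrite, as in the quoted clause, so the
        # divergence check always runs first.
        if self.diverged:
            raise DivergedBranches()
        self.calls.append('generate_revision_history')

    def set_revision_history(self, history):
        self.calls.append('set_revision_history')


def push_clause(source_history, target, stop_revision, overwrite):
    """Same control flow as the quoted clause, on toy objects."""
    try:
        target.update_revisions(source_history, stop_revision)
    except DivergedBranches:
        if not overwrite:
            raise
    if overwrite:
        target.set_revision_history(source_history)


not_diverged = ToyTarget(diverged=False)
push_clause(['rev-1', 'rev-2', 'rev-3'], not_diverged, 'rev-2', overwrite=True)

was_diverged = ToyTarget(diverged=True)
push_clause(['rev-1', 'rev-2', 'rev-3'], was_diverged, 'rev-2', overwrite=True)
print(not_diverged.calls, was_diverged.calls)
```

In this model the non-diverged --overwrite case writes the history twice,
while the diverged case only reaches set_revision_history(). Whether the real
code behaves the same way still needs checking.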


That is as far as I got for now. At a minimum, that last set_revision_history()
is going to hurt performance compared with calling set_last_revision_info().
It does use 'self.revision_history()', though, which at least should be
computed on the local side.

John
=:->


