Thoughts on push performance

John Arbash Meinel john at arbash-meinel.com
Thu May 22 15:29:59 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
| John Arbash Meinel wrote:
|> -----BEGIN PGP SIGNED MESSAGE-----
|> Hash: SHA1
|>
|> I was analyzing push on diverged branches, and I think I have some ideas to look
|> into. (I was going to, but ran out of time today)
|
| I happens that I was looking at similar stuff yesterday and today.  I've managed
| to get rid of all the get_parent_map RPC calls in at least some cases of push
| over HPSS (perhaps all, I haven't tested thoroughly yet).
|
| The branch is at <http://people.ubuntu.com/~andrew/bzr/smart-push-gpm>.  It's
| not quite ready for review, but it is ready for testing if you're interested.
| It is beating 1.4 on pushing r3390 of bzr.dev onto a remote branch at r3389 by
| nearly 20% (~59s vs ~71s), and probably even better on larger, more complex
| graphs.  (It's also twice as fast as 1.4 at raising DivergedBranches on that
| same push with slightly diverged histories.)
|
| For HPSS, we can do update_revisions considerably more simply I think.  It
| essentially boils down to:
|
|     self.fetch(other, stop_revision)
|     if overwrite:
|         self._set_last_revision(stop_revision)
|     else:
|         self._set_last_revision_descendant(stop_revision)

My big problem here is that these don't pass a revno if they have them. Because
it actually takes a long time to iterate back to the NULL revision to figure out
~ what to put into .bzr/branch/last-revision (which is actually
last-revision-info, historical reasons.)


|
| Where _set_last_revision is an existing RPC call that unconditionally updates
| the last revision, and _set_last_revision_descendant is a new RPC call
| (implemented in that branch) that updates the last revision if the new revision
| is a descendant of the current revision, and otherwise raises an error.
|

How do these interact with 'append_revisions_only' ? I'm guessing it is using
the same mechanisms as what we already have. But it is something to consider.

| So there's no graph operations client-side here at all after the fetch.  After
| the fetch the desired revisions are all on the server, so we can just ask the
| server to figure it out.
|
| This still isn't optimal.  self.fetch still takes way too long to notice that a
| remote branch has diverged from the local branch.

Are you suggesting that we just don't copy revisions until we've determine
divergence? We certainly *can* do this, though we have a specific comment:

~            # we fetch here so that we don't process data twice in the common
~            # case of having something to pull, and so that the check for
~            # already merged can operate on the just fetched graph, which will
~            # be cached in memory.
~            self.fetch(other, stop_revision)

|
|> 1) _basic_push ends up calling into update_revisions(). update_revisions has no
|> concept of local versus remote, though. So it just does "graph =
|> self.repository.get_graph()". When what you really want is "graph =
|> other.repo.get_graph(self.repo)" for push, and the converse for pull. I was
|> thinking to factor out the code a bit and pass in a 'graph' object.
|
| Yeah, that's true.  It would be good to fix this for the benefit of non-HPSS
| pushes.
|
| [....]
|> 3) I have to look closer to see if Graph.heads() is really the culprit. If it
|> is, then there is probably a bug in heads(). For the branches I was using it
|> should terminate rather quickly. However, it still should be using a Graph
|> object that can know which repository can give it the revisions *faster*. (See
|> point (1) about push versus pull).
|
| Incidentally, that branch implements a Repository.graph_heads(*revision_ids)
| RPC, although I don't think the current revision uses it.
|
|> 4) I'm trying to figure out why we have this clause:
|> ~        try:
|> ~            target.update_revisions(self, stop_revision)
|> ~        except errors.DivergedBranches:
|> ~            if not overwrite:
|> ~                raise
|> ~        if overwrite:
|> ~            target.set_revision_history(self.revision_history())
|>
|> ~  a) Why is the last step "set_revision_history()" rather than
|> ~     "set_last_revision_info()"
|
| I think that must be hysterical raisins, but I don't know the history of this
| method.
|
|> ~  b) Further, if you look at update_revisions we have:
| [...]
|> ~     So I *think* what is happening is that if you supply --overwrite it might
|> ~     be generating the revision history 2 times.
|
| It's a pretty unclear function, really.  Take a look at the implementation on
| RemoteBranch in my branch, it maybe makes more sense.  I'd certainly appreciate
| a second opinion on whether it *does* make sense :)
|
| -Andrew.

I'll try to take a look at your branch. BTW, what does 'gpm' stand for?

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkg1g2cACgkQJdeBCYSNAAN1UACfQZGk1FB7BY6w+nLdDtCEFAMf
glkAnROXeQiYI3U7ViNz7sn8UQA1ROz1
=VUvj
-----END PGP SIGNATURE-----



More information about the bazaar mailing list