Thoughts on push performance
Andrew Bennetts
andrew at canonical.com
Thu May 22 09:23:06 BST 2008
John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I was analyzing push on diverged branches, and I think I have some ideas to look
> into. (I was going to, but ran out of time today)
I happens that I was looking at similar stuff yesterday and today. I've managed
to get rid of all the get_parent_map RPC calls in at least some cases of push
over HPSS (perhaps all, I haven't tested thoroughly yet).
The branch is at <http://people.ubuntu.com/~andrew/bzr/smart-push-gpm>. It's
not quite ready for review, but it is ready for testing if you're interested.
It is beating 1.4 on pushing r3390 of bzr.dev onto a remote branch at r3389 by
nearly 20% (~59s vs ~71s), and probably even better on larger, more complex
graphs. (It's also twice as fast as 1.4 at raising DivergedBranches on that
same push with slightly diverged histories.)
For HPSS, we can do update_revisions considerably more simply I think. It
essentially boils down to:
self.fetch(other, stop_revision)
if overwrite:
self._set_last_revision(stop_revision)
else:
self._set_last_revision_descendant(stop_revision)
Where _set_last_revision is an existing RPC call that unconditionally updates
the last revision, and _set_last_revision_descendant is a new RPC call
(implemented in that branch) that updates the last revision if the new revision
is a descendant of the current revision, and otherwise raises an error.
So there's no graph operations client-side here at all after the fetch. After
the fetch the desired revisions are all on the server, so we can just ask the
server to figure it out.
This still isn't optimal. self.fetch still takes way too long to notice that a
remote branch has diverged from the local branch.
> 1) _basic_push ends up calling into update_revisions(). update_revisions has no
> concept of local versus remote, though. So it just does "graph =
> self.repository.get_graph()". When what you really want is "graph =
> other.repo.get_graph(self.repo)" for push, and the converse for pull. I was
> thinking to factor out the code a bit and pass in a 'graph' object.
Yeah, that's true. It would be good to fix this for the benefit of non-HPSS
pushes.
[....]
> 3) I have to look closer to see if Graph.heads() is really the culprit. If it
> is, then there is probably a bug in heads(). For the branches I was using it
> should terminate rather quickly. However, it still should be using a Graph
> object that can know which repository can give it the revisions *faster*. (See
> point (1) about push versus pull).
Incidentally, that branch implements a Repository.graph_heads(*revision_ids)
RPC, although I don't think the current revision uses it.
> 4) I'm trying to figure out why we have this clause:
> ~ try:
> ~ target.update_revisions(self, stop_revision)
> ~ except errors.DivergedBranches:
> ~ if not overwrite:
> ~ raise
> ~ if overwrite:
> ~ target.set_revision_history(self.revision_history())
>
> ~ a) Why is the last step "set_revision_history()" rather than
> ~ "set_last_revision_info()"
I think that must be hysterical raisins, but I don't know the history of this
method.
> ~ b) Further, if you look at update_revisions we have:
[...]
> ~ So I *think* what is happening is that if you supply --overwrite it might
> ~ be generating the revision history 2 times.
It's a pretty unclear function, really. Take a look at the implementation on
RemoteBranch in my branch, it maybe makes more sense. I'd certainly appreciate
a second opinion on whether it *does* make sense :)
-Andrew.
More information about the bazaar
mailing list