Thoughts on push performance
John Arbash Meinel
john at arbash-meinel.com
Thu May 22 15:29:59 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Andrew Bennetts wrote:
| John Arbash Meinel wrote:
|> -----BEGIN PGP SIGNED MESSAGE-----
|> Hash: SHA1
|>
|> I was analyzing push on diverged branches, and I think I have some ideas to look
|> into. (I was going to, but ran out of time today)
|
| I happens that I was looking at similar stuff yesterday and today. I've managed
| to get rid of all the get_parent_map RPC calls in at least some cases of push
| over HPSS (perhaps all, I haven't tested thoroughly yet).
|
| The branch is at <http://people.ubuntu.com/~andrew/bzr/smart-push-gpm>. It's
| not quite ready for review, but it is ready for testing if you're interested.
| It is beating 1.4 on pushing r3390 of bzr.dev onto a remote branch at r3389 by
| nearly 20% (~59s vs ~71s), and probably even better on larger, more complex
| graphs. (It's also twice as fast as 1.4 at raising DivergedBranches on that
| same push with slightly diverged histories.)
|
| For HPSS, we can do update_revisions considerably more simply I think. It
| essentially boils down to:
|
| self.fetch(other, stop_revision)
| if overwrite:
| self._set_last_revision(stop_revision)
| else:
| self._set_last_revision_descendant(stop_revision)
My big problem here is that these don't pass a revno if they have them. Because
it actually takes a long time to iterate back to the NULL revision to figure out
~ what to put into .bzr/branch/last-revision (which is actually
last-revision-info, historical reasons.)
|
| Where _set_last_revision is an existing RPC call that unconditionally updates
| the last revision, and _set_last_revision_descendant is a new RPC call
| (implemented in that branch) that updates the last revision if the new revision
| is a descendant of the current revision, and otherwise raises an error.
|
How do these interact with 'append_revisions_only' ? I'm guessing it is using
the same mechanisms as what we already have. But it is something to consider.
| So there's no graph operations client-side here at all after the fetch. After
| the fetch the desired revisions are all on the server, so we can just ask the
| server to figure it out.
|
| This still isn't optimal. self.fetch still takes way too long to notice that a
| remote branch has diverged from the local branch.
Are you suggesting that we just don't copy revisions until we've determine
divergence? We certainly *can* do this, though we have a specific comment:
~ # we fetch here so that we don't process data twice in the common
~ # case of having something to pull, and so that the check for
~ # already merged can operate on the just fetched graph, which will
~ # be cached in memory.
~ self.fetch(other, stop_revision)
|
|> 1) _basic_push ends up calling into update_revisions(). update_revisions has no
|> concept of local versus remote, though. So it just does "graph =
|> self.repository.get_graph()". When what you really want is "graph =
|> other.repo.get_graph(self.repo)" for push, and the converse for pull. I was
|> thinking to factor out the code a bit and pass in a 'graph' object.
|
| Yeah, that's true. It would be good to fix this for the benefit of non-HPSS
| pushes.
|
| [....]
|> 3) I have to look closer to see if Graph.heads() is really the culprit. If it
|> is, then there is probably a bug in heads(). For the branches I was using it
|> should terminate rather quickly. However, it still should be using a Graph
|> object that can know which repository can give it the revisions *faster*. (See
|> point (1) about push versus pull).
|
| Incidentally, that branch implements a Repository.graph_heads(*revision_ids)
| RPC, although I don't think the current revision uses it.
|
|> 4) I'm trying to figure out why we have this clause:
|> ~ try:
|> ~ target.update_revisions(self, stop_revision)
|> ~ except errors.DivergedBranches:
|> ~ if not overwrite:
|> ~ raise
|> ~ if overwrite:
|> ~ target.set_revision_history(self.revision_history())
|>
|> ~ a) Why is the last step "set_revision_history()" rather than
|> ~ "set_last_revision_info()"
|
| I think that must be hysterical raisins, but I don't know the history of this
| method.
|
|> ~ b) Further, if you look at update_revisions we have:
| [...]
|> ~ So I *think* what is happening is that if you supply --overwrite it might
|> ~ be generating the revision history 2 times.
|
| It's a pretty unclear function, really. Take a look at the implementation on
| RemoteBranch in my branch, it maybe makes more sense. I'd certainly appreciate
| a second opinion on whether it *does* make sense :)
|
| -Andrew.
I'll try to take a look at your branch. BTW, what does 'gpm' stand for?
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkg1g2cACgkQJdeBCYSNAAN1UACfQZGk1FB7BY6w+nLdDtCEFAMf
glkAnROXeQiYI3U7ViNz7sn8UQA1ROz1
=VUvj
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list