notes/plan for hpss performance work
John Arbash Meinel
john at arbash-meinel.com
Fri May 2 18:26:16 BST 2008
Aaron Bentley wrote:
| Martin Pool wrote:
|> On 1 May 2008, Aaron Bentley <aaron at aaronbentley.com> wrote:
|>> Martin Pool wrote:
|>>> * Perhaps surprisingly, graph operations are not showing up as
|>>> dominant, at least in the cases we did here: pushing just one
|>>> revision, and pushing all of history.
|>> In my experience, graph operations are a significant portion of creating
|>> a branch of a new project. For example, when creating a branch of
|>> Launchpad from scratch, Bazaar takes 20 minutes to determine that it
|>> should fetch everything.
|> Are you talking about the push or pull case? Coming into an existing
|> repository or not?
|
| I am talking about "pull". Or rather I am talking about "branch", from
| a remote machine to a local machine, creating a new standalone branch.
|
|> (Maybe to avoid roundtrips trying to reproduce your exact case you could
|> send to Andrew and I a -Dhpss trace of this operation on your real
|> situation?)
|
| I can do that.
|
| Aaron
Having gone through some of the code in that area, I have a suggestion for the
cause.
If the server's bzr version is older than 1.2, every "get_parent_map()" call
falls back to using "get_revision_graph()".
Looking at the code:
    ancestry = self._parents_map
    if ancestry is None:
        # Repository is not locked, so there's no cache.
        missing_revisions = set(keys)
        ancestry = {}
    ...
    if missing_revisions:
        parent_map = self._get_parent_map(missing_revisions)
        ...
        ancestry.update(parent_map)
I certainly can't guarantee anything here, but *if* the RemoteRepository does
not consider itself properly locked, then it won't try to cache any requests to
get_parent_map().
And *if* the upstream server is pre-1.2, the client requests the full revision
graph of the remote repository on every attempt. (Even if the server is newer,
I believe remote.get_parent_map() is designed to return extra data because it
assumes the client is going to want it later, and knows enough to buffer it now.)
Certainly if I had to make 15,000 calls to get_revision_graph() I would expect
it to take an inordinate amount of bandwidth and time.
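
To make the suspected cost concrete, here is a toy model of the pattern above.
It is only a sketch: ToyRemoteRepository, lock_read(), unlock(),
full_graph_requests and count_requests() are names made up for illustration,
not bzrlib's actual classes or API, and the "server" is just a dict. It assumes
the cache exists only while the repository is locked, and that every cache miss
costs one full-graph fetch, as with a pre-1.2 server.

```python
class ToyRemoteRepository:
    """Hypothetical stand-in for a remote repository (not bzrlib code)."""

    def __init__(self, graph):
        self._graph = graph            # full revision graph held by the "server"
        self._parents_map = None       # None means: not locked, so no cache
        self.full_graph_requests = 0   # counts simulated full-graph fetches

    def lock_read(self):
        # Assumption: taking a lock is what creates the cache.
        self._parents_map = {}

    def unlock(self):
        self._parents_map = None

    def _get_parent_map(self, keys):
        # Simulates the pre-1.2 fallback: fetch the whole graph, then filter.
        self.full_graph_requests += 1
        whole = dict(self._graph)
        return {k: whole[k] for k in keys if k in whole}

    def get_parent_map(self, keys):
        ancestry = self._parents_map
        if ancestry is None:
            # Not locked: nothing is remembered between calls.
            missing = set(keys)
            ancestry = {}
        else:
            missing = {k for k in keys if k not in ancestry}
        if missing:
            ancestry.update(self._get_parent_map(missing))
        return {k: ancestry[k] for k in keys if k in ancestry}


def count_requests(locked, n=100):
    """Ask for every revision's parents three times; return the fetch count."""
    graph = {"rev-%d" % i: (("rev-%d" % (i - 1),) if i else ())
             for i in range(n)}
    repo = ToyRemoteRepository(graph)
    if locked:
        repo.lock_read()
    for _ in range(3):
        for key in graph:
            repo.get_parent_map([key])
    return repo.full_graph_requests
```

In this model the unlocked repository fetches the full graph on every call,
while the locked one only fetches on the first request for each key. The real
code may well behave differently; the -Dhpss trace would show which case
applies.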
John
=:->