[MERGE] Add Branch.set_last_revision_ex HPSS verb to accelerate push

Fri Jun 20 20:57:34 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
| This patch adds a new verb to the smart server for updating the last
revision of
| a branch.  It has the rather uninspired name of
“Branch.set_last_revision_ex”.
| Suggestions for a better name are welcome ;)
|
| This verb has a couple of flags that make it more useful than the existing
| set_last_revision_info verb.  It allows the client to ask the server
to check if
| the new tip has diverged from the old tip, and also to tell the server
whether
| or not to update the tip if the new tip is actually an ancestor rather
than a
| descendant.  This makes Branch.update_revisions much simpler for
RemoteBranch,
| and in particular allows the client to completely delegate a lot of graph
| walking to the server.

AIUI, this is separating out the effects of "--overwrite" into 2
separate flags. Which is fine.

My concern is whether this is "most" of the graph walking or "all" of
the graph walking. If you haven't gotten to "all" then we may not be a
lot better off. Simply because we had to load those revisions locally
already.

Just something to think about.

|
| The new verb also returns the resulting revno, revision_id pair so the
client
| doesn't have to calculate the revno itself (after all, the remote side
must have
| already figured out the revno in order to do
branch.set_last_revision_info()).
|
| The result is a modest improvement in push performance.
|
| The tests for the server-side are a bit messy and incomplete, and have an
| annoying amount of duplication with some existing verbs.  I'm working
on that :)
|
| This patch is based off the “Add Remote-v2 variant ...” patch I submitted
| yesterday.  It doesn't strictly depend on it, although the test
coverage is
| better with it.
|
| Reviews, comments, and benchmarking welcome :)
|
| -Andrew.
|
|

+                if self._ensure_not_diverged(
+                        stop_revision, last_rev, graph, other):
+                    # stop_revision is a descendant of last_rev, but we
aren't
+                    # overwriting, so we're done.
~                     return

^- The return value from "_ensure_not_diverged" is odd.
It seems that you return:

~  True if last_rev is a descendant
~  False if last_rev is in the ancestry
~  raise an Exception if there is divergence

Which is sort of okay, but I wouldn't expect it from
_ensure_not_diverged. Maybe just a better name?
_check_if_descendant_or_diverged ?


+    def _clear_cached_state(self):
+        super(RemoteBranch, self)._clear_cached_state()
+        self._last_revision_info_cache = None
+        if self._real_branch is not None:
+            self._real_branch._clear_cached_state()
+

^- Did you find that _last_revision_info was being called enough that
the cache here is needed? If so, would it make more sense to move this
up a level into the base Branch object?

I know it is in Branch6 but not in Branch5, presumably because Branch5
caches the whole revision_history. Though it doesn't seem like it would
hurt to cache last_revision_info and removes code duplication.


The base Branch.last_revision_info() doesn't have a @needs_read_lock,
and it would be nice to have fewer of these, as it starts showing where
clients are not locking properly.

(As an example I just hit, "bzr diff -c 0.5.5" computes the
revision_id_to_revno_map 2 times because the branch isn't locked.)
Anyway, just a wish, I realize there are probably tons of tests that use
the function without locking.

~     @needs_write_lock
+    def _set_last_revision_descendant(self, revision_id, other_branch,
+            allow_diverged=False, do_not_overwrite_descendant=True):
+        path = self.bzrdir._path_for_remote_call(self._client)
+        try:

^- I feel like private functions should never need @needs_write_lock.
One reason is that we have actually had some problems with overhead in
lock tracking (just incrementing and decrementing the counters 100,000
times in an inner loop is overhead we can get rid of.)
This isn't an inner loop style function, but it starts hiding when the
caller actually needed a write lock.


Is "do_not_overwrite_descendant" better than inverting and using
"overwrite_descendant" ? I know I try to avoid negatives in variable names.

Doesn't this function need a fallback when the server doesn't support
the verb? I guess you are doing that at a higher level.
(generate_revision_history).


@@ -1510,6 +1546,7 @@
~     @needs_write_lock
~     def pull(self, source, overwrite=False, stop_revision=None,
~              **kwargs):
+        self._clear_cached_state()
~         self._ensure_real()
~         return self._real_branch.pull(
~             source, overwrite=overwrite, stop_revision=stop_revision,
@@ -1535,20 +1572,32 @@
~         except errors.UnknownSmartMethod:
~             self._ensure_real()
~             self._clear_cached_state()
- -            return self._real_branch.set_last_revision_info(revno,
revision_id)
+            self._real_branch.set_last_revision_info(revno, revision_id)
+            self._last_revision_info_cache = revno, revision_id
+            return
~         except errors.ErrorFromSmartServer, err:

^- Doesn't this change the return value of the function? Is this being
tested properly? I guess it always returns None in the other versions.

I'm suprised you need to clear the cached state *before* you do the
pull. But i guess if you are thunking over to a _real_branch function.


In my version of bzr, it doesn't look like "generate_revision_history"
is being called anymore. 'update_revisions' now goes directly to
"self.set_last_revision_info()". So I would hesitate to spend a lot of
time improving it.

In fact, I would consider just deprecating the function entirely, since
it just duplicates "set_last_revision_info" logic.

+            if stop_revision is None:
+                stop_revision = other.last_revision()
+                if revision.is_null(stop_revision):
+                    # if there are no commits, we're done.
+                    return
+            self.fetch(other, stop_revision)

It seems like this is a small bug. Specifically "bzr push --overwrite"
with a branch at the null revision will fail to force the other branch
to the null revision.

I'm not strictly concerned, as I don't think people ever *want* to do
that, and it might be a decent security check.



+  try:
+      self._set_last_revision_descendant(stop_revision, other)
+      return
+  except errors.UnknownSmartMethod:
+      medium._remember_remote_is_before((1, 6))

^- I always wonder if we prefer:

try:
~  something that might raise
~  more stuff
except:
~  exception handling

to

try:
~  something that might raise
except:
~  exception handling
else:
~  more stuff


I suppose in this case "more stuff" is just "return".


The diff is really weird for test_smart.py. It is one of the few times
that I've seen PatienceDiff not sync up on the obvious parts like (def
test_revision_id_present2. I guess because of the amount of straight
copying between the two tests.

+        finally:
+            tree.branch.unlock()
+
+
+    def test_revision_id_previous(self):
+        backing = self.get_transport()

^- you have 2 blank lines here, should be 1

+                request.execute(
+                    '', branch_token, repo_token, revision_id, 0, 0))
+        finally:
+            b.unlock()
+    def test_revision_id_present2(self):
+        backing = self.get_transport()

^- and you need one here


BB:tweak

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkhcC64ACgkQJdeBCYSNAAOeeQCfaf2vI/xXhC2ivbM0qVzZ+t2R
mCoAn3jrYjEyN6VNaodBCqRdmnVtTXNR
=PVP2
-----END PGP SIGNATURE-----