[MERGE] Add Branch.set_last_revision_ex HPSS verb to accelerate push

Wed Jun 25 23:43:19 BST 2008

John Arbash Meinel wrote:
[...]
>
> AIUI, this is separating out the effects of "--overwrite" into 2
> separate flags. Which is fine.
>
> My concern is whether this is "most" of the graph walking or "all" of
> the graph walking. If you haven't gotten to "all" then we may not be a
> lot better off. Simply because we had to load those revisions locally
> already.
>
> Just something to think about.

With this change, all of the revision graph walking seems to be taken care of.

There is still some file text graph walking done by Packer._check_references,
but I've already posted about that :)

> +                if self._ensure_not_diverged(
> +                        stop_revision, last_rev, graph, other):
> +                    # stop_revision is a descendant of last_rev, but we
> aren't
> +                    # overwriting, so we're done.
> ~                     return
>
> ^- The return value from "_ensure_not_diverged" is odd.
> It seems that you return:
>
> ~  True if last_rev is a descendant
> ~  False if last_rev is in the ancestry
> ~  raise an Exception if there is divergence
>
> Which is sort of okay, but I wouldn't expect it from
> _ensure_not_diverged. Maybe just a better name?
> _check_if_descendant_or_diverged ?

That's a much better name.  Changed, thanks.

> +    def _clear_cached_state(self):
> +        super(RemoteBranch, self)._clear_cached_state()
> +        self._last_revision_info_cache = None
> +        if self._real_branch is not None:
> +            self._real_branch._clear_cached_state()
> +
>
> ^- Did you find that _last_revision_info was being called enough that
> the cache here is needed? If so, would it make more sense to move this
> up a level into the base Branch object?
>
> I know it is in Branch6 but not in Branch5, presumably because Branch5
> caches the whole revision_history. Though it doesn't seem like it would
> hurt to cache last_revision_info and removes code duplication.

Moving it into the base Branch object sounds like a good idea.  I'll do that.

> The base Branch.last_revision_info() doesn't have a @needs_read_lock,
> and it would be nice to have fewer of these, as it starts showing where
> clients are not locking properly.
>
> (As an example I just hit, "bzr diff -c 0.5.5" computes the
> revision_id_to_revno_map 2 times because the branch isn't locked.)
> Anyway, just a wish, I realize there are probably tons of tests that use
> the function without locking.

Yeah.  I think we actually want two different decorators: requires_read_lock and
takes_read_lock (and ditto for write locks).  “needs_read_lock” confusingly
sounds like the former, but actually acts like the latter.  And as you say, we
probably don't want to use “takes_read_lock” much anyway, clients ought to be
locking things themselves most of the time.

Something to tackle in a separate branch methinks...

In the meantime I've pushed last_revision_info into the Branch base class,
including the @needs_read_lock decorator.  This has the nice property that
RemoteBranch can now assume that all Branches have a _last_revision_info_cache
attribute, and so can update that cache in a _real_branch directly.  This might
avoid one or two roundtrips.

> ~     @needs_write_lock
> +    def _set_last_revision_descendant(self, revision_id, other_branch,
> +            allow_diverged=False, do_not_overwrite_descendant=True):
> +        path = self.bzrdir._path_for_remote_call(self._client)
> +        try:
>
> ^- I feel like private functions should never need @needs_write_lock.
> One reason is that we have actually had some problems with overhead in
> lock tracking (just incrementing and decrementing the counters 100,000
> times in an inner loop is overhead we can get rid of.)
> This isn't an inner loop style function, but it starts hiding when the
> caller actually needed a write lock.

Fair point.  I've moved the decorator onto generate_revision_history instead.

> Is "do_not_overwrite_descendant" better than inverting and using
> "overwrite_descendant" ? I know I try to avoid negatives in variable names.

Well, if I don't negate here I have to negate it in the implementation of
SmartServerBranchRequestSetLastRevisionEx, so from that perspective it's much of
a muchness.

That said, I'm happy to change it.  I'll call it “allow_overwrite_descendant”
though, which I think is slightly more consistent with the other flag (which is
also “allow_*”).

> Doesn't this function need a fallback when the server doesn't support
> the verb? I guess you are doing that at a higher level.
> (generate_revision_history).

Right, it's the caller's responsibility to fallback.  Both update_revisions and
generate_revision_history do that.  This method follows the same pattern as
others in this file, it just does the RPC call and returns the result after some
basic sanity checking.

> @@ -1510,6 +1546,7 @@
> ~     @needs_write_lock
> ~     def pull(self, source, overwrite=False, stop_revision=None,
> ~              **kwargs):
> +        self._clear_cached_state()
> ~         self._ensure_real()
> ~         return self._real_branch.pull(
> ~             source, overwrite=overwrite, stop_revision=stop_revision,
> @@ -1535,20 +1572,32 @@
> ~         except errors.UnknownSmartMethod:
> ~             self._ensure_real()
> ~             self._clear_cached_state()
> - -            return self._real_branch.set_last_revision_info(revno,
> revision_id)
> +            self._real_branch.set_last_revision_info(revno, revision_id)
> +            self._last_revision_info_cache = revno, revision_id
> +            return
> ~         except errors.ErrorFromSmartServer, err:
>
> ^- Doesn't this change the return value of the function? Is this being
> tested properly? I guess it always returns None in the other versions.

Right.  The methods that fallback to _real_branch/repo/bzrdir always just do
“return self._real_thing.foo(...)”, not because all methods return a value, just
because it was simple and reliable that way.  As we replace these methods with
smarter implementations it's fine to actually explicitly not return a value if
that's what the API is.

> I'm suprised you need to clear the cached state *before* you do the
> pull. But i guess if you are thunking over to a _real_branch function.

Hmm.  The thinking is that if _real_branch's state changes, then I need to make
sure the RemoteBranch's state is always invalidated.  I suppose I could just
invalidate the RemoteBranch's cache without touching the _real_branch, which
presumably already DTRT.  

> In my version of bzr, it doesn't look like "generate_revision_history"
> is being called anymore. 'update_revisions' now goes directly to
> "self.set_last_revision_info()". So I would hesitate to spend a lot of
> time improving it.
>
> In fact, I would consider just deprecating the function entirely, since
> it just duplicates "set_last_revision_info" logic.

Yeah, that now seems to be the way to go.  When I started this branch this
wasn't the case, and it's improved now.  I'll put “deprecate
generate_revision_history” on my todo list though.

> +                stop_revision = other.last_revision()
> +                if revision.is_null(stop_revision):
> +                    # if there are no commits, we're done.
> +                    return
> +            self.fetch(other, stop_revision)
>
> It seems like this is a small bug. Specifically "bzr push --overwrite"
> with a branch at the null revision will fail to force the other branch
> to the null revision.
>
> I'm not strictly concerned, as I don't think people ever *want* to do
> that, and it might be a decent security check.

Hmm.  This logic comes straight from Branch.update_revisions.

And sure enough:

/tmp$ bzr init no-history
/tmp$ bzr init a-branch
/tmp$ cd a-branch/
/tmp/a-branch$ bzr ci -m "Revision 1." --unchanged
Committing to: /tmp/a-branch/
Committed revision 1.
/tmp/a-branch$ cd ../no-history/
/tmp/no-history$ bzr push --overwrite ../a-branch/
All changes applied successfully.
bzr: ERROR: Reserved revision-id {null:}

So what I have in RemoteBranch isn't a regression, at least.  I'll file a bug,
though.

> +  try:
> +      self._set_last_revision_descendant(stop_revision, other)
> +      return
> +  except errors.UnknownSmartMethod:
> +      medium._remember_remote_is_before((1, 6))
>
> ^- I always wonder if we prefer:
>
> try:
> ~  something that might raise
> ~  more stuff
> except:
> ~  exception handling
>
> to
>
> try:
> ~  something that might raise
> except:
> ~  exception handling
> else:
> ~  more stuff
>
>
> I suppose in this case "more stuff" is just "return".

Well, if there's any ambiguity about what the except can or should catch, I
always prefer having an else block.  There's no ambiguity here.

[...]
> +
> +
> +    def test_revision_id_previous(self):
> +        backing = self.get_transport()
>
> ^- you have 2 blank lines here, should be 1
>
> +                request.execute(
> +                    '', branch_token, repo_token, revision_id, 0, 0))
> +        finally:
> +            b.unlock()
> +    def test_revision_id_present2(self):
> +        backing = self.get_transport()
>
> ^- and you need one here

Fixed.

> BB:tweak

Thanks!

-Andrew.