[MERGE] Branch hooks implementation with a set_rh hook.

Robert Collins robertc at robertcollins.net
Mon Jan 29 17:48:18 GMT 2007


On Mon, 2007-01-29 at 11:21 -0600, John Arbash Meinel wrote:
> Robert Collins wrote:
> > This patch adds Branch.hooks, a dictionary of hooks, containing an
> > initial set_rh hook, which is triggered when the revision history is
> > altered. This allows plugins to trigger on commit, push, pull, and
> > uncommit robustly.
> > 
> > -Rob
> > 
> > 
> 
> I like the functionality, but it seems a little incomplete.

Well thats the point of review ;).

> Shouldn't it pass the old revision history along with the new one?
> Otherwise we don't really know what, if anything, has changed.

The problem is that set_rh does not currently obtain the previous
revision history, as that would be [potentially] expensive, and isn't
really a good idea. Doing a 'delta' of what happened makes sense for
push and pull, because they have to check whats there to perform
operations on it - and thats why I'd like to have the separate push and
pull methods with nicer semantics.

> Should it be dealing in "revision-history" rather than "last-revision"?
> I thought we were trying to get rid of "revision-history", since it has
> really bad performance and disk space issues. (The mozilla conversion
> has 170k revisions in the mainline, and .bzr/branch/revision-history is
> around 9.3MB)

At the moment, you can set the entire revision history. I think that
while we support that, hooking it is a sane idea. I *also* think that we
should be moving away from this and offer hooks for the better apis as
they are added. I've tried to make the api easier to extend than the
commit hook one is/was.

> It might actually make even more sense to have it pass in the ancestry
> graph, rather than a single 'last-revision' marker. But (old_revision,
> new_revision) would let you ask the branch for the rest, and be stable
> even if we get rid of an explicit revision-history cache.
> 
> 
> ...
> 
> >      @staticmethod
> > +    def DefaultHooks():
> > +        """Return a dict of the default branch hook settings."""
> > +        return {
> > +            'set_rh':[], # invoked whenever the revision history has been set
> > +                         # with set_revision_history. The api signature is
> > +                         # (branch, revision_history), and the branch will
> > +                         # be write-locked.
> > +            }
> > +
> 
> ^- Why is DefaultHooks capitalized? Was it originally a class?

Its a static method, and I like having them visually distinct from non
static methods.

> Would it make sense that rather than expecting plugins to use
>   bzrlib.branch.Branch.hooks[hook_name].append()
> or even
>   bzrlib.branch.Branch.hooks.setdefault(hook_name,[]).append()
> 
> We would have a real api:
> 
>   bzrlib.branch.Branch.add_hook(hook_name, func)
> 
> Or maybe:
> 
>   bzrlib.branch.Branch.hooks.add(hook_name, func)
> 
> I probably prefer this last one the most. It gives us a better api than
> getting a KeyError if the hook name isn't supported, and it breaks the
> functionality out of Branch and into a Hooks() class.

We can do that, but it seemed like unnecessary overhead - I dont imagine
wanting to do policy actions on when hooks are set. The argument about
having better errors when an unknown hook is used by a plugin is the
most compelling for me. I can add this easily enough :).

> Also, the loop for hooks seems like it should be:
> 
>   for func in Branch.hooks.get('set_rh', []):
>     func(self, ...)
> 
> Just in case something or someone sets the hooks dictionary to something
> incomplete.

I dont think so. Better to error here: any code setting the dictionary
to anything other than DefaultHooks() is buggy by definition. (Thats why
I created a function to get a default set of hooks).

> It also seems like you really should do:
> 
>   for func in self.hooks.get('set_rh', []):
>     func(self, ...)
>
> This lets:
> 
> 1) Subclasses of Branch define a different set of default hooks.
> 
> 2) Instances of Branch have a different set of hooks.

I consider both of these misfeatures, but if needed they can be achieved
in other ways that wont allow for as much skew between live objects:
for 1), if its a subclass, it can just change the functions involved, or
add a global hook that checks the class type.
for 2), hooks are meant to be enabled/configured using the branch
configuration facilities: all available hooks should be used, and those
which are not wanted should be configured to just return.

> I realize you can handle the filtering when the function gets called.
> Something like:
> 
>  def callback(branch, history):
>    if not regex.match(branch.base):
>      return # Do nothing, this branch doesn't concern us

Right.

> Anyway, the #1 thing is that we need to pass the old state along with
> the new state. Because otherwise something like bzr-email can't really
> use this. Unless it assumes every time the revision history changes it
> is because of committing the last entry. (Which is wrong in the case of
> uncommit, and is incomplete for the case of push/pull).

I think we should not do this, not for this specific hook - because of
the performance implications of having to download 9.3 Mb of history [in
the mozilla case] from a remote branch.

Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070130/30f9b37a/attachment.pgp 


More information about the bazaar mailing list