[MERGE] Branch hooks implementation with a set_rh hook.

Mon Jan 29 20:23:46 GMT 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Mon, 2007-01-29 at 11:21 -0600, John Arbash Meinel wrote:
>> Robert Collins wrote:
>>> This patch adds Branch.hooks, a dictionary of hooks, containing an
>>> initial set_rh hook, which is triggered when the revision history is
>>> altered. This allows plugins to trigger on commit, push, pull, and
>>> uncommit robustly.
>>>
>>> -Rob
>>>
>>>
>> I like the functionality, but it seems a little incomplete.
> 
> Well thats the point of review ;).
> 
>> Shouldn't it pass the old revision history along with the new one?
>> Otherwise we don't really know what, if anything, has changed.
> 
> The problem is that set_rh does not currently obtain the previous
> revision history, as that would be [potentially] expensive, and isn't
> really a good idea. Doing a 'delta' of what happened makes sense for
> push and pull, because they have to check whats there to perform
> operations on it - and thats why I'd like to have the separate push and
> pull methods with nicer semantics.

Actually it is potentially expensive, but any time that you call
'set_revision_history()' you really better have already read it. At the
very least you had to do "Branch.last_revision()" to figure out what
revision the branch is at before you do push,pull,uncommit,commit,etc.

Now, it is possible that you had a Branch, locked it, did some work.
*unlocked* it, and then locked it again to do the final
'set_revision_history()'. Which means that the previous value would have
been forgotten. But I think that is broken anyway, because you shouldn't
unlock inbetween reading and writing, because someone else might write
while you weren't looking.

> 
>> Should it be dealing in "revision-history" rather than "last-revision"?
>> I thought we were trying to get rid of "revision-history", since it has
>> really bad performance and disk space issues. (The mozilla conversion
>> has 170k revisions in the mainline, and .bzr/branch/revision-history is
>> around 9.3MB)
> 
> At the moment, you can set the entire revision history. I think that
> while we support that, hooking it is a sane idea. I *also* think that we
> should be moving away from this and offer hooks for the better apis as
> they are added. I've tried to make the api easier to extend than the
> commit hook one is/was.

You definitely have succeeded at a better api than Commit has. It just
feels like we are recommending people use the wrong api. They really
should be use Branch.last_revision() when possible, and I can see the
hook being much more useful if it can give you delta information.

For example, with the Atom feed plugin, are you just re-serializing the
complete history each time? It seems like it might be nicer if you only
had to serialize new information.

...

>> ^- Why is DefaultHooks capitalized? Was it originally a class?
> 
> Its a static method, and I like having them visually distinct from non
> static methods.

As near as I can tell, there is no precedence for this. And our style
guidelines have only class names as CamelCase. Certainly all of our
other static members don't have this visual distinction. Look at all of
the static members in BzrDir.

...

>>
>>   bzrlib.branch.Branch.hooks.add(hook_name, func)
>>
>> I probably prefer this last one the most. It gives us a better api than
>> getting a KeyError if the hook name isn't supported, and it breaks the
>> functionality out of Branch and into a Hooks() class.
> 
> We can do that, but it seemed like unnecessary overhead - I dont imagine
> wanting to do policy actions on when hooks are set. The argument about
> having better errors when an unknown hook is used by a plugin is the
> most compelling for me. I can add this easily enough :).

Having a public api be appending data to a list in a dictionary seems
unclean to me. Both because the errors you get are unclear, and it seems
very much like just exposing the internals, which could change. And
having to do backwards compatibility for appending to a list in a
dictionary isn't very fun.

> 
>> Also, the loop for hooks seems like it should be:
>>
>>   for func in Branch.hooks.get('set_rh', []):
>>     func(self, ...)
>>
>> Just in case something or someone sets the hooks dictionary to something
>> incomplete.
> 
> I dont think so. Better to error here: any code setting the dictionary
> to anything other than DefaultHooks() is buggy by definition. (Thats why
> I created a function to get a default set of hooks).

I think it is reasonable to allow people to remove a hook they added. In
a way, hooks are very much just callbacks. And I can see wanting
different callbacks at different times. I'm not sure that a 'hooks' api
is the appropriate way to do callbacks in this manner, but we don't have
another way.

I see a possibility to allow hooks as a way to do a bit of MVC. Or more,
I see doing something like MVC as a way of implementing hooks.

> 
>> It also seems like you really should do:
>>
>>   for func in self.hooks.get('set_rh', []):
>>     func(self, ...)
>>
>> This lets:
>>
>> 1) Subclasses of Branch define a different set of default hooks.
>>
>> 2) Instances of Branch have a different set of hooks.
> 
> I consider both of these misfeatures, but if needed they can be achieved
> in other ways that wont allow for as much skew between live objects:
> for 1), if its a subclass, it can just change the functions involved, or
> add a global hook that checks the class type.
> for 2), hooks are meant to be enabled/configured using the branch
> configuration facilities: all available hooks should be used, and those
> which are not wanted should be configured to just return.

So wouldn't it make sense to pass along the configuration object? Or
alternatively have a Branch cache its configuration object, rather than
have to reparse everything over and over again for each hook?

...

>> Anyway, the #1 thing is that we need to pass the old state along with
>> the new state. Because otherwise something like bzr-email can't really
>> use this. Unless it assumes every time the revision history changes it
>> is because of committing the last entry. (Which is wrong in the case of
>> uncommit, and is incomplete for the case of push/pull).
> 
> I think we should not do this, not for this specific hook - because of
> the performance implications of having to download 9.3 Mb of history [in
> the mozilla case] from a remote branch.
> 
> Rob

As stated earlier, I'm 90% positive that we've already downloaded those
9MB. So we may as well use it since there is no other way to get it.
Unless you call this hook *before* the new revision history is set, in
which case callers that want it can just call branch.revision_history().

If I understood correctly, at the time you are calling the hooks, the
information has been lost. I think it is valuable for many types of
plugins, so it would be good to have a way to get it.

I can understand you idea of having more case-specific callbacks. But
something that just wants to report when a branch changes would now have
to implement many callbacks (push, pull, commit, update [I think it uses
pull], uncommit, etc).

And if we add new high-level functionality, you need to make sure to
update your plugin. Rather than hooking in at the layer where stuff is
actually happening. (I can see a benefit to this, since you know *why*
things are happening, but there is a downside in that you can be easily
bypassed because of a change to the high-level api).

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFvlfRJdeBCYSNAAMRApenAJ9o3Lkexu/QwqaADNmAecFyIKHVjwCfa49H
sOXOX5uST1GWWRiLU+ppAUA=
=cVlN
-----END PGP SIGNATURE-----