[MERGE] Branch hooks implementation with a set_rh hook.

Robert Collins robertc at robertcollins.net
Mon Jan 29 21:23:04 GMT 2007


On Mon, 2007-01-29 at 14:23 -0600, John Arbash Meinel wrote: 

> >> Should it be dealing in "revision-history" rather than "last-revision"?
> >> I thought we were trying to get rid of "revision-history", since it has
> >> really bad performance and disk space issues. (The mozilla conversion
> >> has 170k revisions in the mainline, and .bzr/branch/revision-history is
> >> around 9.3MB)
> > 
> > At the moment, you can set the entire revision history. I think that
> > while we support that, hooking it is a sane idea. I *also* think that we
> > should be moving away from this and offer hooks for the better apis as
> > they are added. I've tried to make the api easier to extend than the
> > commit hook one is/was.
> 
> You definitely have succeeded at a better api than Commit has. It just
> feels like we are recommending people use the wrong api. They really
> should be use Branch.last_revision() when possible, and I can see the
> hook being much more useful if it can give you delta information.

Thats true. Using last_revision() though does not need to read the
entire file - in fact jelmer and I have been looking at fixing it to not
read the entire file. I'm also wondering about generating smaller
revision-history files with deliberately truncated contents.

> For example, with the Atom feed plugin, are you just re-serializing the
> complete history each time? It seems like it might be nicer if you only
> had to serialize new information.

Its much simpler to just read the last 20 revisions from history and
make a new atom file, than to:
read the old atom file
parse it to get revision ids out
do a set difference on the last 20 revisions
read the ones we dont have
generate a new file.

Atom requires use to rewrite the header always to update the timestamp
for the feed, so theres no shortcut to avoid writing a complete file
each time anyhow.

> ...
> 
> >> ^- Why is DefaultHooks capitalized? Was it originally a class?
> > 
> > Its a static method, and I like having them visually distinct from non
> > static methods.
> 
> As near as I can tell, there is no precedence for this. And our style
> guidelines have only class names as CamelCase. Certainly all of our
> other static members don't have this visual distinction. Look at all of
> the static members in BzrDir.

Thats right. I always forget when I come back to bzr from an absence.
Its one of those things that grates on me ;).

Fixing.

> >>
> >>   bzrlib.branch.Branch.hooks.add(hook_name, func)
> >>
> >> I probably prefer this last one the most. It gives us a better api than
> >> getting a KeyError if the hook name isn't supported, and it breaks the
> >> functionality out of Branch and into a Hooks() class.
> > 
> > We can do that, but it seemed like unnecessary overhead - I dont imagine
> > wanting to do policy actions on when hooks are set. The argument about
> > having better errors when an unknown hook is used by a plugin is the
> > most compelling for me. I can add this easily enough :).
> 
> Having a public api be appending data to a list in a dictionary seems
> unclean to me. Both because the errors you get are unclear, and it seems
> very much like just exposing the internals, which could change. And
> having to do backwards compatibility for appending to a list in a
> dictionary isn't very fun.

Well, apis can change in many ways,but sure - I'll add a setter method.

> > 
> >> Also, the loop for hooks seems like it should be:
> >>
> >>   for func in Branch.hooks.get('set_rh', []):
> >>     func(self, ...)
> >>
> >> Just in case something or someone sets the hooks dictionary to something
> >> incomplete.
> > 
> > I dont think so. Better to error here: any code setting the dictionary
> > to anything other than DefaultHooks() is buggy by definition. (Thats why
> > I created a function to get a default set of hooks).
> 
> I think it is reasonable to allow people to remove a hook they added. In
> a way, hooks are very much just callbacks. And I can see wanting
> different callbacks at different times. I'm not sure that a 'hooks' api
> is the appropriate way to do callbacks in this manner, but we don't have
> another way.
> 
> I see a possibility to allow hooks as a way to do a bit of MVC. Or more,
> I see doing something like MVC as a way of implementing hooks.

I dont buy these two paragraphs. I dont think its reasonable to want to
remove hooks, its analagous to changing what methods you have subclassed
at runtime. Sure python lets you do it, but its not clear to developers
whats going on, it will be annoying to debug. Its a bad idea. hooks are
a bit like an event system - sure - in that you are essentially
subscribed to messages of a particular type. I'd say that if someone
wants to make an event system, they can layer that on top: do a hook
that maintains a list of active subscribers and sends them messages, and
allows subscription/unsubscription, filtering - the whole enchilada. 

I think thats way to complex for the core though, which should be
simple, easy to understand and debug, and easy to interact with and work
with. To that end we should avoid adding hair, or letting it be unclear
what the API is about.

Specifically, hooks are *not* regular callbacks because they are not
specific to a branch, nor is there the regular ability of callbacks to
have associated data to be handed back[other than pythons flexibility in
allowing callables to be registered].

Now we could allow per-branch-instance callbacks, but I'd do them in a
different list.
Branch.hooks and
Branch.callbacks.

We could have these have the same structure, so you do
for hook in Branch.hooks:
   hook(self)
for callback in self.callbacks:
   callback(self)

This would be clear and unambigous, and free of the race conditions that
mixing both into the same name would engender.


> > 
> >> It also seems like you really should do:
> >>
> >>   for func in self.hooks.get('set_rh', []):
> >>     func(self, ...)
> >>
> >> This lets:
> >>
> >> 1) Subclasses of Branch define a different set of default hooks.
> >>
> >> 2) Instances of Branch have a different set of hooks.
> > 
> > I consider both of these misfeatures, but if needed they can be achieved
> > in other ways that wont allow for as much skew between live objects:
> > for 1), if its a subclass, it can just change the functions involved, or
> > add a global hook that checks the class type.
> > for 2), hooks are meant to be enabled/configured using the branch
> > configuration facilities: all available hooks should be used, and those
> > which are not wanted should be configured to just return.
> 
> So wouldn't it make sense to pass along the configuration object? Or
> alternatively have a Branch cache its configuration object, rather than
> have to reparse everything over and over again for each hook?

I thought we had a mechanism to do that already within a lock context...
If we dont then lets make sure we do. We dont want set_revision_history
to have to fetch a config file thats already been read anyway.

> >> Anyway, the #1 thing is that we need to pass the old state along with
> >> the new state. Because otherwise something like bzr-email can't really
> >> use this. Unless it assumes every time the revision history changes it
> >> is because of committing the last entry. (Which is wrong in the case of
> >> uncommit, and is incomplete for the case of push/pull).
> > 
> > I think we should not do this, not for this specific hook - because of
> > the performance implications of having to download 9.3 Mb of history [in
> > the mozilla case] from a remote branch.
> > 
> > Rob
> 
> As stated earlier, I'm 90% positive that we've already downloaded those
> 9MB. So we may as well use it since there is no other way to get it.
> Unless you call this hook *before* the new revision history is set, in
> which case callers that want it can just call branch.revision_history().
> 
> If I understood correctly, at the time you are calling the hooks, the
> information has been lost. I think it is valuable for many types of
> plugins, so it would be good to have a way to get it.
> 
> I can understand you idea of having more case-specific callbacks. But
> something that just wants to report when a branch changes would now have
> to implement many callbacks (push, pull, commit, update [I think it uses
> pull], uncommit, etc).

push, pull, commit, uncommit are all that would be needed.

> And if we add new high-level functionality, you need to make sure to
> update your plugin. Rather than hooking in at the layer where stuff is
> actually happening. (I can see a benefit to this, since you know *why*
> things are happening, but there is a downside in that you can be easily
> bypassed because of a change to the high-level api).

Yes, but OTOH once revision-history is removed for a last-revision file
in a new branch format, this current hook becomes moot too. We can check
for old hooks and issue deprecation warnings and the like. I dont think
telling the future is very reliable ;).

I'd *really* rather not overhaul the internals of revision-setting to
make this patch get in, but I will if needed. I dont want to change
these guts because I understand Aaron is working on a new format that
removes revision-history anyhow, and I dont think that this particular
low level call is a 'good' one for people to be hooking into anyway:
hooking into push/pull/commit/uncommit is much better, because those
operations are more consistent over branch and repository format
changes.

For clarity:

I'm in agreement and will do:
 * Remove the CamelCase on DefaultHooks -> default_hooks.
 * a setter function something like Branch.add_hook(name, callable)
which will error on non-existing hooks with a specific exception.

I think config-performance should be solved by having the config cached
in the lock context, as revision-history and other things are, and that
this is out of scope for this patch. Passing it in from the hook-invoker
would not stop it being a performance hit over doing it right and
caching when first accessed in a lock.

I'm not in agreement on changing set_revision_history to ask for the old
revision history and pass it to the hook. I think that that is the wrong
change for this api, and we should just do the other higher level apis
quickly - i.e. for 0.15.

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070130/7e008796/attachment.pgp 


More information about the bazaar mailing list