start/stop hook guarantees

William Reade william.reade at
Tue Dec 6 15:50:28 UTC 2011

On Sat, 2011-12-03 at 15:01 -0200, Gustavo Niemeyer wrote:
> This all makes sense to me, William. Thanks for the write up and the heads up.

Sadly, it's started to make much less sense to me as I delve deeper into
the state restoration work. As I understand it, the intent is to allow
us to smoothly transition back into a "started" state, and that doesn't
sound like a bad goal in itself. However, consider the states the unit
workflow can be in:

* None

We don't want to explicitly run the start hook here; the service hasn't
even been installed, and the normal process of starting the unit agent
will lead us through "installed" to "started" regardless.

* installed

As above; normal startup will transition us to "started" anyway.

* install_error

The chances of "start" working correctly are minimal; and, if it doesn't
work, what should we do anyway? Switch to "start_error", and obscure the
real cause of the failure?

* started

I guess it can't hurt for a charm that doesn't use Upstart or otherwise
monitor itself.

* start_error

May as well retry, I suppose (but I'm not sure what justification we
have for believing the result to be any different, or why this case is
special enough to overrule our preference for requiring explicit user
action to resolve error states).

* configure_error

Whether it works or not, a transition to "started" or "start_error" is
going to be profoundly misleading.

* charm_upgrade_error

Definitely a Bad Thing; we'll be breaking the guarantee that the
upgrade-charm hook will be the *first* one called after the charm
upgrade operation.

* stopped

Based on IRC discussion today, "stopped" should mean "the unit has gone
away and is never coming back" [0], and so if by some freak occurrence
we *do* restart a machine, and the unit agent comes up "stopped", we
definitely don't want to start it again.

* stop_error

As above; we can't do anything meaningful from this state, and starting
from it is actively wrong.

Assuming we still want to support the weakly-written charms discussed
previously, I think it makes much more sense to offer a *much* more
limited guarantee: that, on the first run after reboot, the "start"
hook will be called again if the unit is in a "started" state.
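
To make the proposed guarantee concrete, here is a minimal Python sketch.
The function name `rerun_start_after_reboot`, the `first_run_after_reboot`
flag and the `ALL_STATES` tuple are hypothetical illustrations, not part of
Juju's code; only the state names come from the list above.

```python
from typing import Optional

# Unit workflow states discussed in this message; None means the
# workflow has no recorded state yet. (Hypothetical tuple for illustration.)
ALL_STATES = (
    None,
    "installed",
    "install_error",
    "started",
    "start_error",
    "configure_error",
    "charm_upgrade_error",
    "stopped",
    "stop_error",
)


def rerun_start_after_reboot(state: Optional[str], first_run_after_reboot: bool) -> bool:
    """Hypothetical helper: should the agent explicitly re-run "start"?

    Under the proposed limited guarantee this happens only on the first
    run after a reboot, and only if the unit was already "started".
    None and "installed" reach "start" through normal startup anyway;
    the error states, "stopped" and "stop_error" are left alone.
    """
    return first_run_after_reboot and state == "started"


if __name__ == "__main__":
    for state in ALL_STATES:
        print(f"{state!r:24} -> {rerun_start_after_reboot(state, True)}")
```

Keeping the rule to a single equality check is the point: every other
state either gets to "start" by the usual route or needs explicit user
action to resolve.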

The "start" hook may of course be called as a result of the unit
starting off in None or "installed", but that'd happen anyway, so it
doesn't need explicit mention.
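
For comparison, a simplified sketch of why None and "installed" need no
special mention: ordinary agent startup already runs the start hook from
those states. `normal_startup_hooks` is a hypothetical name, and the sketch
lists only the "install" and "start" hooks discussed here, ignoring any
other hooks the real agent may run along the way.

```python
from typing import List, Optional


def normal_startup_hooks(state: Optional[str]) -> List[str]:
    """Hypothetical helper: hooks run by ordinary unit agent startup.

    From None the unit passes through "installed" on its way to
    "started"; from "installed" only the start hook remains. In this
    sketch every other state returns an empty list, i.e. normal startup
    does not advance it.
    """
    if state is None:
        return ["install", "start"]
    if state == "installed":
        return ["start"]
    return []


if __name__ == "__main__":
    print(normal_startup_hooks(None))
    print(normal_startup_hooks("installed"))
    print(normal_startup_hooks("stopped"))
```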

Does this make sense?


More information about the Juju mailing list