Automatic retries of hooks
William Reade
william.reade at canonical.com
Wed Jan 20 10:46:59 UTC 2016
On Wed, Jan 20, 2016 at 8:46 AM, Stuart Bishop <stuart.bishop at canonical.com>
wrote:
> On 20 January 2016 at 13:17, John Meinel <john at arbash-meinel.com> wrote:
>
> > There are classes of failures that a charm hook itself cannot handle. The
> > specific one Bogdan was working with is the fact that the machine itself
> > is getting restarted while the charm is in the middle of processing a
> > hook. There isn't any way the hook itself can handle that, unless you
> > could raise a very specific error that indicates you should be retried
> > (so as it notices it's about to die, it raises the try-me-again error).
> >
> > Hooks are supposed to be idempotent regardless, aren't they? So while we
> > paper over transient bugs in them, doesn't it make the system more
> > resilient overall?
>
> The new update-status hook could be used to recover, as it is called
> automatically at regular intervals. If the reboot really was random,
> you would need to clear the error status first. But if it is triggered
> by the charm, it is just a case of 'reboot(now+30s);
> status_set('waiting', 'Waiting for reboot'); sys.exit(0)' and waiting
> for the update-status hook to kick in.
>
If it's triggered by the charm, it should `juju-reboot`, which will bounce
the machine after the hook is committed (or, with `--now`, do so right away
and requeue the executing hook). Regardless, from a charm's perspective,
"random" reboots will happen, as will an arbitrary number of other "random"
failures that really aren't worth a stop-the-line response.
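
A minimal sketch of the charm-triggered case, assuming a Python hook. The
helper name `request_reboot` and its `run` parameter are hypothetical, there
only so the call can be exercised outside a hook context; the `juju-reboot`
tool and its `--now` flag are the ones described above.

```python
import subprocess


def request_reboot(now=False, run=subprocess.check_call):
    """Ask Juju to reboot the machine via the juju-reboot hook tool.

    Without --now, the machine is bounced after the current hook is
    committed. With --now, it is rebooted right away and the executing
    hook is requeued.

    `run` is a hypothetical injection point (not part of Juju) so the
    command can be inspected without actually rebooting anything.
    """
    cmd = ["juju-reboot"]
    if now:
        cmd.append("--now")
    run(cmd)
    return cmd
```

Inside a real hook you would just call `request_reboot()` and let the hook
finish normally; the tool is only available in a hook execution context.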
> It happens naturally if you structure your charm to have a single hook
> that does everything that needs to be done, rather than trying to
> craft individual hooks to deal with specific events.
>
Independent of everything else, *this* should be *excellent* advice for
speeding up your deployments. Have you already been writing charms like
this? I'd love to hear your experiences; and, in particular, if you've
noticed any improvement in deployment speed. The theoretically achievable
speedup is vast, but the hook runner wasn't written with this approach in
mind; we might need to make a couple of small tweaks [0] to get the best
out of the approach.
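
For anyone who hasn't seen the pattern, here is a rough sketch of the
"single hook that does everything" idea, assuming every name under hooks/
is a symlink to one Python script. The step functions and the `state` dict
are hypothetical stand-ins for real work (installing packages, writing
config, starting the service); the point is only that each step is
idempotent, so the same code runs whichever event fired.

```python
import os
import sys


def ensure_installed(state):
    # Hypothetical step: do the work only if it has not been done yet.
    if state.get("installed"):
        return False
    state["installed"] = True
    return True


def ensure_configured(state):
    if state.get("configured"):
        return False
    state["configured"] = True
    return True


def ensure_running(state):
    if state.get("running"):
        return False
    state["running"] = True
    return True


STEPS = [ensure_installed, ensure_configured, ensure_running]


def reconcile(state, steps=STEPS):
    """Run every step in order; return the names of steps that changed something."""
    return [step.__name__ for step in steps if step(state)]


def main(argv):
    # The hook name is only used for logging; the work is the same for all hooks.
    hook_name = os.path.basename(argv[0])
    changed = reconcile({})
    print("%s: changed %s" % (hook_name, changed))


if __name__ == "__main__":
    main(sys.argv)
```

Calling `reconcile` a second time on the same state changes nothing, which
is what makes a retried or repeated hook harmless.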
Cheers
William
[0] basically, check for hook existence *before* doing all the context
setup work. It's essentially a no-brainer, but it's not quite trivial to
do, and has just never hit the top of anyone's list.