Detecting and handling service failures

David Tong scarabus at gmail.com
Sat Aug 25 00:47:17 UTC 2012


Cheers, Evan.
Yeah, I looked at the cookbook but didn't find the answer there. This was just what I needed - thanks.
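
For the archives, here is a minimal, untested sketch of the catch-all job that the suggestion below points to. The file name failure-notify.conf and the logger tag are made-up placeholders. It assumes that the stopped event passes JOB, INSTANCE, PROCESS and EXIT_STATUS/EXIT_SIGNAL into the environment of the job it starts, and that UPSTART_JOB holds the job's own name.

```
# /etc/init/failure-notify.conf  (hypothetical file name)
description "Log any job that stops with RESULT=failed"

# No job name is given, so this should match the stopped event of every job
# (the untested wildcard behaviour discussed in this thread).
start on stopped RESULT=failed

# Run once per matching event, then exit.
task

script
    # Skip our own failure so a broken notifier cannot retrigger itself.
    if [ "$JOB" = "$UPSTART_JOB" ]; then
        exit 0
    fi
    # Variables from the triggering event; unset ones expand to empty strings.
    logger -t upstart-failure "job=$JOB instance=$INSTANCE process=$PROCESS exit_status=$EXIT_STATUS exit_signal=$EXIT_SIGNAL"
end script
```

If the wildcard match works as expected, a newly installed service needs no extra configuration, and the failure details end up in syslog under the chosen tag.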

http://scmwine.info

On Aug 24, 2012, at 11:25 AM, Evan Huus <eapache at gmail.com> wrote:

> On Fri, Aug 24, 2012 at 2:08 PM, David Tong <scarabus at gmail.com> wrote:
>> I am familiar with SMF on Solaris. In particular, when a service cannot be
>> started by SMF it is marked as being in maintenance state. I'm trying to
>> use Upstart to detect and report on similar conditions.
>>
>> My understanding of the way that Upstart works is that if a service fails
>> then an event is emitted indicating the failure and the service is stopped.
>> If you don't catch the event then you don't know it's failed. If a user
>> queries the status of a service they only see that it is stopped; they
>> don't see the reason. Am I right in thinking that once a service is stopped
>> the only way to determine the cause is to view the system logs?
>> 
>> Now it's easy to configure upstart to run a job when another process fails:
>>   start on stopped tongo RESULT=failed
>> 
>> But as far as I can work out you would need to explicitly enumerate all the
>> jobs that you wanted to monitor - or is there a wildcard option?
>>   start on stopped *ANY* RESULT=failed
> 
> I believe simply omitting the job name acts as a wildcard. (I've not
> tested this, but it ought to work if I understand correctly.) So your
> stanza would be:
>   start on stopped RESULT=failed
> 
>> What about the case where a new service is added? Obviously I also want to
>> be notified if that fails.
> 
> That would be caught by the previous stanza, assuming it works.
> 
>> Specific RTFM pointers would be welcomed.
> 
> The bible of upstart is: http://upstart.ubuntu.com/cookbook/
> I don't think it answers this specific question though.
> 
> Cheers,
> Evan



More information about the upstart-devel mailing list