[RFC] Disabling jobs in Upstart
James Hunt
james.hunt at canonical.com
Fri Jun 17 19:42:17 UTC 2011
Hi All,
= Caveat =
This is very much a brain dump and doesn't have all the answers - please
comment and fill in the blanks when you spot them! :-)
= Introduction =
We are looking to provide the ability to fully disable a job.
= Rationale =
Lots of users are familiar with the old SysV way of handling jobs and
are looking for a chkconfig-like tool to ease the transition to Upstart.
The "manual" stanza coupled with the Override facility already
provides this ability, but it has the following shortcomings.
== Shortcomings of Override Files ==
* There is no programmatic Upstart interface: it requires a tool/user to
manually create a ".override" file containing the "manual" stanza (or
to simply append "manual" to the ".conf" file).
* It is too generic a facility / not "fail-safe"
Any Admin/tool/pkg can manipulate ".override" files. If an Admin
disables a job using a ".override" file, they might find that it has
later been changed by another tool that rewrote the override. This is
undesirable since the job may no longer be disabled.
* It is not obvious how to determine if a job *is* enabled or disabled.
It is possible though. See:
http://upstart.ubuntu.com/cookbook/#determine-if-a-job-is-disabled
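As a rough illustration of that last point, the check can be approximated
in a few lines of Python. This is only a sketch, not the cookbook's recipe:
"is_manually_disabled" is a hypothetical helper, it assumes that a "manual"
stanza cancels any "start on" seen before it (and that a later "start on"
re-enables the job), and it ignores multi-line conditions and "script"
sections.

```python
def is_manually_disabled(conf_text, override_text=""):
    """Return True if the last relevant stanza seen is "manual".

    Assumption: the ".override" content is applied after the ".conf"
    content, and the later of "start on" / "manual" wins.
    """
    disabled = False
    for line in (conf_text + "\n" + override_text).splitlines():
        words = line.split("#", 1)[0].split()  # drop trailing comments
        if words[:2] == ["start", "on"]:
            disabled = False
        elif words == ["manual"]:
            disabled = True
    return disabled


conf = "start on runlevel [2345]\nexec /usr/sbin/cron\n"
print(is_manually_disabled(conf))              # ".conf" only
print(is_manually_disabled(conf, "manual\n"))  # ".conf" plus ".override"
```

Even this small check has to re-implement part of Upstart's parsing, which
is a fair summary of why a programmatic interface would be preferable.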
= Requirement =
A "chkconfig"-like tool [1] to allow:
* Jobs to be disabled in particular runlevels.
* The ability to determine if a job is disabled for a particular
runlevel.
* The ability to determine if a job *will* run for a particular
runlevel (note: this is *NOT* the same as the bullet above!
See below...)
= Ideal =
The ideal tool would provide the following details:
* Job name.
* Instance name.
* Which runlevels a job is enabled and disabled in.
This breaks down into:
* Job is enabled for specified runlevel.
* Job is explicitly disabled for specified runlevel.
* Job is *implicitly* disabled for specified runlevel.
* Whether the job ran last time?
(would require an event+job log. Can never be 100% reliable of course
since config may have changed between boots.)
= Preliminaries =
== Thoughts and Observations ==
* It is actually rather difficult to map the Upstart event model onto such
a tool since SysV init doesn't behave like Upstart (further details below).
* If a job is explicitly disabled completely, jobs which start on that
job will be implicitly disabled. This information needs to be
conveyed somehow.
* If a job has a start on condition as below, what action should we
take if the user requests the job be disabled in runlevel 2?:
start on foo or runlevel 2
Since it is (currently) not possible to know upfront whether "foo" or
"runlevel 2" will be satisfied at boot time, it may be reasonable to
(by default) disable such a job in runlevel 2 since the "start on"
has specified it *might* "start on runlevel 2". We could provide an
option to control this subtle behaviour.
* If we provide the ability to disable any job, the system could become
unbootable very quickly.
== Constraints ==
* Upstart currently has no knowledge of SystemV runlevels: they are
supported through events and external applications such as telinit.
This premise should not need to be contravened - the internals of
Upstart should not need to be imbued with runlevel knowledge. This
implies that:
1. The facility should work for *any* event (not just runlevels).
2. The facility should be driven by an external tool of some kind (in
other words either a program or script which calls initctl as
appropriate).
* Runlevels are implemented with the "runlevel" event which has a
primary environment variable "RUNLEVEL" taking a value from 0 to 6.
It needs to be possible to disable a job:
* entirely (where it has any "start on" condition).
* in all runlevels ("[0123456]").
* in some runlevels (for example "[345]").
* Upstart allows jobs to be started based on arbitrarily complex
conditions. Any facility to disable a job should consider these
conditions.
== Categories of Jobs ==
There are a number of job categories that we need to consider:
1. Jobs that specify a start on which does *NOT* include
runlevel.
They may start before or after the runlevel event is emitted.
2. Jobs that start on the initial event.
A small handful of jobs "start on startup". This is a specialisation
of (1).
3. Jobs that "start on runlevel" (a single event).
Such jobs may restrict the start on further by specifying
environment variables (RUNLEVEL and PREVLEVEL).
4. Jobs that specify a "complex" start on (one using "and" / "or")
which includes "runlevel".
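To make the four categories concrete, here is a minimal Python sketch that
sorts a "start on" condition into one of them. The "categorise" function and
its tokenising rules are assumptions made for illustration (the first word
after "and", "or" or an opening parenthesis is treated as an event name and
everything else as an argument); Upstart's real parser is more involved.

```python
import re


def categorise(start_on):
    """Map a "start on" condition (without the "start on" prefix) to 1-4."""
    events = []          # event names, in order of appearance
    has_operator = False
    expect_event = True  # the next plain token names an event
    for token in re.findall(r"[()]|[^\s()]+", start_on):
        if token in ("(", ")"):
            continue
        if token in ("and", "or"):
            has_operator = True
            expect_event = True
        elif expect_event:
            events.append(token)
            expect_event = False
        # any other token is an argument to the current event
    if "runlevel" in events:
        return 4 if has_operator else 3
    if events == ["startup"]:
        return 2
    return 1


for condition in ("startup", "runlevel [2345]", "foo or runlevel 2",
                  "local-filesystems and net-device-up IFACE!=lo"):
    print(categorise(condition), condition)
```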
= Terminology =
* "limit"
Since we want to be able to disable Upstart jobs based on some
condition, "disable" is rather a crude term. The word "limit" is
better since it connotes the more fine-grained approach being
proposed. Its antonym here would be "delimit" (I'd initially thought of
"restrict" and "derestrict", but "limit" / "delimit" is shorter :-)
= Scope =
Ideally, it would be possible to disable a job *instance*. But that is
probably going to be an "iteration 2" feature.
Of the four categories of Jobs outlined above, only categories (3) and (4)
can reasonably be dealt with by this design. Category (1) breaks down
into jobs that run before the runlevel event is emitted (about 20 on an
Ubuntu oneiric system currently) and jobs that run after. The former
have to be excluded but the latter may be able to be considered. It is
possible that many of those would end up being implicitly disabled if a
job in category (3) or (4) were disabled anyway [2].
It isn't reasonable to stop category (2) jobs from running since that
will almost certainly break your system anyway: mountall won't run for
starters!
= High-Level Plan =
My thoughts at this stage are that we provide 3 new commands (note these
are not *necessarily* initctl commands):
* limit <job> [<expr>]
Restrict conditions on which job <job> is started. <expr> is assumed
to be a subset of the "start on" condition of <job>; however, if it
is not, this is not an error (but a warning should probably be
issued since the command would have no effect at that point in time).
QUESTION: If job <job> has already been limited, what do we do:
1. Throw an error.
2. Replace the existing limit with the new one.
QUESTION: How would we handle this scenario?:
$ limit cron runlevel [35]
$ limit cron runlevel RUNLEVEL=4
Possible outcomes:
1. Cron is limited in runlevels 3 and 5.
2. Cron is limited in runlevel 4.
3. Cron is limited in runlevels 3, 4 and 5.
* delimit <job>
Returns any current limit expression and undoes the effect of
"limit".
* show-limit [<job> [<expr>]]
Show limits for all jobs or specified job.
Command should emit a warning if any limit is found that is not a
subset of the "start on" for the job in question (since the limit
will have no effect).
If no expression is supplied, show "raw" limit. If an expression
*is* specified, determine if job would run given that expression.
Example: Assume a job specifies "start on runlevel [345]". If a
limit of "runlevel RUNLEVEL=4" has been set, we want a higher-level
tool to be able to query directly if the job would run in runlevel 4
so returning "runlevel [345]" isn't that helpful. What we really
want to say is:
$ show-limit foo runlevel 4
And have the tool display whether for "runlevel 4" job foo would run
based on the limit of "runlevel [345]". This could be displayed in
parseable format and also maybe returned via the return code.
Thought: maybe we could add a "query-limit" command specifically for
this and have "show-limit" just return the "raw" limit details?
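The kind of query described above could look roughly like the following
Python sketch for the runlevel-only case. The names "runlevels_matching" and
"would_run" are hypothetical, the patterns are treated as shell-style globs
over runlevels 0 to 6, and None is used here to mean "no limit set"; none of
this is an existing initctl interface.

```python
from fnmatch import fnmatchcase

RUNLEVELS = "0123456"


def runlevels_matching(pattern):
    """Expand a glob such as "[345]" into the set of runlevels it covers."""
    return {level for level in RUNLEVELS if fnmatchcase(level, pattern)}


def would_run(start_on_pattern, limit_pattern, runlevel):
    """Would a job with "start on runlevel <start_on_pattern>" run here?"""
    starts_in = runlevels_matching(start_on_pattern)
    if limit_pattern is None:  # assumption: None means no limit is set
        limited_in = set()
    else:
        limited_in = runlevels_matching(limit_pattern)
    return runlevel in starts_in - limited_in


# Job "foo": start on runlevel [345], limit runlevel RUNLEVEL=4.
for level in "345":
    verdict = "would run" if would_run("[345]", "4", level) else "limited"
    print(f"foo runlevel {level}: {verdict}")
```

A command-line wrapper could print the verdict in a parseable form and map
it onto the exit status as well.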
= Implementation Details =
== Limit Condition ==
To satisfy the chkconfig requirement, we could just allow a single event
and optional environment to be specified. However, the better solution
is to allow an arbitrary condition (like "start on" and "stop on"). The
condition could almost be viewed as a "restrict on" stanza. Only one
such limit condition may be specified.
XXX: Note that the condition itself -- for the example of runlevels --
covers all the runlevels where that job must not run. This is an
important point: the condition only specifies a single runlevel if that
job should only be disabled in a single runlevel. The "norm" is
probably more likely to be where the condition covers *more than one*
runlevel. This is perfectly acceptable since "show-limit" allows an
*actual* runlevel to be specified so a higher-level tool can establish
if a job would be disabled for a particular runlevel.
== Matching Limits to Events ==
If a job condition becomes "true" such that Upstart would normally
attempt to start the job and if that job has a limit condition which
"matches" part of the EventOperator tree, Upstart will not run the job.
=== Examples ===
start on : runlevel [2345]
runlevel : 2
limit    : runlevel 2
outcome  : match - job will be disabled in runlevel 2.

start on : runlevel [2345]
runlevel : 2
limit    : runlevel
outcome  : match - job will be disabled in runlevel 2.

start on : runlevel
runlevel : 2
limit    : runlevel [2345]
outcome  : match - job will be disabled in runlevel 2.

start on : runlevel 2
runlevel : 2
limit    : runlevel [2345]
outcome  : match - job will be disabled.

start on : runlevel RUNLEVEL=2
runlevel : 2
limit    : runlevel [2345]
outcome  : match - job will be disabled.

start on : runlevel [2345]
runlevel : 2
limit    : runlevel RUNLEVEL=2
outcome  : match - job will be disabled.

start on : runlevel RUNLEVEL=2
runlevel : 2
limit    : runlevel [2345]
outcome  : match - job will be disabled.

start on : runlevel RUNLEVEL=2 PREVLEVEL=S
runlevel : 2
limit    : runlevel [2345]
outcome  : match - job will be disabled.

start on : runlevel RUNLEVEL=2
runlevel : 2
limit    : runlevel [2345] S
outcome  : no match - job will run.

start on : runlevel 2
runlevel : 2
limit    : runlevel [345]
outcome  : no match - job will run. warning will be generated since
           limit cannot match the start on condition.

start on : foo or runlevel 2
runlevel : 2 (foo has not been emitted).
limit    : runlevel [2345]
outcome  : match? I think yes.

start on : foo and runlevel 2
runlevel : 2 (and foo has been emitted).
limit    : runlevel [2345]
outcome  : match - job will not run.
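The single-event examples above are consistent with a simple rule: the
limit matches when its event name equals the emitted event's name and each
of its patterns matches the corresponding variable of that event. A minimal
Python sketch of that rule follows. It is an assumption about how matching
might work rather than Upstart code: "parse_limit" and "limit_matches" are
hypothetical names, positional arguments are mapped to RUNLEVEL and
PREVLEVEL in that order, and patterns are treated as shell-style globs.

```python
from fnmatch import fnmatchcase

# Assumed positional arguments of the "runlevel" event, in order.
POSITIONAL = ("RUNLEVEL", "PREVLEVEL")


def parse_limit(limit):
    """Split e.g. "runlevel [2345] S" into ("runlevel", {var: pattern})."""
    name, *args = limit.split()
    patterns = {}
    position = 0
    for arg in args:
        if "=" in arg:
            key, pattern = arg.split("=", 1)
        else:
            key, pattern = POSITIONAL[position], arg
            position += 1
        patterns[key] = pattern
    return name, patterns


def limit_matches(limit, event_name, event_env):
    """True if the limit applies to the emitted event."""
    name, patterns = parse_limit(limit)
    if name != event_name:
        return False
    # A variable the event does not carry can never satisfy a pattern.
    return all(
        key in event_env and fnmatchcase(event_env[key], pattern)
        for key, pattern in patterns.items()
    )


event = {"RUNLEVEL": "2"}
for limit in ("runlevel 2", "runlevel", "runlevel [2345]",
              "runlevel RUNLEVEL=2", "runlevel [2345] S", "runlevel [345]"):
    result = "match" if limit_matches(limit, "runlevel", event) else "no match"
    print(f"{limit!r}: {result}")
```

This says nothing about the "foo or runlevel 2" and "foo and runlevel 2"
cases, which need the limit to be compared against the EventOperator tree
rather than against a single event.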
== Storage of Limit Conditions ==
The two main ideas here are:
* Create a single file to store all limit information.
A good location might be "/etc/init.limit". This file would store
job restriction details in a simple format such as:
<job> [<condition>]
So, if job "cron" was disabled entirely, it would contain:
cron
Whereas if the job was disabled in runlevels 3-5 it would contain:
cron runlevel [345]
If the file exists on startup, Upstart would read the job
limit details.
Pros:
* Single file outside of /etc/init/ so might be "safer" in the case
where an admin ran "cd /etc/init; rm * .override" say by mistake.
* It would be a "single point of definition" and thus easier to
backup and apply to other systems maybe?
Cons:
* File would nominally need to be rewritten each time a change was
made. Might not be too bad since changing limits is perceived as
being an irregular activity (but tell me if you have other views on
this! :)
* Possible locking issues if multiple requests came in to change a
limit at the same time.
* Create per job files
In a similar fashion to the existing ".conf" and ".override" files,
we could introduce "/etc/init/<job>.limit". If this file existed
and was empty, the job would be fully disabled (never automatically
started). However, if it contains "<condition>", that would be applied.
Pros:
* Analogous to ".conf" and ".override" so familiar to users.
Cons:
* Easy to inadvertently delete a ".limit" file maybe?
* We're starting to create a lot of files now. Theoretically there
could now be 3 files / job (".conf", ".override" and ".limit").
We're not likely to reach the inotify limit (4096 watches?) yet,
but it is something to be aware of, more so in the server or maybe
development server environment.
However the Limit Condition file(s) is/are created, care needs to be
taken to ensure that it is not possible to lose data should
the system fail / be rebooted in mid-write.
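For the single-file option, reading the "<job> [<condition>]" format is
straightforward. A minimal Python sketch, assuming one job per line, with
None standing for "disabled entirely"; the "parse_limit_file" name and the
"#" comment handling are illustrative additions, not part of the proposal.

```python
def parse_limit_file(text):
    """Parse "<job> [<condition>]" lines into {job: condition or None}."""
    limits = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumption: "#" comments
        if not line:
            continue
        parts = line.split(None, 1)
        job = parts[0]
        # No condition means the job is disabled entirely.
        limits[job] = parts[1] if len(parts) > 1 else None
    return limits


sample = "cron runlevel [345]\nfoo\n"
for job, condition in parse_limit_file(sample).items():
    print(job, "->", condition or "disabled entirely")
```

The per-job ".limit" variant would need even less: the file name gives the
job and the whole (possibly empty) content is the condition.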
== What Writes the Limit Conditions File? ==
The entire Upstart system currently only reads files. That
precedent should not be changed lightly.
Do we really want init or initctl to be able to write to files? There
are a number of issues around doing so including:
* Security concerns.
Having a daemon writing files as the superuser is always something to
be highly wary of.
* Handling of failure conditions.
Particularly for init itself, writes couldn't be synchronous
since if they failed, that would block all other jobs.
* Potential data loss should the system crash / be powered off whilst
writing.
Really, this requires a transactional system. A simple, but not fully
effective semi-solution would be to write the data to a temporary
file (eeeeew!) and then atomically move that over the original.
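For reference, the temporary-file approach is short to write. A minimal
Python sketch, assuming the temporary file is created in the same directory
as the target so that the final rename stays on one filesystem;
"write_atomically" is a hypothetical helper, not something init or initctl
provides.

```python
import os
import tempfile


def write_atomically(path, data):
    """Replace `path` with `data` so readers see the old or new file only."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename cannot cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".limit-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk first
        os.replace(tmp_path, path)  # atomic rename over the original
    except BaseException:
        os.unlink(tmp_path)         # do not leave the temporary file behind
        raise
```

This still leaves gaps: the directory entry itself is not synced here, and
nothing stops two writers from racing each other, so the locking concern
mentioned earlier remains.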
There are 3 possibilities here:
* /sbin/init writes the file.
This is probably best avoided for the reasons outlined above. That
said, it would be the cleanest solution since the new commands could
be initctl commands and would work in similar fashion to existing ones.
* /sbin/initctl writes the file.
This is better than having init write the file, but by tasking initctl
with the job, we would need to introduce some sync point with init
such that:
* initctl writes the file.
* initctl asks init to read the file and confirm when it has done so.
* initctl returns with a message to the user.
The sync point would guarantee that when initctl returns, Upstart
would "know" about the limits and would act on them. Without it,
there is a window where the user may think a job was disabled when in
fact it hadn't yet been disabled. However, that window is small and
realistically may only apply to the pathological case whereby a *job*
disables another job. If we go with the ".limit" idea whereby Upstart
would be notified by inotify as usual, the window is probably so
short that we don't have to worry (much).
* Some other tool writes the file.
This sounds like a good option, but we still have the sync issue
potentially.
= Questions =
* What about read-only root environments which would disallow writing
to /etc/init*? We could provide a "--limitfile" option to init, but
where could that point to that is guaranteed to exist potentially as
early as the initramfs running?
[1] - http://manpages.ubuntu.com/manpages/en/man8/chkconfig.8.html
[2] - We should analyse the standard Ubuntu desktop and server
installations to see how many fall into each category...
Kind regards,
James.
--
James Hunt
____________________________________
Ubuntu Foundations Team, Canonical.