Upstart plans
James Hunt
james.hunt at ubuntu.com
Thu Aug 4 19:01:47 UTC 2011
Hi All,
Last week, Scott, Steve Langasek (hotfoot from DebConf11!) and I met to discuss future plans
for Upstart and to work on some new feature ideas. This mail summarises those discussions.
Table of Contents
=================
1 Day 1
1.1 General Catchup
1.2 Upstart in Chromium OS
1.2.1 Boot Overview
1.2.2 Runlevels
1.2.3 Failsafe Mode
1.2.4 ACTIONS
1.3 NIH issues
1.3.1 bug 776532 (nih_dir_walk_scan passes incorrect value to file filter)
1.3.2 bug 803587 (nih-dbus-tool generates invalid c code for structure types)
2 Day 2
2.1 Logging of job output (and maybe events) (bug 328881)
2.1.1 ACTIONS
2.2 State-passing on re-exec (bug 348455)
2.2.1 ACTIONS
2.3 Ptrace limitations (bug 406397, for example)
2.3.1 Stage 1 - track exits, not forks
2.3.2 Stage 2 - track all pids
2.3.3 Stage 3 - introduce cgroups
2.3.4 ACTIONS
3 Day 3
3.1 Upstart in Debian
3.1.1 Version of Upstart
3.1.2 insserv/startpar needs to have knowledge of Upstart.
3.1.3 Debian policy needs to be updated to allow for alternative init systems to SystemV.
3.2 "expect stanza" enhancements.
3.2.1 ACTIONS
3.3 "and" issues (bug 447654)
3.3.1 ACTIONS
3.4 Environment Variables
3.4.1 ACTIONS
3.5 Upstart resource site
3.5.1 ACTIONS
1 Day 1
--------
1.1 General Catchup
====================
1.2 Upstart in Chromium OS
===========================
This was an extremely interesting discussion of Chromium OS's
use of Upstart. Chromium OS has very strict boot-time allocations for
different parts of the system (kernel, X11, etc), so getting an
insight into such a specialized boot was highly instructive.
1.2.1 Boot Overview
~~~~~~~~~~~~~~~~~~~~
1) Event "startup" is emitted.
2) Most critical-path jobs start ("start on startup"):
2.1) Job "udev" runs.
2.2) Job "startup" starts.
The startup job launches critical-path services and handles such
things as mounting devices.
3) Job "boot-services" starts ("start on stopped startup").
This job starts all services used directly or indirectly by the UI
such as:
- syslog
- dbus
- wifi
- X11
All of these services are running before "started boot-services" is
emitted.
Once event "started boot-services" has been emitted, the user can log in.
4) Job "system-services" runs ("start on started boot-services").
Abstract job that runs for the life of the boot. All other services are
started from this job via "start on starting system-services". Note
that these services are the lowest priority and are started *after*
the login screen is displayed. Examples:
- cron
- update manager
- power management
- crash reporter
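To make the ordering concrete, here is a much-simplified, hypothetical C model of the event chain above. It assumes each job has a single "start on" event and that a job emits "starting", "started" and (for tasks such as "startup") "stopped" as soon as it is triggered. Real Upstart also blocks jobs on "starting" events and supports full event expressions, so treat this only as an illustration of how one event triggers the next.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical job table mirroring the Chromium OS boot described above. */
typedef struct {
    const char *name;
    const char *start_on;   /* the single event that starts this job */
    int         task;       /* runs to completion, so also emits "stopped" */
} Job;

static const Job jobs[] = {
    { "udev",            "startup",                  0 },
    { "startup",         "startup",                  1 },
    { "boot-services",   "stopped startup",          0 },
    { "system-services", "started boot-services",    0 },
    { "cron",            "starting system-services", 0 },
};
#define NJOBS (sizeof jobs / sizeof jobs[0])
#define MAXEV 32

/* Drain a FIFO event queue seeded with "startup"; record job names in the
 * order they are triggered and return how many were triggered. */
int simulate_boot(const char *order[], int max)
{
    char queue[MAXEV][64];
    int head = 0, tail = 0, count = 0;
    int started[NJOBS] = { 0 };

    snprintf(queue[tail++], sizeof queue[0], "startup");
    while (head < tail) {
        const char *ev = queue[head++];
        for (size_t i = 0; i < NJOBS; i++) {
            if (started[i] || strcmp(jobs[i].start_on, ev) != 0)
                continue;
            started[i] = 1;
            if (count < max)
                order[count++] = jobs[i].name;
            /* Queue the job's own events so later jobs can react to them. */
            const char *kinds[] = { "starting", "started", "stopped" };
            for (int k = 0; k < (jobs[i].task ? 3 : 2) && tail < MAXEV; k++)
                snprintf(queue[tail++], sizeof queue[0], "%s %s",
                         kinds[k], jobs[i].name);
        }
    }
    return count;
}
```

In this model, simulate_boot() triggers udev and startup first, then boot-services, system-services and finally cron.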
1.2.2 Runlevels
~~~~~~~~~~~~~~~~
An interesting observation is that Chromium OS has dispensed entirely with the
legacy SystemV runlevels. This is perfectly natural: the services that
run on Chromium OS are strictly controlled, so no legacy init system
support is required.
1.2.3 Failsafe Mode
~~~~~~~~~~~~~~~~~~~~
In addition, Chromium OS has a novel "failsafe" method for ensuring
that certain key services are guaranteed to start even if the login
screen fails to display for some reason. Examples of jobs that are
started in failsafe mode are VT consoles and the ssh server. The
failsafe facility is implemented as 2 jobs:
- failsafe.conf
- failsafe-delay.conf
The failsafe-delay job specifies "start on started boot-services" so
that it starts early in the boot. This job simply sleeps for 30
seconds (longer than the worst-case overall boot time). Once
failsafe-delay stops, failsafe starts even if the main boot has failed,
since it specifies "start on starting system-services or stopped
failsafe-delay".
Jobs that absolutely must start even if the boot fails for some reason
can then specify "start on started failsafe" and be assured of starting,
in the worst case scenario after a 30 second delay.
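The failsafe timing reduces to a simple rule, sketched below in C. It assumes times are measured in seconds from when failsafe-delay starts, and that -1 means "starting system-services" is never emitted because the boot failed. The function name is hypothetical.

```c
/* Hypothetical model: returns when "failsafe" starts, in seconds after
 * failsafe-delay starts. system_services_start is when "starting
 * system-services" is emitted on the same clock, or -1 if it never is. */
#define FAILSAFE_DELAY_SECS 30   /* how long failsafe-delay sleeps */

int failsafe_start_time(int system_services_start)
{
    /* failsafe-delay sleeps, then stops, emitting "stopped failsafe-delay". */
    int delay_stopped = FAILSAFE_DELAY_SECS;

    /* "start on starting system-services or stopped failsafe-delay":
     * whichever event happens first starts the job. */
    if (system_services_start >= 0 && system_services_start < delay_stopped)
        return system_services_start;
    return delay_stopped;
}
```

A healthy boot starts failsafe as soon as system-services starts; a boot that never gets that far still starts it after the 30 second delay.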
1.2.4 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: See if Ubuntu could benefit from any of these techniques.
1.3 NIH issues
===============
[https://bugs.launchpad.net/libnih]
As more use is being made of the NIH Utility Library outside of core
Upstart, a few issues have been found recently. We worked through the
two most interesting:
1.3.1 bug 776532 (nih_dir_walk_scan passes incorrect value to file filter)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixing this is required for the as-yet-unreleased "upstart-file-bridge".
We found a solution to this issue and documented it.
* ACTIONS
+ [jhunt]: implement with tests.
1.3.2 bug 803587 (nih-dbus-tool generates invalid c code for structure types)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Found while developing the feature to disable jobs in Upstart.
Scott identified the fix for this.
* ACTIONS
+ [jhunt]: implement and write tests.
2 Day 2
--------
2.1 Logging of job output (and maybe events) (bug 328881)
==========================================================
We pair-programmed minimal logging functionality for jobs. I
now have the code working (90% complete for system jobs, excluding
tests), so we will soon introduce a "console log" option
(which will be the new default for jobs). Once directory
"/var/log/upstart/" becomes available, all jobs specifying
"console log" (explicitly, or implicitly as the default) will have
their stdout and stderr redirected to a file in this directory.
Currently, the code is buffering output, so it probably makes sense to
adopt a pty approach. Logging output from user jobs is a complication
(for a variety of reasons).
This feature will be a huge aid to debugging and a boon for
administrators.
The same technique could be used to log events, although we need to
assess the performance impact given the number of events being emitted.
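As a rough illustration of the redirection described above, the C sketch below forks a job process and points its stdout and stderr at <logdir>/<job>.log before exec'ing it. The log file naming and the spawn_logged() helper are assumptions for illustration, not the code under development, and the sketch does not attempt the pty approach.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical "console log" spawn: run argv[] with stdout and stderr
 * appended to <logdir>/<job>.log. The caller must ensure logdir exists
 * (e.g. once /var/log/upstart/ is available). Returns the child's pid. */
pid_t spawn_logged(const char *logdir, const char *job, char *const argv[])
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s.log", logdir, job);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    pid_t pid = fork();
    if (pid != 0)
        return pid;                     /* parent, or -1 if fork() failed */

    /* Child: append so that a restarted job doesn't clobber earlier output. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0640);
    if (fd < 0)
        _exit(127);
    if (dup2(fd, STDOUT_FILENO) < 0 || dup2(fd, STDERR_FILENO) < 0)
        _exit(127);
    if (fd > STDERR_FILENO)
        close(fd);
    execv(argv[0], argv);
    _exit(127);                         /* only reached if execv() failed */
}
```

Because the job writes straight to a regular file, programs that use stdio will fully buffer their output, which is the buffering problem a pty would address.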
2.1.1 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: Finish code and write tests.
2.2 State-passing on re-exec (bug 348455)
==========================================
In Ubuntu, Upstart has the ability to re-exec itself to allow the
currently running /sbin/init to be replaced cleanly. However, the
re-exec is currently stateless such that the newly exec'ed image has
no knowledge of jobs that were running before the exec. A simple
facility is used for such state-passing between the initramfs
(non-Upstart) and the main system, but the method employed is not
ideal.
We discussed a plan to introduce full state-passing between the old
and new instances which can be summarized succinctly as:
- create a pipe.
- fork.
- child creates socket and listens on it.
- child passes details of socket back to parent via pipe
(or could just use well-known location).
- child closes pipe.
- parent re-execs itself (closing pipe), passing a cmdline option to notify
init to read from the socket.
- child sends meta-data on existing jobs through pipe.
- parent parses meta-data and initializes data structures based on this info.
Plan currently is to use JSON for structured representation of
meta-data.
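The data-transfer part of the plan can be sketched in C as follows. To keep it self-contained, it swaps the listening socket and the pipe for a single socketpair(2), omits the re-exec itself, and serialises one job record as JSON using hypothetical field names (name, goal, state, pid). In the real scheme the parent's end of the socket would also need to survive exec, with its location passed via the command-line option.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job meta-data; the real set of fields is still to be decided. */
typedef struct {
    const char *name;
    const char *goal;    /* e.g. "start" or "stop" */
    const char *state;   /* e.g. "running" */
    int         pid;
} JobState;

/* Serialise one job as a JSON object. No string escaping is done, so names
 * must not contain quotes or backslashes. Returns length, or -1. */
int job_to_json(const JobState *j, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "{\"name\":\"%s\",\"goal\":\"%s\",\"state\":\"%s\",\"pid\":%d}",
                     j->name, j->goal, j->state, j->pid);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Fork a child that holds the state and sends it to the parent, which reads
 * until EOF into out (outlen must be > 0). Returns bytes received, or -1. */
ssize_t handoff_state(const JobState *j, char *out, size_t outlen)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: send state, then exit */
        char buf[512];
        close(sv[0]);
        int n = job_to_json(j, buf, sizeof buf);
        if (n < 0 || write(sv[1], buf, (size_t)n) != (ssize_t)n)
            _exit(1);
        close(sv[1]);                     /* EOF tells the parent we're done */
        _exit(0);
    }

    close(sv[1]);                         /* parent: (re-exec would go here) */
    size_t total = 0;
    ssize_t r;
    while (total + 1 < outlen &&
           (r = read(sv[0], out + total, outlen - 1 - total)) > 0)
        total += (size_t)r;
    out[total] = '\0';
    close(sv[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? (ssize_t)total : -1;
}
```

The parent would then parse the received JSON to rebuild its job structures.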
Perceived issues:
- We cannot restore D-Bus connections.
- New version of init being exec'ed must understand all historical
JSON syntax quirks if we ever change how we represent objects.
- Child must send its version to the re-exec'ed parent. If the
parent detects that the child is newer than it, state passing would
be unsafe, since this scenario indicates a downgrade of the
Upstart version. In such instances, the best course of action
may be to:
- generate a warning
- log the child's state to a file
- re-exec with no state-passing.
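The downgrade check could look something like the C sketch below. It assumes versions of the form MAJOR.MINOR[.PATCH] and packs them into one integer for comparison (so each component must be below 1000). choose_restore() and RestoreAction are hypothetical names, and writing the child's state to a file is left to the caller.

```c
#include <stdio.h>

typedef enum { RESTORE_STATE, RESTORE_NONE } RestoreAction;

/* Pack "MAJOR.MINOR[.PATCH]" into a comparable integer; -1 if unparseable. */
long version_key(const char *v)
{
    int major = 0, minor = 0, patch = 0;
    if (sscanf(v, "%d.%d.%d", &major, &minor, &patch) < 2)
        return -1;
    return (long)major * 1000000L + minor * 1000L + patch;
}

/* Decide whether the re-exec'ed parent may restore the child's state. */
RestoreAction choose_restore(const char *child_version, const char *parent_version)
{
    long child = version_key(child_version);
    long parent = version_key(parent_version);

    if (child < 0 || parent < 0 || child > parent) {
        /* Downgrade (or unknown version): the state format may not be
         * understood, so warn; the caller logs the state and re-execs
         * without state-passing. */
        fprintf(stderr, "warning: cannot restore state from Upstart %s in %s\n",
                child_version, parent_version);
        return RESTORE_NONE;
    }
    return RESTORE_STATE;
}
```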
2.2.1 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: implement.
2.3 Ptrace limitations (bug 406397, for example)
=================================================
Lots of discussion resulted in the basics of a plan for how to
overcome the ptrace issues.
ptrace(2) is the primary facility Upstart uses to track pids. This
provides a clever and elegant solution to a difficult problem, but
there are now known issues with its use. As such, we discussed
available alternatives such as cgroups and the proc connector.
Unfortunately, the latter suffers from too many limitations right
now (inability to work in a container being an important
restriction), so we plan to introduce support for cgroups. However,
the introduction of cgroups will be staged:
2.3.1 Stage 1 - track exits, not forks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change Upstart to track processes calling one of the various
exit() calls.
The advantage of tracking exits is that an exit is a better
indication of daemonization, since the exit() always follows the
fork(). It also handles services whose parent stays running until
the child is ready, as well as those whose parent dies immediately.
Stage 1 Tasks:
* Update pid tracking code to use PTRACE_O_TRACEEXIT rather than PTRACE_O_TRACEFORK.
* Introduce "expect exit <n>" syntax.
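To illustrate why counting exits works for both styles of daemon, here is a hypothetical C model of "expect exit <n>". It does not call ptrace(2); instead it consumes an already-recorded stream of fork and exit events (a real tracer would obtain exit notifications from PTRACE_EVENT_EXIT stops once PTRACE_O_TRACEEXIT is set) and reports the surviving pid once n tracked processes have exited.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical event stream as a tracer might record it. */
typedef enum { EV_FORK, EV_EXIT } TraceKind;
typedef struct {
    TraceKind kind;
    pid_t     pid;     /* process that forked or is exiting */
    pid_t     child;   /* new pid for EV_FORK; unused for EV_EXIT */
} TraceEvent;

/* Model of "expect exit <n>": follow the process tree and, once n tracked
 * processes have exited, return the single surviving pid as the job's main
 * pid. Returns 0 if fewer than n exits were seen or several pids survive. */
pid_t main_pid_after_exits(pid_t initial, const TraceEvent *ev, size_t nev, int n)
{
    pid_t live[64];
    size_t nlive = 0;
    int exits = 0;

    live[nlive++] = initial;
    for (size_t i = 0; i < nev; i++) {
        if (ev[i].kind == EV_FORK) {
            if (nlive < 64)
                live[nlive++] = ev[i].child;
            continue;
        }
        /* EV_EXIT: drop the pid from the live set and count the exit. */
        for (size_t k = 0; k < nlive; k++) {
            if (live[k] == ev[i].pid) {
                live[k] = live[--nlive];
                exits++;
                break;
            }
        }
        if (exits == n)
            return nlive == 1 ? live[0] : 0;
    }
    return 0;
}
```

With n = 1 this roughly corresponds to "expect fork" and with n = 2 to "expect daemon". A parent that waits for its child to be ready before exiting produces the same event sequence as the one-fork case, just later, so the job is only considered started once the parent has actually gone.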
2.3.2 Stage 2 - track all pids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Upstart currently tracks only one pid per job process type (main,
pre-start, etc). By changing it to track *all* pids for all
process types, we minimize problems relating to "unexpected" pids.
It also lays the foundation for the transition to cgroups.
Stage 2 Tasks:
* Make the JobClass->process array a linked list of "Process" objects
to allow Upstart to track all pids a process type has been known as.
* Add a ProcessType entry to the Process object, since we can no
longer use the array index.
* Add entry for PROCESS_CUSTOM to ProcessType.
* Add "char *" name to Process to store name of process type.
* We can now handle >1 pids per job process. This fixes a lot of
problems. It also paves the way for custom actions (bug 94873),
since we can simply add a new entry to the linked list for a
new process type "foo" / "monitor", etc.
* Change ptrace from being "on fork" to be on "first pid exiting" (count exits, not forks).
2.3.3 Stage 3 - introduce cgroups
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Stage 3 Tasks:
* Add "char *" representing the cgroup alongside the pid_t in linked list of pids
such that we can track the cgroups.
Cgroups for Upstart will be something like:
/upstart/$job/$instance/$process
job_process_spawn() will then write $pid to:
/upstart/$job/$instance/$process/tasks
Examples:
/upstart/apache/main
/upstart/network-interface/eth0/pre-start
/upstart/network-interface/eth0/post-stop
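Building these paths, and adding a pid to a group, is straightforward; the C sketch below shows one way. It assumes the instance component is simply omitted for single-instance jobs (as in /upstart/apache/main) and that the paths are relative to wherever the cgroup hierarchy is mounted, which is not decided here. The helper names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/upstart/$job[/$instance]/$process". Returns 0, or -1 if buf is
 * too small. */
int cgroup_path(char *buf, size_t len, const char *job,
                const char *instance, const char *process)
{
    int n = (instance && *instance)
        ? snprintf(buf, len, "/upstart/%s/%s/%s", job, instance, process)
        : snprintf(buf, len, "/upstart/%s/%s", job, process);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* What job_process_spawn() would do: add pid to a group by writing it to
 * the group's "tasks" file. Returns 0 on success, -1 on error. */
int cgroup_attach_pid(const char *tasks_path, pid_t pid)
{
    FILE *f = fopen(tasks_path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%ld\n", (long)pid) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```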
The "release_agent", which gets called automatically when the
cgroup is empty (indicating that the processes associated with a
job have exited) will probably be a symbolic link to a new
initctl(8) command "release" (itself a link to /sbin/initctl).
Once cgroups have been introduced to Upstart, job lookup will
occur by cgroup, rather than by pid. However, it is possible
that ptrace will still be used in combination with cgroups,
creating a hybrid system for maximum flexibility.
To stop a job process, Upstart will read the pids
from the tasks file and then kill those pids.
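A minimal C sketch of that stop path follows. kill_tasks() is a hypothetical helper that reads one pid per line, as the cgroup "tasks" file provides, and signals each. A real implementation would need to repeat this until the group is empty, since processes can fork while the file is being read.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send sig to every pid listed (one per line) in tasks_path.
 * Returns the number of pids signalled, or -1 if the file can't be read. */
int kill_tasks(const char *tasks_path, int sig)
{
    FILE *f = fopen(tasks_path, "r");
    if (!f)
        return -1;

    long pid;
    int n = 0;
    while (fscanf(f, "%ld", &pid) == 1)
        if (pid > 0 && kill((pid_t)pid, sig) == 0)
            n++;
    fclose(f);
    return n;
}
```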
2.3.4 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: implement the plan.
3 Day 3
--------
3.1 Upstart in Debian
======================
3.1.1 Version of Upstart
~~~~~~~~~~~~~~~~~~~~~~~~~
Upstart in Debian is very old (0.6.6-2).
It needs to be updated and kept in sync with upstream. Ideally, the
minor differences between Ubuntu's version and upstream would be
propagated into Debian so that there is no delta between Debian
and Ubuntu.
Scott and Steve are able to sponsor James' uploads.
3.1.2 insserv/startpart needs to have knowledge of Upstart.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Needs to emit "starting" and "started" events for each service it runs.
* ACTIONS:
+ [slangasek]: patch startpar to understand the status of upstart jobs.
+ [slangasek]: fix debhelper so upstart jobs and init scripts can be installed side-by-side in
Debian.
+ [slangasek]: get an up-to-date upstart into Debian unstable.
3.1.3 Debian policy needs to be updated to allow for alternative init systems to SystemV.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3.2 "expect stanza" enhancements.
==================================
We observed that currently the "start on" and "stop on" stanzas take
an event condition, whereas the "expect" stanza takes one of two
"special" tokens representing the number of times the process is
expected to fork(2). Note too that "start on" is handled internally
by Upstart *and* in some instances also by out-of-process bridges
(primarily the upstart-socket-bridge).
We also noted that forking is merely one way
for a job to indicate that it is "ready". For database
servers, the fact that the primary daemon is running is no
indication that the server is available: it may require significant
time to replay logs before it is able to start servicing client
requests. Other possible indications of readiness could be:
- The creation of a file (log file, lock file, etc).
- The appearance of a service on the D-Bus system bus.
- The creation of a listening socket of a particular type on a particular port.
- Theoretically, a process emitting an event to say "I'm now ready".
This last one is the most intriguing, since it raises an
interesting thought: what if "expect", rather than taking "special"
tokens, took an event condition like "start on" and "stop on"? There
is some symmetry to this and it brings big advantages: "daemon" and
"fork" could become events, not special-case tokens, and we could then
introduce "file" and "socket" as events. The complication is that we
need to find a way to handle such events, ideally out-of-process, to
avoid polluting the Upstart core with specialist knowledge of
sockets, files and D-Bus behaviour. This discussion wasn't finished,
but it raises some interesting questions to explore at a later date.
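As an example of the kind of readiness check a bridge might perform out of process, the C sketch below waits for a file to be created, polling with stat(2). wait_for_file() is hypothetical; a production bridge would more likely use inotify than polling, and would then signal readiness to Upstart, for example by emitting an event.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Wait until path exists, checking every 10 ms for at most timeout_ms.
 * Returns 0 if the file appeared, -1 on timeout. */
int wait_for_file(const char *path, long timeout_ms)
{
    struct stat st;
    const struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */

    for (long waited = 0; ; waited += 10) {
        if (stat(path, &st) == 0)
            return 0;
        if (waited >= timeout_ms)
            return -1;
        nanosleep(&tick, NULL);
    }
}
```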
3.2.1 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: Consider idea further.
3.3 "and" issues (bug 447654)
==============================
As documented in init(5), once a condition becomes true, Upstart
discards knowledge of the other parts of the condition tree (by
unblocking those events). This causes problems when a condition
starts out as false, becomes true, goes false again, but then again
becomes true. This can lead to unexpected behaviour and is confusing
to users. Due to Upstart's internal design, this is a hard problem to
solve and we didn't come up with any workable solutions. More
thought required on this one.
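The behaviour is easiest to see in a toy model. The C sketch below is a deliberate simplification for "start on started A and started B", not Upstart's actual event-operator code: when both operands have been seen, the job starts and both are forgotten, so a later "started B" alone does not start the job again even though A may still be running.

```c
#include <stdbool.h>

/* Toy model of "start on started A and started B". */
typedef struct {
    bool a_seen;
    bool b_seen;
} AndCondition;

/* Feed one event ('A' or 'B'); returns true if it makes the job start. */
bool and_condition_handle(AndCondition *c, char event)
{
    if (event == 'A')
        c->a_seen = true;
    if (event == 'B')
        c->b_seen = true;
    if (c->a_seen && c->b_seen) {
        /* Condition became true: the tree forgets both events. */
        c->a_seen = c->b_seen = false;
        return true;
    }
    return false;
}
```

If B restarts after the job has started, the job needs a fresh "started A" as well, which is exactly the surprising behaviour users report.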
3.3.1 ACTIONS
~~~~~~~~~~~~~~
* [jhunt, keybuk]: Consider problem further.
3.4 Environment Variables
==========================
We plan to add EXIT_STATUS and PROCESS environment variables to all
jobs, not just those that fail. This is useful for jobs that specify
the "normal exit" stanza, and for any job that wishes to perform
conditional processing based on its exit code when there might be a
range of exit codes indicating success.
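A sketch of deriving those variables from a wait(2) status is shown below in C. exit_env() is hypothetical, and it only covers processes that exited normally (signal deaths are out of scope here). The strings are in the NAME=value form used for environment entries.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill "PROCESS=<process>" and "EXIT_STATUS=<code>" for any normal exit,
 * successful or not. Returns 0, or -1 if the process did not exit normally
 * or a buffer of size len is too small. */
int exit_env(const char *process, int wait_status,
             char *proc_buf, char *status_buf, size_t len)
{
    if (!WIFEXITED(wait_status))
        return -1;
    int a = snprintf(proc_buf, len, "PROCESS=%s", process);
    int b = snprintf(status_buf, len, "EXIT_STATUS=%d", WEXITSTATUS(wait_status));
    return (a < 0 || b < 0 || (size_t)a >= len || (size_t)b >= len) ? -1 : 0;
}
```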
3.4.1 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: Implement and write tests.
3.5 Upstart resource site
==========================
We currently have too many Upstart sites (upstart.ubuntu.com,
upstart.at, launchpad.net/upstart). We need to rationalize the
information on each and come up with a plan for a single site.
3.5.1 ACTIONS
~~~~~~~~~~~~~~
* [slangasek]: Investigate a single site.
If you are interested in getting involved in any of the topics covered
in this mail, please:
1) Read [http://upstart.ubuntu.com/wiki/ContributingCode]
2) Make your views known on this list
(to avoid working on a feature already being worked on!)
Thanks!
Regards,
James.
--
James Hunt
____________________________________
http://upstart.ubuntu.com/cookbook
http://upstart.ubuntu.com/cookbook/upstart_cookbook.pdf
More information about the upstart-devel
mailing list