Upstart 0.5 Are We There Yet?
Scott James Remnant
scott at netsplit.com
Wed Jan 16 03:14:56 GMT 2008
This is an interim update to the Upstart 0.5 Roadmap sent to this
mailing list three months ago, which you can find in the archives here:
First, some stats on the development: the last released version of
Upstart from trunk was 0.3.8 (0.3.9 backported various changes from
trunk, so we ignore that for these purposes).
0.3.8 had 107 files, totalling 51,565 lines; of which 39 files were
source, totalling 41,126 lines of code (17,198 semi-colons).
Current trunk has 96 files, totalling 51,099 lines; of which 31 files
are source, totalling 38,520 lines of code (17,582 semi-colons).
Despite the loss of libupstart, the code has remained roughly the same
size; this can be explained by a significant increase in the number of
test cases for some areas of the code.
As for the difference, 63 files have been changed with 22,251 lines
added and 23,209 lines removed; so it wouldn't be unreasonable to claim
that well over half of the code has been touched so far.
Even though so much code has changed, I'm still completely confident in
Upstart's stability and operational correctness, thanks to the very large
number of test cases and the high code coverage of the test suite.
Whilst no complex, changing software can ever have zero bugs, I'm happy
that the number in Upstart is significantly below the average.
Anyway, on with the update...
Obsolete *_free() functions in favour of single nih_free() function,
placing structure-specific handling in *_destroy() destructor functions.
This has been completed, and has greatly improved the
readability of the code; it also eventually removed the need for
NihWatch and NihIo to have strange "lazy" free handling.
As I implemented this, I realised that destructors should be
considered internal to the code, and not used as generic "hooks"
by other modules, especially the test cases -- instead I added
new test harness macros that use a child allocation to check
whether an object has been freed or not; this has greatly
improved those tests.
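A minimal sketch of the idea, assuming a toy allocator rather than libnih itself: toy_alloc(), toy_set_destructor(), toy_free(), the Widget type and toy_watch_free() are all hypothetical names. One free function runs the structure-specific destructor, frees any children, then releases the block. The last helper illustrates the test-harness trick of hanging a small child allocation off an object, so a test can see whether the object was freed without hooking the object's own destructor.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy allocator (not libnih): each block carries a header
 * recording its parent, its children and an optional destructor. */
typedef int (*ToyDestructor) (void *ptr);

typedef struct toy_header {
	struct toy_header *parent, *first_child, *next_sibling;
	ToyDestructor      destructor;
} ToyHeader;

#define TOY_HEADER(ptr) ((ToyHeader *)(ptr) - 1)
#define TOY_PTR(hdr)    ((void *)((ToyHeader *)(hdr) + 1))

/* Allocate size zeroed bytes, optionally as a child of parent. */
void *toy_alloc (void *parent, size_t size)
{
	ToyHeader *hdr = calloc (1, sizeof (ToyHeader) + size);

	if (! hdr)
		return NULL;
	if (parent) {
		hdr->parent = TOY_HEADER (parent);
		hdr->next_sibling = hdr->parent->first_child;
		hdr->parent->first_child = hdr;
	}
	return TOY_PTR (hdr);
}

void toy_set_destructor (void *ptr, ToyDestructor destructor)
{
	TOY_HEADER (ptr)->destructor = destructor;
}

/* The single free function: run the destructor, free the children,
 * unlink from the parent, then release the memory. */
int toy_free (void *ptr)
{
	ToyHeader *hdr;
	int        ret;

	assert (ptr != NULL);
	hdr = TOY_HEADER (ptr);
	ret = hdr->destructor ? hdr->destructor (ptr) : 0;

	while (hdr->first_child) {
		ToyHeader *child = hdr->first_child;

		hdr->first_child = child->next_sibling;
		child->parent = NULL;   /* already unlinked from us */
		toy_free (TOY_PTR (child));
	}
	if (hdr->parent) {
		ToyHeader **link = &hdr->parent->first_child;

		while (*link != hdr)
			link = &(*link)->next_sibling;
		*link = hdr->next_sibling;
	}
	free (hdr);
	return ret;
}

/* A structure whose specific clean-up lives in its *_destroy() function. */
typedef struct widget {
	int *live_count;   /* illustrative resource released on free */
} Widget;

static int widget_destroy (void *ptr)
{
	(*((Widget *)ptr)->live_count)--;
	return 0;
}

Widget *widget_new (void *parent, int *live_count)
{
	Widget *widget = toy_alloc (parent, sizeof (Widget));

	if (! widget)
		return NULL;
	widget->live_count = live_count;
	(*live_count)++;
	toy_set_destructor (widget, widget_destroy);
	return widget;
}

/* Test-harness helper: hang a child off ptr whose destructor sets *freed. */
static int tag_destroy (void *ptr)
{
	**(int **)ptr = 1;
	return 0;
}

int toy_watch_free (void *ptr, int *freed)
{
	int **tag = toy_alloc (ptr, sizeof (int *));

	if (! tag)
		return -1;
	*tag = freed;
	*freed = 0;
	toy_set_destructor (tag, tag_destroy);
	return 0;
}
```

Freeing a parent context frees the whole tree, so callers never need to know which *_destroy() function applies to which object.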
Add *_init() functions for static allocation of structures normally
created with *_new() functions.
In several cases, we expect to be able to use the structure as
an nih_alloc() context (NihHash) or place it in a managed list
(NihTimer, NihSignal, etc.); in these cases, it's simply not
possible to have static versions.
This item is therefore removed from the roadmap, except to note
that if it's possible, it's desirable -- it's not mandatory.
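Where static allocation is possible, the pairing can be sketched with a hypothetical ListHead type (not libnih's own list code): *_init() fills in storage the caller already owns, and *_new() allocates and then reuses the same initialisation. Plain malloc() stands in for nih_alloc() here. This works only because the structure never acts as an allocation parent or sits on a managed list, which are the cases that rule out static NihHash or NihTimer objects.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list head used only to illustrate *_init() / *_new(). */
typedef struct list_head {
	struct list_head *prev, *next;
} ListHead;

/* Initialise storage the caller already owns (static, stack or embedded). */
void list_head_init (ListHead *head)
{
	assert (head != NULL);
	head->prev = head->next = head;
}

/* Allocate on the heap, then share the same initialisation code. */
ListHead *list_head_new (void)
{
	ListHead *head = malloc (sizeof (ListHead));

	if (head)
		list_head_init (head);
	return head;
}

int list_head_empty (const ListHead *head)
{
	return head->next == head;
}

/* A statically allocated instance: no allocator involved at all. */
static ListHead pending;
```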
Separate job configuration from the instance state data, making
instances new copies of the state structure rather than copies of the
entire definition structure.
Completed, and this has definitely made understanding what's
going on easier and untwisted lots of code -- though, along with
the configuration source changes, it has made the tests somewhat
Now the only difference between an instance and non-instance job
is whether job_instance() will return an existing or new
instance; otherwise non-instance jobs are identical to instance
ones -- the state structure is still deleted when the job stops.
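A minimal sketch of the separation, assuming hypothetical JobConfig and JobState types: the definition is held once, and each instance is a small state structure that points back at it. The name job_instance() comes from the text above, but this signature and body are assumptions. Either way, the state is freed when the job stops.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types: the definition is stored once... */
typedef struct job_config {
	const char       *name;
	const char       *command;
	int               instance;   /* non-zero: every start makes a new instance */
	struct job_state *instances;  /* list of live state structures */
} JobConfig;

/* ...and each running instance is only a small state structure. */
typedef struct job_state {
	JobConfig        *config;     /* shared, never copied */
	struct job_state *next;
	int               pid;
} JobState;

/* Return the existing state of a non-instance job, or a new state. */
JobState *job_instance (JobConfig *config)
{
	JobState *state;

	if (! config->instance && config->instances)
		return config->instances;

	state = calloc (1, sizeof (JobState));
	if (! state)
		return NULL;
	state->config = config;
	state->next = config->instances;
	config->instances = state;
	return state;
}

/* The state is deleted when the job stops, for both kinds of job. */
void job_stopped (JobState *state)
{
	JobState **link;

	assert (state != NULL);
	link = &state->config->instances;
	while (*link != state)
		link = &(*link)->next;
	*link = state->next;
	free (state);
}
```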
For each job name, keep a record of the current holder of that name and
which others are available; when the current holder is stopped, elect
the new current holder (which may be the same as the old one).
This got changed somewhat between planning and implementation;
the plan was just too complicated and difficult to deal with;
in fact, the whole Conf system plan was too complicated.
I've massively simplified it, and this has made things much
easier. A configuration source may be a file of special
directives, a directory of such files, or a directory of job
definitions (one per file). The extra level of nesting
(directories of files containing multiple job definitions) is no
longer supported.
Rather than keeping some kind of in-memory index of current and
available job definitions by name, the separated job code makes
it very simple to catch the end of a job and replace its
definition at that point. The new destructor code ensures that
this is also attempted when a file is deleted, so pointers
aren't left dangling if the job is still running.
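The replace-at-end-of-job behaviour can be modelled with a hypothetical JobSlot that holds the current definition and a pending replacement, where a NULL replacement stands for a deleted file. This sketches the idea only; it is not Upstart's ConfSource/ConfFile code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical definition: only the running-instance count matters here. */
typedef struct definition {
	const char *origin;   /* illustrative: where it was loaded from */
	int         running;  /* number of live instances */
} Definition;

/* Hypothetical per-name slot: the current definition plus a pending
 * replacement; a NULL replacement means the file was deleted. */
typedef struct job_slot {
	Definition *current;
	Definition *replacement;
	int         has_replacement;
} JobSlot;

/* Called when a file is (re)parsed (def != NULL) or deleted (def == NULL). */
void slot_update (JobSlot *slot, Definition *def)
{
	if (! slot->current || ! slot->current->running) {
		slot->current = def;          /* nothing running: swap now */
		slot->replacement = NULL;
		slot->has_replacement = 0;
	} else {
		slot->replacement = def;      /* wait for the job to end */
		slot->has_replacement = 1;
	}
}

/* Called when an instance of the current definition stops. */
void slot_instance_stopped (JobSlot *slot)
{
	assert (slot->current && slot->current->running > 0);

	if (--slot->current->running == 0 && slot->has_replacement) {
		slot->current = slot->replacement;
		slot->replacement = NULL;
		slot->has_replacement = 0;
	}
}
```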
Allow external processes to create jobs and manage them.
The internals necessary to permit this are largely complete, and
simply waiting on a new IPC system to harness it. The external
process creates a ConfSource and attaches ConfFile items to it
for each job configuration that it creates.
Reloading isn't clear yet, but it'll probably involve recording
some kind of owner.
Support the ability to reload configuration with a signal or command.
Completed, though the command is pending a new IPC system;
sending SIGHUP will cause Upstart to reload its configuration by
force -- of course, this shouldn't be necessary with inotify,
but it does test code that'll be needed for restart handover
(especially from initramfs to real system).
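A minimal POSIX sketch of reload-on-SIGHUP, assuming the usual split: a handler that only sets a flag, and a main loop that does the actual work. The helpers reload_configuration() and main_loop_poll() are hypothetical, and libnih's own NihSignal machinery is not used here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set asynchronously by the handler, consumed by the main loop. */
static volatile sig_atomic_t reload_requested = 0;
static int reload_count = 0;

static void sighup_handler (int signum)
{
	(void) signum;
	reload_requested = 1;   /* do no real work in the handler itself */
}

/* Hypothetical placeholder for re-reading every configuration source. */
static void reload_configuration (void)
{
	reload_count++;
}

int install_sighup_handler (void)
{
	struct sigaction act;

	memset (&act, 0, sizeof (act));
	act.sa_handler = sighup_handler;
	sigemptyset (&act.sa_mask);
	act.sa_flags = 0;
	return sigaction (SIGHUP, &act, NULL);
}

/* One pass of the main loop's signal processing. */
void main_loop_poll (void)
{
	if (reload_requested) {
		reload_requested = 0;
		reload_configuration ();
	}
}
```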
Create a close-on-exec pipe to the child before spawning a new process
so that we can detect whether the exec() call succeeded or an error
occurred.
This has now been completed, and is basically as planned; this
means that a useful error is emitted by the init daemon when a
job cannot be spawned (e.g. due to configuration error) and the
job is stopped.
The major benefit is that a failure due to spawn error is
clearly distinct from a process exiting 255 on its own; and it's
not possible to configure the job to attempt to respawn on
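The close-on-exec pipe technique can be sketched as follows. spawn_checked() is a hypothetical helper, not Upstart's spawn code. Only the write end is marked FD_CLOEXEC, so a successful exec() closes it and the parent reads end-of-file. If exec() fails, the child writes errno into the pipe before exiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the child's pid if exec() succeeded, or -1
 * with *error set to the errno of whichever step failed. */
pid_t spawn_checked (char *const argv[], int *error)
{
	int     fds[2], child_errno;
	ssize_t len;
	pid_t   pid;

	*error = 0;
	if (pipe (fds) < 0) {
		*error = errno;
		return -1;
	}

	/* Only the write end is close-on-exec: a successful exec() closes it. */
	if (fcntl (fds[1], F_SETFD, FD_CLOEXEC) < 0 || (pid = fork ()) < 0) {
		*error = errno;
		close (fds[0]);
		close (fds[1]);
		return -1;
	}

	if (pid == 0) {
		close (fds[0]);
		execvp (argv[0], argv);

		/* Still here, so exec() failed: send errno to the parent. */
		child_errno = errno;
		len = write (fds[1], &child_errno, sizeof (child_errno));
		(void) len;
		_exit (255);
	}

	close (fds[1]);
	do {
		len = read (fds[0], &child_errno, sizeof (child_errno));
	} while (len < 0 && errno == EINTR);
	close (fds[0]);

	/* End-of-file (len == 0) means the exec() succeeded. */
	if (len == (ssize_t) sizeof (child_errno)) {
		waitpid (pid, NULL, 0);   /* reap the child that failed to exec */
		*error = child_errno;
		return -1;
	}

	return pid;
}
```

Because the errno arrives over the pipe, the parent never has to guess whether an exit status of 255 came from a failed exec() or from the program itself.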
Supervision of forking daemons by locating children at SIGCHLD time and
watching for pid files.
After much trial, error and experimentation, neither of these
two methods worked as well as required.
Instead, I went for something much crazier: Upstart now supports
supervision of forking daemons by using the ptrace() system call
on them and following forks, pretty much as gdb and strace do.
This can be limited by configuration to one or two forks as
necessary. Future expansion could be to heuristically detect
this, and even back up a process if the one followed should call
exec() -- but the implementation suffices for now.
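The ptrace() mechanics do not fit in a short example, so this sketch models only the bookkeeping. A hypothetical ForkFollower moves supervision from parent to child for up to the configured number of forks. On Linux, a tracer would typically learn of each fork through the PTRACE_O_TRACEFORK option and PTRACE_GETEVENTMSG; those calls are deliberately left out. Ignoring forks by any process other than the supervised one is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>

/* Hypothetical bookkeeping for following a daemon's forks; no ptrace(). */
typedef struct fork_follower {
	pid_t supervised;      /* the pid the job is currently attributed to */
	int   forks_expected;  /* configured limit, e.g. 1 or 2 */
	int   forks_seen;
} ForkFollower;

void follower_init (ForkFollower *follower, pid_t pid, int forks_expected)
{
	assert (forks_expected >= 0);
	follower->supervised = pid;
	follower->forks_expected = forks_expected;
	follower->forks_seen = 0;
}

/* Non-zero once enough forks have been followed; a real tracer would
 * detach at this point and supervise follower->supervised directly. */
int follower_done (const ForkFollower *follower)
{
	return follower->forks_seen >= follower->forks_expected;
}

/* Report that parent forked child (in the real mechanism, learnt from a
 * ptrace fork event). Assumption: forks by any other process, or after
 * the limit, are not followed. */
void follower_fork_event (ForkFollower *follower, pid_t parent, pid_t child)
{
	if (follower_done (follower) || parent != follower->supervised)
		return;

	follower->supervised = child;
	follower->forks_seen++;
}
```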
[NEW] Service readiness announcement through SIGSTOP.
When services remain in the foreground, there's no usual
way for them to signal that they have completed their
initialisation and are ready to receive client connections.
When they go into the background, the usual method is to use the
fork() itself for this, but obviously that isn't available here.
Upstart supports a different method for foreground services:
they may raise the SIGSTOP signal. This signals that they are
ready; Upstart will send them SIGCONT and adjust its own job
state to take the job out of SPAWNED and towards RUNNING.
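Both halves of the SIGSTOP handshake fit in a few lines of POSIX C. announce_ready() and wait_for_ready() are hypothetical names. Observing the stop through waitpid() with WUNTRACED is an assumption about how a supervisor might see it, not a description of Upstart's main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Service side: call once initialisation is complete. The process stops
 * here and carries on when the supervisor sends SIGCONT. */
void announce_ready (void)
{
	raise (SIGSTOP);
}

/* Supervisor side: returns 1 if the child stopped with SIGSTOP (ready) and
 * was continued, 0 if it exited or was killed first, -1 on error. */
int wait_for_ready (pid_t pid)
{
	int status;

	for (;;) {
		if (waitpid (pid, &status, WUNTRACED) < 0)
			return -1;
		if (WIFEXITED (status) || WIFSIGNALED (status))
			return 0;
		if (WIFSTOPPED (status)) {
			/* Assumption: stops by other signals are simply resumed. */
			if (kill (pid, SIGCONT) < 0)
				return -1;
			if (WSTOPSIG (status) == SIGSTOP)
				return 1;   /* a job would now leave SPAWNED */
		}
	}
}
```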
Disable a job from its definition, instead of just deleting it.
I have again become unconvinced of the usefulness of this,
instead favouring something more like "profiles" or "flags"
where jobs can be disabled and enabled en masse.
Unless somebody can provide a use-case for having a defined job
that cannot be started?
Have you ever, ever felt like this?
Had strange things happen? Are you going round the twist?