Upstart 0.5: Are We There Yet?

Wed Jan 16 03:14:56 GMT 2008

This is an interim update to the Upstart 0.5 Roadmap sent to this
mailing list three months ago, which you can find in the archives here:
https://lists.ubuntu.com/archives/upstart-devel/2007-October/000468.html

Stats
-----

First, some stats on the development: the last released version of
Upstart from trunk was 0.3.8 (0.3.9 backported various changes from
trunk, so we ignore that for these purposes).

0.3.8 had 107 files, totalling 51,565 lines; of which 39 files were
source, totalling 41,126 lines of code (17,198 semi-colons).

Current trunk has 96 files, totalling 51,099 lines; of which 31 files
are source, totalling 38,520 lines of code (17,582 semi-colons).

Despite the loss of libupstart, the code has remained roughly the same
size; this can be explained by a significant increase in the number of
test cases for some areas of the code.

As for the difference, 63 files have been changed with 22,251 lines
added and 23,209 lines removed; so it wouldn't be unreasonable to claim
that well over half of the code has been touched so far.

Even though so much code has changed, I'm still completely confident in
Upstart's stability and operational correctness, thanks to the extreme
number of test cases and very high code coverage of the test suite.
Whilst no complex, changing software can ever have zero bugs, I'm happy
that the number in Upstart is significantly below the average.

Anyway, on with the update...

libnih
------

Obsolete *_free() functions in favour of single nih_free() function,
placing structure-specific handling in *_destroy() destructor functions.

	This has been completed, and has greatly improved the
	readability of the code; it also eventually removed the need for
	NihWatch and NihIo to have strange "lazy" free handling.

	As I implemented this, I realised that destructors should be
	considered internal to the code, and not used as generic "hooks"
	by other modules, especially the test cases -- instead I added
	new test harness macros to check whether an object is freed or
	not using a child allocation; this has greatly improved those
	tests.
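
	To illustrate the shape of this (a minimal sketch only, not
	libnih's actual implementation; obj_alloc(), obj_set_destructor(),
	obj_free() and the Watch structure are all hypothetical names
	invented for the example), a single free function can look up a
	per-object destructor stored in a header in front of the payload:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical destructor type: called just before the memory is freed. */
typedef int (*Destructor) (void *ptr);

/* Header placed in front of every allocation.  The extra union members
 * exist only to keep the payload that follows suitably aligned. */
typedef union obj_header {
	Destructor  destructor;
	long double align_ld;
	long long   align_ll;
	void       *align_ptr;
} ObjHeader;

/* Allocate size bytes with no destructor set. */
void *
obj_alloc (size_t size)
{
	ObjHeader *hdr = malloc (sizeof (ObjHeader) + size);

	if (! hdr)
		return NULL;

	hdr->destructor = NULL;
	return hdr + 1;
}

/* Attach structure-specific clean-up to an allocation. */
void
obj_set_destructor (void *ptr, Destructor destructor)
{
	((ObjHeader *) ptr - 1)->destructor = destructor;
}

/* The single free function: run the destructor (if any), then release
 * the memory.  Returns the destructor's return value, or 0. */
int
obj_free (void *ptr)
{
	ObjHeader *hdr = (ObjHeader *) ptr - 1;
	int        ret = 0;

	if (hdr->destructor)
		ret = hdr->destructor (ptr);

	free (hdr);
	return ret;
}

/* Example structure: tracks how many watches are currently open. */
typedef struct watch {
	int *open_count;
} Watch;

/* Structure-specific handling lives here, not in a watch_free(). */
static int
watch_destroy (void *ptr)
{
	Watch *watch = ptr;

	(*watch->open_count)--;
	return 0;
}

Watch *
watch_new (int *open_count)
{
	Watch *watch = obj_alloc (sizeof (Watch));

	if (! watch)
		return NULL;

	watch->open_count = open_count;
	(*open_count)++;
	obj_set_destructor (watch, watch_destroy);
	return watch;
}
```

	Callers only ever call obj_free(); the destructor stays an
	internal detail of the module that defines the structure.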

Add *_init() functions for static allocation of structures normally
created with *_new() functions.

	In several cases, we expect to be able to use the structure as
	an nih_alloc() context (NihHash) or place it in a managed list
	(NihTimer, NihSignal, etc.); in these cases, it's simply not
	possible to have static versions.

	This item is therefore removed from the roadmap, except to note
	that if it's possible, it's desirable -- it's not mandatory.

Configuration
-------------

Separate job configuration from the instance state data, making
instances new copies of the state structure rather than copies of the
entire definition structure.

	Completed; this has definitely made understanding what's going
	on easier and untwisted lots of code -- though, along with the
	configuration source changes, it has made the tests somewhat
	long-winded.

	Now the only difference between an instance and non-instance job
	is whether job_instance() will return an existing or new
	instance; otherwise non-instance jobs are identical to instance
	ones -- the state structure is still deleted when the job stops.

For each job name, keep a record of the current holder of that name and
which others are available; when the current holder is stopped, elect
the new current holder (which may be the same as the old one).

	This got changed somewhat between planning and implementation;
	the plan was just too complicated and difficult to deal with.
	In fact, the whole Conf system plan was too complicated.

	I've massively simplified it, and this has made things much
	easier.  A configuration source may be a file of special
	directives, a directory of such files, or a directory of job
	definitions (one per file).  The extra level of nesting
	(directories of files containing multiple job definitions) is no
	longer permitted.

	Rather than keep some kind of in-memory index of current and
	available job definitions by name, the separated job code makes
	it very simple to catch the end of a job and replace it at that
	point.  The new destructor code ensures that this is also
	attempted on file deletion and pointers aren't left dangling if
	it's still running.

Allow external processes to create jobs and manage them.

	The internals necessary to permit this are largely complete, and
	simply waiting on a new IPC system to harness it.  The external
	process creates a ConfSource and attaches ConfFile items to it
	for each job configuration that it creates.

	Reloading isn't clear yet, but it'll probably involve recording
	some kind of owner.

Support the ability to reload configuration with a signal or command.

	Completed, though the command is pending a new IPC system;
	sending SIGHUP will cause Upstart to reload its configuration by
	force -- of course, this shouldn't be necessary with inotify,
	but it does test code that'll be needed for restart handover
	(especially from initramfs to real system).
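
	As a rough illustration of reload-by-signal (not Upstart's
	actual implementation; install_reload_handler() and
	take_reload_request() are names made up for this example), the
	signal handler only records the request and the main loop acts
	on it later:

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>

/* Set from the signal handler, cleared by the main loop. */
static volatile sig_atomic_t reload_requested = 0;

/* Handlers should do as little as possible: just note the request. */
static void
on_sighup (int signum)
{
	(void) signum;
	reload_requested = 1;
}

/* Install the SIGHUP handler; returns 0 on success, -1 on error. */
int
install_reload_handler (void)
{
	struct sigaction act;

	memset (&act, 0, sizeof act);
	act.sa_handler = on_sighup;
	sigemptyset (&act.sa_mask);
	act.sa_flags = SA_RESTART;

	return sigaction (SIGHUP, &act, NULL);
}

/* Called from the main loop.  Returns 1 (and clears the flag) if a
 * reload was requested since the last call, 0 otherwise; the caller
 * then re-reads its configuration sources. */
int
take_reload_request (void)
{
	if (! reload_requested)
		return 0;

	reload_requested = 0;
	return 1;
}
```

	Keeping the real work out of the handler means the reload runs
	at a known point in the main loop rather than in the middle of
	whatever was interrupted.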

Service Management
------------------

Create a close-on-exec pipe to the child before spawning a new process
so that we can detect whether the exec() call succeeded or an error
occurred.

	This has now been completed, and is basically as planned; this
	means that a useful error is emitted by the init daemon when a
	job cannot be spawned (e.g. due to configuration error) and the
	job is stopped.

	The major benefit is that a failure due to spawn error is
	clearly distinct from a process exiting 255 on its own; and it's
	not possible to configure the job to attempt to respawn on
	these.
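
	The general technique can be sketched as follows.  This is a
	minimal example rather than Upstart's own code: spawn_checked()
	is a hypothetical helper, and passing the child's errno over
	the pipe is an assumption about what gets reported.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: spawn argv[0] with the given arguments.
 * Returns the child pid if exec() succeeded, or -1 with errno set to
 * the reason the spawn failed. */
pid_t
spawn_checked (char *const argv[])
{
	int     fds[2];
	int     err = 0;
	pid_t   pid;
	ssize_t len;

	if (pipe (fds) < 0)
		return -1;

	/* Close-on-exec: a successful exec() closes the write end, so
	 * the parent sees end-of-file with nothing written. */
	if (fcntl (fds[1], F_SETFD, FD_CLOEXEC) < 0) {
		err = errno;
		close (fds[0]);
		close (fds[1]);
		errno = err;
		return -1;
	}

	pid = fork ();
	if (pid < 0) {
		err = errno;
		close (fds[0]);
		close (fds[1]);
		errno = err;
		return -1;
	}

	if (pid == 0) {
		/* Child: only reaches the write() if exec() failed. */
		close (fds[0]);
		execvp (argv[0], argv);

		err = errno;
		while (write (fds[1], &err, sizeof err) < 0
		       && errno == EINTR)
			;
		_exit (255);
	}

	/* Parent: wait for either an error code or end-of-file. */
	close (fds[1]);
	do {
		len = read (fds[0], &err, sizeof err);
	} while (len < 0 && errno == EINTR);
	close (fds[0]);

	if (len == (ssize_t) sizeof err) {
		/* Spawn error: reap the child and report why. */
		while (waitpid (pid, NULL, 0) < 0 && errno == EINTR)
			;
		errno = err;
		return -1;
	}

	return pid;
}
```

	A process that itself exits with status 255 never writes to the
	pipe, so the parent can tell the two cases apart.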

Supervision of forking daemons by locating children at SIGCHLD time and
watching for pid files.

	After much trial, error and experimentation, neither of these
	two methods worked as well as would be required.

	Instead I went for something much crazier: Upstart now supports
	supervision of forking daemons by using the ptrace() system call
	on them and following forks, pretty much as gdb and strace do.

	This can be limited by configuration to one or two forks as
	necessary.  Future expansion could be to heuristically detect
	this, and even back up a process if the one followed should call
	exec() -- but the implementation suffices for now.

[NEW] Service readiness announcement through SIGSTOP.

	When services remain in the foreground, there's no usual
	way for them to signal that they have completed their
	initialisation and are ready to receive client connections.

	The usual method when they go into the background is to use the
	fork() for this, but obviously this isn't available.

	Upstart supports a different method for foreground services:
	they may raise the SIGSTOP signal.  This signals that they are
	ready; Upstart will send them SIGCONT and adjust its own job
	state to take the job out of SPAWNED and towards RUNNING.
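
	Both sides of this handshake can be sketched in a few lines.
	This is an illustration only, not Upstart's code:
	announce_ready() and wait_until_ready() are hypothetical names,
	and a real init daemon would learn of the stop from its SIGCHLD
	handling rather than blocking in waitpid().

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Service side: call once initialisation is complete.  The process
 * stops here until the supervisor sends SIGCONT. */
void
announce_ready (void)
{
	raise (SIGSTOP);
}

/* Supervisor side: block until the child changes state.
 * Returns 1 if the child stopped itself with SIGSTOP and has been
 * resumed, 0 if it exited or stopped for another reason, and -1 on
 * error. */
int
wait_until_ready (pid_t pid)
{
	int   status;
	pid_t ret;

	/* WUNTRACED makes waitpid() report stopped children too. */
	do {
		ret = waitpid (pid, &status, WUNTRACED);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;

	if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGSTOP) {
		/* Ready: let the service carry on running. */
		if (kill (pid, SIGCONT) < 0)
			return -1;

		return 1;
	}

	return 0;
}
```

	The service needs no library or IPC connection to do this, only
	the ability to send itself a signal.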

Disable a job from its definition, instead of just deleting it.

	I have again become unconvinced of the usefulness of this,
	instead favouring something more like "profiles" or "flags"
	where jobs can be disabled and enabled en-masse.

	Unless somebody can provide a use-case for having a defined job
	that cannot be started?

Scott
-- 
Have you ever, ever felt like this?
Had strange things happen?  Are you going round the twist?