Re: Upstart 0.5 Roadmap

Tue Oct 30 04:55:17 GMT 2007

> The main goal of this milestone is to define the structure and behaviour
> of Upstart for its eventual 1.0 release.  It should be largely feature-
> complete in terms of basic behaviour, allowing further development to
> concentrate on extensions and improvements.

I've been researching Upstart since I stumbled into it after my first
Ubuntu install a couple of weeks back (though I've run Debian, SuSE,
RedHat, Mandrake and Gentoo, as well as been sysadmin on Unix boxes).

The core event-driven, dynamic idea is interesting, and SOMETHING has
to be done.  I had a break to think about the problem: the issues with
Sys V init, and why it seems over-engineered and complex, yet fails in
practice due to inflexibility, fragility and poor performance.

Ideas influencing me are SuSE's LSB implementation and insserv init
enhancements, the ZeroInstall package manager (Rox), and the autopackage
idea (programs should just run rather than be hand tweaked per distro).

Many of my post-install Ubuntu problems on my non-typical system (an old
multi-interface, multi-disk desktop workstation and LAN server) were
init.d and /etc/<package> related.  For instance, I noticed that almost
all the LaunchPad bugs in the "simple" firewall Firestarter appear to be
down to issues in an rc?.d style monolithic script, and its attempts
to be helpful by configuring & starting another unrelated package.  At
the same time, the firewall can be bypassed by a GNOME applet duplicating
system functionality, probably due to the lack of a coherent portable
configuration standard.  Obviously I soon "discovered" init(8) had been
replaced, and wanted to find out about it and what "Event Driven"
meant in practice.

So I'd like to fix "firestarter", but don't want to re-engineer the
init scripts if they're going to be immediately replaced by something
sexier.  I had a break away and found myself mulling the issues with
Sys V init and concepts for a replacement.

> Upstart, the Service Manager
> ----------------------------

> The most important piece of Upstart is being as good a service manager
> as it can be, ....
> I'm planning to separate the definition of a
> job, which is static, from its state data.  Jobs which permit multiple
> instances will simply have state structures, rather than existing as
> a new copy of the job.

Yes!  The key advantage of a supervisory daemon over Parent/Child is
the 2-way communication, the possibilities for cooperating processes,
and the dynamic rather than static, pre-planned possibilities of the
system.  Forking, with just environment changes, signals and wait
statuses, does not match the current computing environment well.

>  The intent is that this will allow jobs to
> have arbitrary instance limits (e.g. one for each tty) rather than just
> one or infinite.  It also rationalises the code somewhat.

Sounds like semaphores!

> Upstart currently doesn't pay much attention to a process once it's been
> forked, other than to wait() for it when it dies.

Sounds very like init(8).

> This will be done though a close-on-exec pipe to the child
> process, on error information will be written to it; thus all the parent
> has to do is poll for reading, and if it receives data, it knows there
> was a problem.

Not sure what you gain over exit statuses plus logging by the child, for
admin intervention.  Old inits / inetds would, for instance, restart
processes (say after a user logged out of a hardwired tty terminal, or
a service daemon terminated), and would often deal intelligently with
fork/exec/exit loops by noticing over-frequent restarts of a
mis-configured daemon.
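
To make the "over-frequent restarts" point concrete, here is a minimal
sketch in Python of the kind of bookkeeping I mean.  The class name, the
limit and the interval are all my own assumptions for illustration, not
anything Upstart or init actually defines.

```python
import time
from collections import deque


class RespawnLimiter:
    """Decide whether a job that just died may be restarted.

    Hypothetical policy: allow at most `limit` restarts within any
    `interval`-second window; beyond that, refuse and leave the job
    for the admin to look at.
    """

    def __init__(self, limit=10, interval=5.0, clock=time.monotonic):
        self.limit = limit
        self.interval = interval
        self.clock = clock          # injectable so it can be tested
        self.starts = deque()       # timestamps of recent restarts

    def may_respawn(self):
        now = self.clock()
        # Forget restarts that have fallen out of the time window.
        while self.starts and now - self.starts[0] > self.interval:
            self.starts.popleft()
        if len(self.starts) >= self.limit:
            return False            # looping: probably mis-configured
        self.starts.append(now)
        return True
```

The supervisor would call may_respawn() each time wait() reports the
job has exited, and only fork again if it returns True.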

> An important missing ability is to be able to disable a job from its
> definition, without having to resort to deleting the file.  This will
> be added, such jobs will still be visible but will report an error if
> attempted to be started.  Transient disabling will also be permitted
> through "dependencies"; these are lists of paths on the disk (such as
> the one being exec'd) that must exist, if they do not, the job reports
> an error on start.

Fundamentally this is all about "State".  Events need context and
policy to make actions meaningful.

> Jobs will also support "resources" as a method of throttling jobs; a
> resource is a string name and a floating point number, jobs define
> how much of a particular resource they use while running and can only
> run while the resource is greater than or equal to that number.  This
> will typically be used for locking, or utilisation problems.  If a job
> is started, but has insufficient resources, it will stay in the
> start/waiting state until the resource goes above the necessary level.

How is a job going to know that?  Surely it's the supervisor's task to
monitor memory, CPU usage, faults and such, and to make sensible
scheduling decisions based on a job's past performance, the current
system load, and any planned future tasks.

Similarly, if a job requires some mutex, then it is waiting, and the
supervisor signals it (semaphore style) when making it runnable.
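
A rough sketch of what I mean by the supervisor doing the accounting,
in Python.  The resource names are plain strings with floating point
amounts, as in the roadmap; the class, its method names and the job
names in the usage note are hypothetical, purely for illustration.

```python
class ResourcePool:
    """Supervisor-side accounting of named resources (counting-semaphore style)."""

    def __init__(self, **levels):
        self.levels = dict(levels)   # resource name -> amount available
        self.waiting = []            # (job, needs) in arrival order

    def _fits(self, needs):
        return all(self.levels.get(r, 0.0) >= n for r, n in needs.items())

    def _take(self, needs):
        for r, n in needs.items():
            self.levels[r] -= n

    def request(self, job, needs):
        """Return True if the job may run now, otherwise queue it as waiting."""
        if self._fits(needs):
            self._take(needs)
            return True
        self.waiting.append((job, needs))
        return False

    def release(self, needs):
        """Give resources back; return the waiting jobs that become runnable."""
        for r, n in needs.items():
            self.levels[r] = self.levels.get(r, 0.0) + n
        runnable, still_waiting = [], []
        for job, wanted in self.waiting:
            if self._fits(wanted):
                self._take(wanted)
                runnable.append(job)
            else:
                still_waiting.append((job, wanted))
        self.waiting = still_waiting
        return runnable
```

With a pool created as ResourcePool(pkg_lock=1.0), a first job asking
for {"pkg_lock": 1.0} runs at once, a second one waits, and release()
hands back the second job's name once the first has finished.  The job
itself never has to poll anything.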

> Since service management is largely concerned with UNIX processes, the
> environment that they run in remains important.  As well as letting the
> job definition define environment variables and their values, this will
> be extended to allow the definition to specify variables to be taken
> from init's own environment (typically PATH, TERM, etc.) ceasing these
> from being hard-coded.  In addition, it does not seem unreasonable for
> environment to be specified when starting a job.

Environment relies on Parent->Child relationships, yet a task
responding to an event may actually want to pass some information to
an unrelated task started by Upstart, or to a user process requesting
notification of an event.

One of the specs mentions jobs being able to inspect the invocation
tree, to take action depending on which application caused them to be
run.

Much cleaner is a system where state is accessed by querying the
Upstart daemon, which stores this information and makes it available
to other processes that are authorised for that namespace.

The task fulfilling a service can then have capabilities and
attributes, defined cleanly, so that when GeeWhiz DB 2.0 comes along,
replacing DogSlow SQL, the scripts don't need to be modified just
because the DBMS was replaced by a new package.
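
As a sketch of the idea (Python, with made-up package and capability
names, and a registry class that is entirely my own invention), scripts
would ask for a capability and never mention the package that happens
to provide it:

```python
class ServiceRegistry:
    """Map capabilities to whichever installed package provides them."""

    def __init__(self):
        self.providers = {}   # capability -> list of (priority, package)

    def register(self, package, capabilities, priority=0):
        for cap in capabilities:
            self.providers.setdefault(cap, []).append((priority, package))

    def unregister(self, package):
        for cap in self.providers:
            self.providers[cap] = [
                entry for entry in self.providers[cap] if entry[1] != package
            ]

    def provider(self, capability):
        """Return the preferred package for a capability, or None."""
        candidates = self.providers.get(capability)
        if not candidates:
            return None
        return max(candidates)[1]   # highest priority wins
```

A script that needs "sql-database" calls provider("sql-database") and
gets "dogslow-sql" today and "geewhiz-db" after the upgrade, without
being edited.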

> Continuing this thought, it becomes logical that the environment
> variables for an instance are what makes that instance different from
> others of the same job definition.  This may end up being the method by
> which we define the uniqueness of instances, for example "instance TTY"
> might mean that an instance is only spawned if the $TTY variable is
> different from any others running.

How do you inspect another process's environment?

What if V1 of a task, say a TTY screen scribbler, handles one instance
but could, instead of terminating, service another request on a
different TTY?

Why have a protocol for the instance and the service manager to try to
simultaneously change the state of the assumed child and the actual
child environment, when a job class, a task to be completed and a
re-read of the instance's job spec would be much cleaner and could be
done race free?

> This isn't fully decided yet, but it does seem to me that inventing some
> other mechanism for doing this is folly since the method of passing
> those values would just be environment variables anyway!

Script languages with bindings could access libc, plus Upstart
comms/protocol libraries, directly, with shell scripts loading
variables via sourcing a utility's output.
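
For the shell case, the utility only has to print assignments that are
safe to eval.  A minimal sketch in Python follows; the function name is
mine, and where the values come from (some query to the Upstart daemon)
is assumed rather than shown.

```python
import re
import shlex

_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def shell_assignments(values):
    """Render a dict as NAME=value lines that a shell can safely eval."""
    lines = []
    for name, value in sorted(values.items()):
        if not _NAME.fullmatch(name):
            raise ValueError("not a valid shell variable name: %r" % name)
        # shlex.quote protects spaces, quotes and other shell metacharacters.
        lines.append("%s=%s" % (name, shlex.quote(str(value))))
    return "\n".join(lines)
```

A shell job would then do something like eval "$(job-info)", where
job-info is a hypothetical command wrapping the function above, and
pick up TTY, DEVICE or whatever else it asked for.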

Then when a task discovers a need to query something, it can, rather
than all possible information having to be supplied statically by
Upstart before the job can commence, at a point when this info might
not be known.

Is Upstart not supposed to support Dynamic operations?

> It's useful to pass more than just environment variables to a job when
> starting it, it's also useful to be able to pass file descriptors as
> well.  Some safe and secure mechanism will be found where a job started
> from the command-line can be told what its standard input/output/error
> should be (normally the terminal from which it was called).

Pipes suck as soon as you need 2-way communication.  Just use a
protocol on a socket and Upstart will become simpler, because
supporting N instances is not much harder than 1.  The socket
de-multiplexes things for you.

The only way to safely set up std{in,out,err} is for the parent's code
to redirect them after the fork, but before the exec.
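
For anyone who hasn't written it, the fork / redirect / exec sequence
looks roughly like this in Python (a sketch for Unix only; the function
name and the choice of sending stdout and stderr to one file are my own
assumptions):

```python
import os
import sys


def spawn_redirected(argv, stdout_path):
    """Fork, point the child's stdio at new targets, then exec argv.

    Returns the child's exit status as seen by the waiting parent,
    or -1 if the child was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: still running the parent's code until exec replaces it.
        try:
            out = os.open(stdout_path,
                          os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            devnull = os.open(os.devnull, os.O_RDONLY)
            os.dup2(devnull, 0)   # stdin
            os.dup2(out, 1)       # stdout
            os.dup2(out, 2)       # stderr
            for fd in (out, devnull):
                if fd > 2:
                    os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)         # only reached if something above failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

The redirection happens in the child, so the parent's own descriptors
are never disturbed, and the exec'd program simply inherits 0, 1 and 2
already pointing where the supervisor wanted them.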

> Upstart, the System V init Emulator
> -----------------------------------

Good emulation is great, but what problems for the application
developer / packager does Upstart solve, to make them decide on a
native implementation, rather than 'portable' emulation?

I suspect they want simplicity, portability and reduced maintenance
rather than blazing performance (except on their own workstation of
course *wink* ).

> Upstart, the IPC Server
> -----------------------

> One minor, trivial change that it almost doesn't seem worth mentioning.
> Upstart's own home-brew IPC will be dropped, and instead it will depend
> on D-BUS.

Good, seems like the core D-Bus library is designed to be a
low-dependency infrastructure library, rather than tied to the Desktop.

By sharing this infrastructure and messaging, there's more chance that
Desktop applications will decide it's simpler to use the System
provided official method, rather than doing cool home brew low level
stuff.

The situation where an applet in GNOME can start up the network
interface such that the installed and configured firewall is ignored
is a sign of poor integration.

Working on standard messages and Upstart tasks should simplify
Desktop programs and libraries, and reduce duplication and errors
(including security breaches) due to poorly defined interfaces.

> Upstart, the Service Activation Manager
> ---------------------------------------

In my think-a-thon away I concluded the notions of:

Services
Requirements (dependencies)
Goals
Configuration
   were as important as Events.

Upstart can be more event driven, if it has Goals to fulfill (a
request to achieve some defined state), which is in a way a
pseudo-event.

Interactive users can make Goals either Demands ("Do this right now,
ASAP") or Preparation, e.g. submitting batch jobs ("Try and get round
to it for me when you have a minute").  Having more state information
defining the nature of tasks then allows prioritisation schemes.

For example, when a desktop system starts up, its main goal is to
provide a login speedily, after a reasonable and predictable time,
without user intervention.

Once the user starts a browser, any outstanding tasks relating to
networking, DNS, a proxy cache or firewall tunnelling (possibly on other
servers) suddenly make those services a priority.  Perhaps those batch
downloads for package upgrades that have been backgrounded should
pause, and then resume when the system is "idle".

> Initially it seems to make sense to discard the notion of "events"
> entirely, since they appear to be handled already by D-BUS Service
> Activation (managed by Upstart) and D-BUS Signals.  Even in this model
> you'd want to be able to start or stop services by D-BUS Signals, which
> D-BUS doesn't currently provide

Configuration, configuration, configuration.

init(8) was table driven, allowing reassignment of terminals, modems
and serial printers etc., plus changing the type of device within its
class; e.g. the TERM variable meant a table-driven terminal driver
could be written, rather than some login shell dynamically loading a
particular driver.

> This doesn't quite fill the entire picture either though; there are
> still interesting cases where events can be considered methods instead
> of signals.  Most notably the compatibility or near-compat events like
> startup, runlevel, etc.

That's why I like the concept of Goals.  Sys V init had :

Goal 1 - Single User Mode, minimal system, run level 1 (RL1)
Goal 2 - Probably Distro defined, Multi-User mode without networking services
Goal 3 - Multi-user, network services, TTY style login on console
Goal 4 - End User defined
Goal 5 - Graphical Login on Console, X
Goal 0, 6 - Shutting down (halt, reboot)

The problem is that the rc?.d symlinks and ordering gave enough
flexibility and complication to be confusing, with ordering
dependencies left as a developer/admin issue, but were not fine
grained enough to be really useful.  Then the actual implementation
meant 100's of scripts were run, each repetitively re-interpreting the
same code, and opening the same files over and over and over again.  It
was SLOW, and every system I've been on was somewhat broken (usually Knn
wrong, and multiple start/stops of services)!

Most Admins would likely move down to single user mode, and then
restart by hand a few daemons, probably having turned off new
connections and warned users of imminent service interruption.

I'd like to be able to Label a Goal as say, LAN server and then bundle
DHCP server, DNS, NTP, Web Proxy, SMTP, Samba together as a
meta-task.

Then Goal-LAN-server can be enabled, or disabled with Upstart figuring
out the required changes to arrive at the requested state.
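
"Figuring out the required changes" is mostly set arithmetic.  A small
Python sketch, where the goal name, the service names and the plan()
function are all illustrative assumptions of mine:

```python
# Hypothetical goal definition: a label bundling services into a meta-task.
GOALS = {
    "lan-server": {"dhcp", "dns", "ntp", "web-proxy", "smtp", "samba"},
}


def plan(running, enabled_goals, goals=GOALS):
    """Return (to_start, to_stop) needed to reach the requested state."""
    wanted = set()
    for name in enabled_goals:
        wanted |= goals[name]
    # Only services owned by some goal are ever stopped; the rest are
    # none of this planner's business.
    managed = set().union(*goals.values())
    to_start = wanted - set(running)
    to_stop = (set(running) & managed) - wanted
    return sorted(to_start), sorted(to_stop)
```

Enabling the goal on a box already running dns yields the other five
services to start and nothing to stop; disabling it yields dns to stop
and leaves anything unrelated, such as sshd, alone.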

> Even if signals were enough, there would need to be some way to pass the
> data of the signal to the process being started -- since it would be
> running too late to get on the bus and catch the signal before it was
> lost.
>
> Unfortunately many signals don't contain enough information or context
> anyway, HAL is a notable culprit for this.  For example, a job is likely
> to want to have an instance of itself running for each device of a
> certain capability.  Unfortunately HAL's signals only include the
> object path, so it's necessary to perform some communication first to
> convert the DeviceAdded and DeviceRemoved signals into useful events
> that can be matched and used to start/stop services which will want
> to know what they are supposed to be handling.

Configuration, configuration...  Both what is user tweakable, and the
parameters relating to this instance of the task.  Solving the
impedance mismatch between kernel events and the vagaries of userland
is what adds value.

When events are queued and related to state records, a daemon can be
notified of the event (and started if necessary), and can handle that
event on that device in a thread.  It becomes feasible to have daemons
that get started in response to activity, start extra threads when
busy, then go idle and finally get stopped when activity has ceased
for long enough (because the user has gone for lunch).

Whilst core software ought not to define policy but be flexible,
defining neater ways to get at data and simpler ways to have sub-tasks
carried out passing job specific information will be the real added
value.

The middleman defines an interface to a general service, which may be
implemented by any 1 of a number of packages, or by methods unthought
of when the task was defined.  At present, a new package may require
changes in many others for it to become truly usable by a One-Click
end user.

> So we still appear to need the ability to define an abstract "event",
> with the interesting distinguishing feature that rather than watching
> an abstract flow, events are defined in advance and may actually require
> some kind of code to run to find out more information.  This fits in
> with one of the original plans for Upstart, where you would have
> processes that performed particular jobs such as listening for signals
> from HAL and converting them into useful events for jobs.

You mention the early 'musings' and having read through them, I found
it hard to extract the key core underlying concepts, which make for a
successful essential upgrade to a system.  But this conclusion is
similar to mine.

Calling an "Abstract Event" a Goal allows a user to demand that Upstart
do something.  Then rather than hard coding "fork a shell, run bootrc,
generate logins on each TTY" etc., these become tasks, as demonstrated
by the Sys V emulation.

One of the values is that it allows for Lazy Evaluation: Upstart can
try to defer all tasks that are not required by user demands.  Put
the workstation user first, rather than statically defining jobs that
have to be run post-boot for system housekeeping, or completed before
they can login.

The real value will be in defining tasks cleanly, so that Upstart
becomes compelling for Developers and Admins, as well as simplifying
the task of implementing an efficient boot up system.

> This is what makes Upstart more than just a dumb service manager, by
> taking some effort to automatically start and stop services as required,
> it can keep the number running to the minimum needed -- thus conserving
> resources and improving performance of even the most hefty workstation.

A clean way to access configuration, and allow Admins tweaking, for
each service class would add value.  Integrated desktops have unified
their configuration data, in order to make global policy changes.

For example, why should run time scripts have to parse config files
provided to tweak other packages, rather than looking up the values
maintained in a standard place?

Why do they have to test executable names, which means a subsequent
upgrade to another package providing enhanced features may break an
unrelated package that then needs an update to support it?

Why should a package have to know of all the possible implementations
of a service, in order to scan configuration files, or load 'hooks'
into package specific directories?

A system that supports "Need to Know" and concentrates developers and
integrators on the capabilities and attributes of implementations,
rather than the name, who wrote it and its "coolness" factor, will be
much more maintainable and will scale in the long run.