Upstart plans

Thu Aug 4 19:01:47 UTC 2011

Hi All,

Last week, Scott, Steve Langasek (hotfoot from DebConf11!) and I met to discuss future plans
for Upstart and work on some new feature ideas. This mail summarises those discussions.

Table of Contents
=================

1 Day 1
    1.1 General Catchup
    1.2 Upstart in Chromium OS
        1.2.1 Boot Overview
        1.2.2 Runlevels
        1.2.3 Failsafe Mode
        1.2.4 ACTIONS
    1.3 NIH issues
        1.3.1 bug 776532 (nih_dir_walk_scan passes incorrect value to file filter)
        1.3.2 bug 803587 (nih-dbus-tool generates invalid c code for structure types)
2 Day 2
    2.1 Logging of job output (and maybe events) (bug 328881)
        2.1.1 ACTIONS
    2.2 State-passing on re-exec (bug 348455)
        2.2.1 ACTIONS
    2.3 Ptrace limitations (bug 406397, for example)
        2.3.1 Stage 1 - track exits, not forks
        2.3.2 Stage 2 - track all pids
        2.3.3 Stage 3 - introduce cgroups
        2.3.4 ACTIONS
3 Day 3
    3.1 Upstart in Debian
        3.1.1 Version of Upstart
        3.1.2 insserv/startpar needs to have knowledge of Upstart.
        3.1.3 Debian policy needs to be updated to allow for alternative init systems to SystemV.
    3.2 "expect stanza" enhancements.
        3.2.1 ACTIONS
    3.3 "and" issues (bug 447654)
        3.3.1 ACTIONS
    3.4 Environment Variables
        3.4.1 ACTIONS
    3.5 Upstart resource site
        3.5.1 ACTIONS

1 Day 1
--------

1.1 General Catchup
====================

1.2 Upstart in Chromium OS
===========================
    This was an extremely interesting discussion to understand Chromium OS's
    use of Upstart. Chromium OS has very strict boot-time allocations for
    different parts of the system (kernel, X11, etc) so getting an
    insight into such a specialized boot was highly instructive.

1.2.1 Boot Overview
~~~~~~~~~~~~~~~~~~~~
      1) Event "startup" is emitted.
      2) Most critical-path jobs start ("start on startup"):
      2.1) Job "udev" runs.
      2.2) Job "startup" starts.
           The startup job launches critical-path services and handles such
           things as mounting devices.

      3) Job "boot-services" starts ("start on stopped startup").
         This job starts all services used directly or indirectly by the UI
         such as:
           - syslog
           - dbus
           - wifi
           - X11

      All of these services are running before "started boot-services" is
      emitted.

      Once event "started boot-services" has been emitted, the user can login.

      4) Job "system-services" runs ("start on started boot-services")

      Abstract job that runs for the life of the boot. All other services are
      started from this job via "start on starting system-services". Note
      that these services are the lowest priority and are started *after*
      the login screen is displayed. Examples:
      - cron
      - update manager
      - power management
      - crash reporter

1.2.2 Runlevels
~~~~~~~~~~~~~~~~

      An interesting observation is that Chromium OS has dispensed entirely with the
      legacy SystemV runlevels. This is perfectly natural since the services
      run on Chromium OS are strictly controlled and no legacy init system
      support is thus required.

1.2.3 Failsafe Mode
~~~~~~~~~~~~~~~~~~~~

      In addition, Chromium OS has a novel "failsafe" method for ensuring
      that certain key services are guaranteed to start even if the login
      screen fails to display for some reason. Examples of jobs that are
      started in failsafe mode are VT consoles and the ssh server. The
      failsafe facility is implemented as 2 jobs:

      - failsafe.conf
      - failsafe-delay.conf

      The failsafe-delay job specifies "start on started boot-services" such
      that it starts early on in the boot. This job simply sleeps for 30
      seconds (which is longer than the worst-case overall boot time). Once
      failsafe-delay stops, even if the main boot fails, failsafe starts since
      it specifies "start on starting system-services or stopped
      failsafe-delay".

      Jobs that absolutely must start even if the boot fails for some reason
      can then specify "start on started failsafe" and be assured of starting,
      in the worst case scenario after a 30 second delay.
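
The worst-case timing can be worked through with a small model. This is
only an illustrative sketch, not Chromium OS code: the function name is
hypothetical, and times are assumed to be measured in seconds from the
moment "started boot-services" is emitted (when failsafe-delay starts).

```python
FAILSAFE_DELAY = 30  # seconds slept by failsafe-delay, per the description above


def failsafe_start_time(system_services_starting=None, delay=FAILSAFE_DELAY):
    """Return when "failsafe" starts, in seconds after "started boot-services".

    system_services_starting is the time at which the event
    "starting system-services" fires, or None if the boot stalls and the
    event never arrives.
    """
    delay_stopped = delay  # failsafe-delay starts at t=0 and stops after its sleep
    if system_services_starting is None:
        return delay_stopped
    # "start on A or B": whichever event arrives first starts the job.
    return min(system_services_starting, delay_stopped)
```

In a healthy boot failsafe starts as soon as system-services begins to
start; if that never happens, the 30 second sleep is the upper bound.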

1.2.4 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: See if Ubuntu could benefit from any of these techniques.

1.3 NIH issues
===============
    [https://bugs.launchpad.net/libnih]
    As more use is being made of the NIH Utility Library outside of core
    Upstart, a few issues have been found recently. The two most
    interesting ones we worked through were:

1.3.1 bug 776532 (nih_dir_walk_scan passes incorrect value to file filter)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      Required for as-yet-unreleased "upstart-file-bridge". Found a
      solution to this issue and documented.
* ACTIONS
  + [jhunt]: implement with tests.

1.3.2 bug 803587 (nih-dbus-tool generates invalid c code for structure types)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      Found through development of feature to disable jobs in Upstart.
      Scott identified the fix for this.
* ACTIONS
  + [jhunt]: implement and write tests.

2 Day 2
--------

2.1 Logging of job output (and maybe events) (bug 328881)
==========================================================

    Pair programming to write minimal logging functionality for jobs. I
    now have the code working (90% complete for system jobs, excluding
    tests) such that we will soon introduce a "console log" option
    (which will be the new default for jobs). Once directory
    "/var/log/upstart/" becomes available, all output from jobs
    specifying explicitly or implicitly "console log" will have their
    stdout and stderr redirected to a file in this directory. Currently,
    the code is buffering output, so it probably makes sense to adopt a
    pty approach. A complication is logging of user jobs (for a variety
    of reasons).
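
To illustrate why a pty helps: C stdio normally switches from full
buffering to line buffering when stdout is a terminal, so a job whose
output goes to a pty tends to deliver lines as they are written. The
sketch below is not the Upstart implementation (which is in C); the
function name and log path handling are hypothetical, and it only shows
the general technique of running a command with a pty as its stdout and
stderr and copying what arrives into a log file.

```python
import os
import pty
import subprocess


def run_and_log(argv, log_path):
    """Run argv with stdout/stderr attached to a pty; append output to log_path."""
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(argv, stdout=slave_fd, stderr=slave_fd)
    finally:
        # The parent only needs the master end; the child keeps its own copy
        # of the slave end.
        os.close(slave_fd)

    with open(log_path, "ab") as log:
        while True:
            try:
                chunk = os.read(master_fd, 4096)
            except OSError:
                break  # Linux reports EIO once every slave end is closed
            if not chunk:
                break  # other systems report end-of-file instead
            log.write(chunk)

    os.close(master_fd)
    return proc.wait()
```

Note that the terminal layer translates newlines, so the logged text
will usually contain "\r\n" line endings unless they are stripped.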

    This feature will be a huge aid to debugging and a boon for
    Administrators.

    Could utilize same technique to log events, although need to assess
    performance impact of this due to numbers of events being emitted.

2.1.1 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: Finish code and write tests.

2.2 State-passing on re-exec (bug 348455)
==========================================

    In Ubuntu, Upstart has the ability to re-exec itself to allow the
    currently running /sbin/init to be replaced cleanly. However, the
    re-exec is currently stateless such that the newly exec'ed image has
    no knowledge of jobs that were running before the exec. A simple
    facility is used for such state-passing between the initramfs
    (non-Upstart) and the main system, but the method employed is not
    ideal.

    We discussed a plan to introduce full state-passing between the old
    and new instances which can be summarized succinctly as:

    - create a pipe.
    - fork.
    - child creates socket and listens on it.
    - child passes details of socket back to parent via pipe
      (or could just use well-known location).
    - child closes pipe.
    - parent re-execs itself (closing pipe), passing a cmdline option to notify
      init to read from the socket.
    - child sends meta-data on existing jobs through socket.
    - parent parses meta-data and initializes data structures based on this info.

      Plan currently is to use JSON for structured representation of
      meta-data.
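
As a rough illustration of what JSON-based state transfer over a socket
could look like, here is a minimal sketch. It is not the planned
implementation: the field names ("version", "jobs", "name", "goal",
"pid"), the 4-byte length prefix and the helper names are all
assumptions made for the example, and a socket pair stands in for the
connection between the old and new init images.

```python
import json
import socket


def encode_state(version, jobs):
    """Serialise job meta-data as a length-prefixed JSON message."""
    body = json.dumps({"version": version, "jobs": jobs}).encode("utf-8")
    return len(body).to_bytes(4, "big") + body


def read_exact(sock, size):
    """Read exactly size bytes, or raise if the peer closes early."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("socket closed before full message arrived")
        data += chunk
    return data


def decode_state(sock):
    """Read one length-prefixed JSON message and return it as a dict."""
    size = int.from_bytes(read_exact(sock, 4), "big")
    return json.loads(read_exact(sock, size).decode("utf-8"))


# Demonstration: the "old" end sends its state, the "new" end parses it.
old_end, new_end = socket.socketpair()
old_end.sendall(encode_state("1.3", [{"name": "cron", "goal": "start", "pid": 1234}]))
old_end.close()
state = decode_state(new_end)
new_end.close()
```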

      Perceived issues:

      - We cannot restore D-Bus connections.
      - New version of init being exec'ed must understand all historical
        JSON syntax quirks if we ever change how we represent objects.
      - Child must send its version to the re-exec'ed parent and if that
        parent detects the child is newer than it, state passing would
        be unsafe since this scenario is indicative of downgrading the
        Upstart version. In such instances, the best course of action
        may be to:
        - generate a warning
        - log the child's state to a file
        - re-exec with no state-passing.
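
The downgrade check could be sketched as follows. This is a hedged
example under stated assumptions: versions are taken to be dotted
numeric strings, the state is assumed to be a dict with hypothetical
"version" and "jobs" keys, and returning None stands in for "carry on
with no state-passing".

```python
import json
import sys


def parse_version(text):
    """Turn "1.3" into (1, 3) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def accept_state(state, own_version, dump_path):
    """Return the job list to restore, or None to continue without state."""
    if parse_version(state["version"]) > parse_version(own_version):
        # The sender is newer than we are, which suggests a downgrade.
        print("warning: ignoring state from newer init %s" % state["version"],
              file=sys.stderr)
        # Keep a copy of the state we refused, for later inspection.
        with open(dump_path, "w") as dump:
            json.dump(state, dump)
        return None
    return state["jobs"]
```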

2.2.1 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: implement.

2.3 Ptrace limitations (bug 406397, for example)
=================================================

    Lots of discussion resulted in the basics of a plan for how to
    overcome the ptrace issues.

    ptrace(2) is the primary facility Upstart uses to track pids. This
    provides a clever and elegant solution to a difficult problem, but
    there are now known issues with its use. As such, we discussed
    available alternatives such as cgroups and the proc connector.
    Unfortunately, the latter suffers from too many limitations right
    now (inability to work in a container being an important
    restriction), so we plan to introduce support for cgroups. However,
    the introduction of cgroups will be staged:

2.3.1 Stage 1 - track exits, not forks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      Change Upstart to track processes calling one of the various
      exit() calls.

      The advantage of tracking exits is that it's a better
      indication of daemonization since the exit() always follows the
      fork(), and it allows for services whose parent actually stays
      running until the child is ready, as well as those whose parent
      dies immediately.

      Stage 1 Tasks:

* Update pid tracking code to use PTRACE_O_TRACEEXIT rather than PTRACE_O_TRACEFORK.
* Introduce "expect exit <n>" syntax.
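
The counting side of this can be modelled very simply. The semantics of
"expect exit <n>" are not spelled out above, so this sketch assumes that
<n> is the number of process exits to observe before the job is treated
as ready; the class and method names are hypothetical. For comparison,
the existing "expect fork" and "expect daemon" stanzas correspond to one
and two forks respectively, and a conventional double-forking daemon
produces two exits (the original process and the intermediate child).

```python
class ExitTracker:
    """Count observed exits for one job process (illustrative model only)."""

    def __init__(self, expected_exits):
        self.expected_exits = expected_exits
        self.seen_exits = 0

    def process_exited(self):
        """Record one exit; return True once enough exits have been seen."""
        self.seen_exits += 1
        return self.seen_exits >= self.expected_exits
```

In the real design the exit notifications would come from ptrace with
PTRACE_O_TRACEEXIT set; here they are simply method calls.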

2.3.2 Stage 2 - track all pids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      Upstart currently tracks only 1 pid / job process type (main,
      pre-start, etc). By changing it to track *all* pids for all
      process types, we minimize problems relating to "unexpected" pids.
      It also lays the foundation for the transition to cgroups.

      Stage 2 Tasks:

* Make JobClass->process array a linked list of "Process" objects
  to allow Upstart to track all pids a process type has been known as.
* Add ProcessType entry to Process object since can't use index of
  array any more.
* Add entry for PROCESS_CUSTOM to ProcessType.
* Add "char *" name to Process to store name of process type.
* We can now handle >1 pids per job process. This fixes a lot of
  problems. Also paves the way for custom actions (bug 94873)
  since we can simply add a new entry to the linked list for a
  new process type "foo" / "monitor", etc.
* Change ptrace from being "on fork" to be on "first pid exiting" (count exits, not forks).
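
The shape of the proposed data structure can be modelled as below.
Upstart itself is written in C, so this is only a simplified Python
model of the idea, not the real structures: the class layout, the
method names and the choice to keep the pids directly on each Process
are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ProcessType(Enum):
    MAIN = auto()
    PRE_START = auto()
    POST_START = auto()
    PRE_STOP = auto()
    POST_STOP = auto()
    CUSTOM = auto()  # the proposed new PROCESS_CUSTOM entry


@dataclass
class Process:
    type: ProcessType  # stored explicitly, since there is no array index any more
    name: str          # e.g. "main", "pre-start", or a custom name like "monitor"
    pids: list = field(default_factory=list)  # every pid seen for this process


@dataclass
class JobClass:
    name: str
    processes: list = field(default_factory=list)  # list instead of fixed array

    def find(self, name):
        """Return the Process with the given name, or None."""
        for process in self.processes:
            if process.name == name:
                return process
        return None

    def track(self, proc_type, name, pid):
        """Record pid against the named process, creating the entry if needed."""
        process = self.find(name)
        if process is None:
            process = Process(proc_type, name)
            self.processes.append(process)
        process.pids.append(pid)
        return process
```

Adding a custom action then amounts to appending one more Process entry
with type CUSTOM and its own name.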

2.3.3 Stage 3 - introduce cgroups
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Stage 3 Tasks:
* Add "char *" representing the cgroup alongside the pid_t in linked list of pids
  such that we can track the cgroups.
  Cgroups for Upstart will be something like:

    /upstart/$job/$instance/$process

  job_process_spawn() will then write $pid to:

    /upstart/$job/$instance/$process/tasks

  Examples:

    /upstart/apache/main
    /upstart/network-interface/eth0/pre-start
    /upstart/network-interface/eth0/post-stop

  The "release_agent", which gets called automatically when the
  cgroup is empty (indicating that the processes associated with a
  job have exited) will probably be a symbolic link to a new
  initctl(8) command "release" (itself a link to /sbin/initctl).

  Once cgroups have been introduced to Upstart, job lookup will
  occur by cgroup, rather than by pid. However, it is possible
  that ptrace will still be used in combination with cgroups,
  creating a hybrid system for maximum flexibility.

  To stop a job process, Upstart will read the pids
  from the tasks file and then kill those pids.
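
The path layout and the tasks-file handling can be sketched as follows.
This is an illustration under assumptions rather than the planned code:
the function names are hypothetical, "root" is assumed to be wherever
the cgroup hierarchy is mounted (a plain directory works for
experimenting), the instance component is omitted for single-instance
jobs to match the examples above, and SIGTERM is an assumed default
signal. The release_agent handling is not covered.

```python
import os
import signal


def cgroup_dir(root, job, process, instance=""):
    """Build <root>/upstart/<job>[/<instance>]/<process>."""
    parts = [root, "upstart", job]
    if instance:
        parts.append(instance)
    parts.append(process)
    return os.path.join(*parts)


def record_pid(root, job, process, pid, instance=""):
    """Write pid to the group's tasks file, as job_process_spawn() would."""
    directory = cgroup_dir(root, job, process, instance)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "tasks"), "a") as tasks:
        tasks.write("%d\n" % pid)


def read_pids(root, job, process, instance=""):
    """Return the pids currently listed in the group's tasks file."""
    path = os.path.join(cgroup_dir(root, job, process, instance), "tasks")
    with open(path) as tasks:
        return [int(line) for line in tasks if line.strip()]


def stop_process(root, job, process, instance="", sig=signal.SIGTERM):
    """Signal every pid listed for the job process."""
    for pid in read_pids(root, job, process, instance):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the process exited between the read and the kill
```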

2.3.4 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: implement the plan.

3 Day 3
--------

3.1 Upstart in Debian
======================

3.1.1 Version of Upstart
~~~~~~~~~~~~~~~~~~~~~~~~~
      Upstart in Debian is very old (0.6.6-2).
      Needs to be updated and kept in sync with upstream. Ideally, the
      minor Ubuntu differences between it and upstream need to be
      propagated into Debian such that there is no delta between Debian
      and Ubuntu.
      Scott and Steve are able to sponsor James' uploads.

3.1.2 insserv/startpar needs to have knowledge of Upstart.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Needs to emit "starting" and "started" events for each service it runs.
* ACTIONS:
  + [slangasek]: patch startpar to understand the status of upstart jobs.
  + [slangasek]: fix debhelper so upstart jobs and init scripts can be installed side-by-side in
    Debian.
  + [slangasek]: get an up-to-date upstart into Debian unstable.

3.1.3 Debian policy needs to be updated to allow for alternative init systems to SystemV.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3.2 "expect stanza" enhancements.
==================================
    We observed that currently the "start on" and "stop on" stanzas take
    an event condition, whereas the "expect" stanza takes one of two
    "special" tokens representing the number of times the process is
    expected to fork(2). Note too that "start on" is handled internally
    by Upstart *and* in some instances also by out-of-process bridges
    (primarily the upstart-socket-bridge).

    We also discussed that forks are merely one way
    for a job to indicate it is "ready" and in fact for database
    servers, the fact that the primary daemon is running is no
    indication that the server is available. It may require significant
    time to replay logs before it is able to start servicing client
    requests. Other possible indications of readiness could be:

    - The creation of a file (log file, lock file, etc).
    - The appearance of a service on the D-Bus system bus.
    - The creation of a listening socket of a particular type on a particular port.
    - Theoretically, a process emitting an event to say "I'm now ready".

    This last one is the most intriguing one in that it triggers an
    interesting thought: what if "expect", rather than taking "special"
    tokens took an event like "start on" and "stop on"? There is some
    symmetry to this and it brings big advantages. "daemon" and "fork"
    could become events, not special-case tokens. We could then
    introduce "file" and "socket" as events. The complication is that we
    need to find a way to handle such events out-of-process ideally, to
    avoid polluting the Upstart core with specialist knowledge of
    sockets, files and D-Bus behaviour. This discussion wasn't finished,
    but raises some interesting questions to explore at a later date.

3.2.1 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: Consider idea further.

3.3 "and" issues (bug 447654)
==============================
    As documented in init(5), once a condition becomes true, Upstart
    discards knowledge of the other parts of the condition tree (by
    unblocking those events). This causes problems when a condition
    starts out as false, becomes true, goes false again, but then again
    becomes true. This can lead to unexpected behaviour and is confusing
    to users. Due to Upstart's internal design, this is a hard problem to
    solve and we didn't come up with any workable solutions. More
    thought required on this one.

3.3.1 ACTIONS
~~~~~~~~~~~~~~
* [jhunt, keybuk]: Consider problem further.

3.4 Environment Variables
==========================
    We plan to add EXIT_STATUS and PROCESS environment variables to all
    jobs, not just those that fail. This is useful for jobs that specify
    the "normal exit" stanza and for any job that wishes to perform
    conditional processing based on exit code if there might be a range
    of success exit codes.
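
A helper run from a job could use these variables along the following
lines. This is a minimal sketch, assuming EXIT_STATUS holds a decimal
exit code and PROCESS holds the process type name (for example "main");
the function name and the idea of passing in the set of acceptable
codes are hypothetical.

```python
import os


def exit_was_success(environ=os.environ, success_codes=(0,)):
    """Classify the exit recorded in EXIT_STATUS.

    Returns True or False, or None when EXIT_STATUS is not set at all.
    """
    status = environ.get("EXIT_STATUS")
    if status is None:
        return None
    return int(status) in success_codes


def describe_exit(environ=os.environ, success_codes=(0,)):
    """Return a one-line summary suitable for a log message."""
    process = environ.get("PROCESS", "unknown")
    verdict = exit_was_success(environ, success_codes)
    if verdict is None:
        return "%s: no exit status available" % process
    outcome = "ok" if verdict else "failed"
    return "%s: exit status %s (%s)" % (process, environ["EXIT_STATUS"], outcome)
```

For a job whose "normal exit" codes are 0 and 2, passing
success_codes=(0, 2) treats both as successful.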

3.4.1 ACTIONS
~~~~~~~~~~~~~~
* [jhunt]: Implement and write tests.

3.5 Upstart resource site
==========================
    We currently have too many Upstart sites (upstart.ubuntu.com,
    upstart.at, launchpad.net/upstart). Need to rationalize the
    information on each and come up with a plan for a single site.

3.5.1 ACTIONS
~~~~~~~~~~~~~~
* [slangasek]: Investigate a single site.

If you are interested in getting involved in any of the topics covered
in this mail, please:

1) Read [http://upstart.ubuntu.com/wiki/ContributingCode]

2) Make your views known on this list
   (to avoid working on a feature already being worked on!)

Thanks!

Regards,

James.

-- 
James Hunt
____________________________________
http://upstart.ubuntu.com/cookbook
http://upstart.ubuntu.com/cookbook/upstart_cookbook.pdf