[Bug 1465386] Re: Default values for WAIT_STATE are wrong in the upstart wait-for-state job

Rarylson Freitas rarylson at gmail.com
Tue Nov 29 18:46:31 UTC 2016


Hi,

I'm sorry for don't response these questions in the last months.

However, in the last days, I had some time and I understood better the
situation.

I'll split my post in three parts.

1 - Technical background

1.1 - Lack of documentation

The `wait-for-start` job is not documented. Because of this, it's more
difficult to understand what should be its expected behavior.

See: https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/962047

1.2 - Upstart job states

According do the Upstart Cookbook:

> States are exposed to users via the status field in the output of the
initctl status command

See: http://upstart.ubuntu.com/cookbook/#job-states

`waiting` and `running` are the two most common states of a job (they
are the initial and final states after running the `initctl start`
command). There are other ones (intermediate states), like the `running`
and `starting` states.

Notes: `started` and `stopped` are not job states (they are events, but
not states). So, running `initctl status` will never print the `started`
or `stopped` strings.

1.3 - `wait-for-state.conf` uses `started`/`stopped` as the default
expected states

I don't know if it's a buggy behavior or if it's intentional. In the
code of the `wait-for-state.conf` file, we have:

```
env WAIT_STATE="started"
env TARGET_GOAL="start"

[...]

    if [ "$WAIT_STATE" = "stopped" ] ; then
        TARGET_GOAL="stop"
    fi
```

And `WAIT_STATE` is used in this line:

```
    # Already running/stopped?
    status $WAIT_FOR | grep -q "$TARGET_GOAL/$WAIT_STATE" && exit 0
```

1.4 - Test case - Apport

So, if we use (for example) `WAIT_FOR=apport` and keep the default
`TARGET_GOAL`/`WAIT_STATE` values, wait-for-state will not return true
when apport is already started.

If apport is stopped:

```
$ stop apport
apport stop/waiting
$ time start wait-for-state WAIT_FOR=apport WAITER=apport 
wait-for-state stop/waiting

real    0m0.035s
user    0m0.000s
sys    0m0.006s
$ echo $?
0
$ status apport
apport start/running
$ time start wait-for-state WAIT_FOR=apport WAITER=apport 
start: Job failed to start

real    0m30.019s
user    0m0.000s
sys    0m0.004s
$ echo $?
1
```

This is because wait-for-start grep's for "start/started" instead of
"start/running".

However, if we explicitly pass WAIT_STATE=running, it will work!!!

```
$ status apport
apport start/running
$ time start wait-for-state WAIT_FOR=apport WAITER=apport  WAIT_STATE=running
wait-for-state stop/waiting

real	0m0.018s
user	0m0.000s
sys	0m0.004s
$ echo $?
0
```

This is because I think the default values should be `running`/`waiting`
instead of `started`/`stopped`. So the code would be idempotent (it will
return 0, even if the job was already started).

2 - Nova compute, GlusterFS and plymouth-shutdown cases

The `nova-compute.conf` job (`nova-compute` package from `deb http
://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/kilo main`)
has the following lines in their `pre-start script` stanza:

```
  # If libvirt-bin is installed, always wait for it to start first
  if status libvirt-bin; then
    start wait-for-state WAIT_FOR=libvirt-bin WAIT_STATE=running WAITER=nova-compute
  fi
```

Because they explicitly pass `WAIT_STATE=running`, this will always work
(even if libvirt-bin is already started).

In the `mounting-glusterfs.conf` job (`glusterfs-client` package from
Ubuntu), they don't pass `WAIT_STATE=running`. They run:

```
start wait-for-state WAIT_FOR=static-network-up WAITER=mounting-glusterfs-$MOUNTPOINT
```

And this is buggy.

Our final case will be the `plymouth-shutdown.conf` job.

They run:

```
start wait-for-state WAITER=plymouth-shutdown WAIT_FOR=lightdm TARGET_GOAL=stop WAIT_STATE=waiting
```

It is, the `WAIT_STATE=waiting` var is explicitly set. Because of this,
if we run the above code, it will always work, even if the lightdm
daemon is already stopped.

Excepting the glusterfs job, all job that I know explicitly pass the
`WAIT_STATE` param (`running` or `waiting`) and do not use the default
provided by the wait-for-state.conf job (`started`/`stopped`).

3 - Risks of updating wait-for-state

Some Upstart scripts use wait-for-state. In my case, I know `nova-
compute.conf`, `mounting-glusterf.conf`, `plymouth-shutdown.conf` and
`rc.conf` (the most risky case).

I use the patched version (with `running`/`waiting` values as default)
in several production servers and, until nowdays, everything is fine.

4 - My opinion

Ubuntu 16.04 uses systemd. Upstart will be discontinued (I liked it so
much, but migrating to systemd was a good/inteligent decision). So, I
think the current behavior of the wait-for-state script could be kept.

The important thing here is to document the bug (unexpected behavior,
not idempotent behavior). So other users can find useful information if
the current behavior is confusing them.

- Document the bug/behavior (we're doing this here)
- Keep the current source code (it is not worth assuming the risk of a update).

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1465386

Title:
  Default values for WAIT_STATE are wrong in the upstart wait-for-state
  job

Status in upstart package in Ubuntu:
  Invalid
Status in upstart source package in Trusty:
  Won't Fix

Bug description:
  In Ubuntu 14.04, the wait-for-state job uses the env vars
  WAIT_STATE="started" or WAIT_STATE="stopped". if the GOAL env var is
  set to 'start' or 'stop'.

  However, according to the upstart cookbook, the desired states for a
  already started/stopped job are 'start/running' and 'stop/waiting'.
  Maybe a misunderstood has occurred was the meaning of signals was job
  states (for example, after a job reaches the start/running state, it
  emits the started signal).

  The strange thing is that the waiting/running wait states were
  proposed at the first implementations of this job, and not the
  stopped/started states: https://lists.ubuntu.com/archives/upstart-
  devel/2011-February/001405.html.

  This  bug can make some developers to write upstart jobs that return
  errors (like the GlusterFS developers, that forgot the
  WAIT_STATE="running" condition).

  References:

  - http://upstart.ubuntu.com/cookbook/#job-states
  - http://upstart.ubuntu.com/cookbook/#events-not-states
  - https://bugs.launchpad.net/ubuntu/+source/glusterfs/+bug/1465382

  I'm attaching a patch too.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/1465386/+subscriptions



More information about the Ubuntu-sponsors mailing list