[Bug 1465386] Re: Default values for WAIT_STATE are wrong in the upstart wait-for-state job
Rarylson Freitas
rarylson at gmail.com
Tue Nov 29 18:46:31 UTC 2016
Hi,
I'm sorry for don't response these questions in the last months.
However, in the last days, I had some time and I understood better the
situation.
I'll split my post in three parts.
1 - Technical background
1.1 - Lack of documentation
The `wait-for-start` job is not documented. Because of this, it's more
difficult to understand what should be its expected behavior.
See: https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/962047
1.2 - Upstart job states
According do the Upstart Cookbook:
> States are exposed to users via the status field in the output of the
initctl status command
See: http://upstart.ubuntu.com/cookbook/#job-states
`waiting` and `running` are the two most common states of a job (they
are the initial and final states after running the `initctl start`
command). There are other ones (intermediate states), like the `running`
and `starting` states.
Notes: `started` and `stopped` are not job states (they are events, but
not states). So, running `initctl status` will never print the `started`
or `stopped` strings.
1.3 - `wait-for-state.conf` uses `started`/`stopped` as the default
expected states
I don't know if it's a buggy behavior or if it's intentional. In the
code of the `wait-for-state.conf` file, we have:
```
env WAIT_STATE="started"
env TARGET_GOAL="start"
[...]
if [ "$WAIT_STATE" = "stopped" ] ; then
TARGET_GOAL="stop"
fi
```
And `WAIT_STATE` is used in this line:
```
# Already running/stopped?
status $WAIT_FOR | grep -q "$TARGET_GOAL/$WAIT_STATE" && exit 0
```
1.4 - Test case - Apport
So, if we use (for example) `WAIT_FOR=apport` and keep the default
`TARGET_GOAL`/`WAIT_STATE` values, wait-for-state will not return true
when apport is already started.
If apport is stopped:
```
$ stop apport
apport stop/waiting
$ time start wait-for-state WAIT_FOR=apport WAITER=apport
wait-for-state stop/waiting
real 0m0.035s
user 0m0.000s
sys 0m0.006s
$ echo $?
0
$ status apport
apport start/running
$ time start wait-for-state WAIT_FOR=apport WAITER=apport
start: Job failed to start
real 0m30.019s
user 0m0.000s
sys 0m0.004s
$ echo $?
1
```
This is because wait-for-start grep's for "start/started" instead of
"start/running".
However, if we explicitly pass WAIT_STATE=running, it will work!!!
```
$ status apport
apport start/running
$ time start wait-for-state WAIT_FOR=apport WAITER=apport WAIT_STATE=running
wait-for-state stop/waiting
real 0m0.018s
user 0m0.000s
sys 0m0.004s
$ echo $?
0
```
This is because I think the default values should be `running`/`waiting`
instead of `started`/`stopped`. So the code would be idempotent (it will
return 0, even if the job was already started).
2 - Nova compute, GlusterFS and plymouth-shutdown cases
The `nova-compute.conf` job (`nova-compute` package from `deb http
://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/kilo main`)
has the following lines in their `pre-start script` stanza:
```
# If libvirt-bin is installed, always wait for it to start first
if status libvirt-bin; then
start wait-for-state WAIT_FOR=libvirt-bin WAIT_STATE=running WAITER=nova-compute
fi
```
Because they explicitly pass `WAIT_STATE=running`, this will always work
(even if libvirt-bin is already started).
In the `mounting-glusterfs.conf` job (`glusterfs-client` package from
Ubuntu), they don't pass `WAIT_STATE=running`. They run:
```
start wait-for-state WAIT_FOR=static-network-up WAITER=mounting-glusterfs-$MOUNTPOINT
```
And this is buggy.
Our final case will be the `plymouth-shutdown.conf` job.
They run:
```
start wait-for-state WAITER=plymouth-shutdown WAIT_FOR=lightdm TARGET_GOAL=stop WAIT_STATE=waiting
```
It is, the `WAIT_STATE=waiting` var is explicitly set. Because of this,
if we run the above code, it will always work, even if the lightdm
daemon is already stopped.
Excepting the glusterfs job, all job that I know explicitly pass the
`WAIT_STATE` param (`running` or `waiting`) and do not use the default
provided by the wait-for-state.conf job (`started`/`stopped`).
3 - Risks of updating wait-for-state
Some Upstart scripts use wait-for-state. In my case, I know `nova-
compute.conf`, `mounting-glusterf.conf`, `plymouth-shutdown.conf` and
`rc.conf` (the most risky case).
I use the patched version (with `running`/`waiting` values as default)
in several production servers and, until nowdays, everything is fine.
4 - My opinion
Ubuntu 16.04 uses systemd. Upstart will be discontinued (I liked it so
much, but migrating to systemd was a good/inteligent decision). So, I
think the current behavior of the wait-for-state script could be kept.
The important thing here is to document the bug (unexpected behavior,
not idempotent behavior). So other users can find useful information if
the current behavior is confusing them.
- Document the bug/behavior (we're doing this here)
- Keep the current source code (it is not worth assuming the risk of a update).
--
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1465386
Title:
Default values for WAIT_STATE are wrong in the upstart wait-for-state
job
Status in upstart package in Ubuntu:
Invalid
Status in upstart source package in Trusty:
Won't Fix
Bug description:
In Ubuntu 14.04, the wait-for-state job uses the env vars
WAIT_STATE="started" or WAIT_STATE="stopped". if the GOAL env var is
set to 'start' or 'stop'.
However, according to the upstart cookbook, the desired states for a
already started/stopped job are 'start/running' and 'stop/waiting'.
Maybe a misunderstood has occurred was the meaning of signals was job
states (for example, after a job reaches the start/running state, it
emits the started signal).
The strange thing is that the waiting/running wait states were
proposed at the first implementations of this job, and not the
stopped/started states: https://lists.ubuntu.com/archives/upstart-
devel/2011-February/001405.html.
This bug can make some developers to write upstart jobs that return
errors (like the GlusterFS developers, that forgot the
WAIT_STATE="running" condition).
References:
- http://upstart.ubuntu.com/cookbook/#job-states
- http://upstart.ubuntu.com/cookbook/#events-not-states
- https://bugs.launchpad.net/ubuntu/+source/glusterfs/+bug/1465382
I'm attaching a patch too.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/1465386/+subscriptions
More information about the Ubuntu-sponsors
mailing list