[Bug 687535] Re: upstart loses track of ssh daemon after reload ssh

Fri Jan 7 10:20:54 UTC 2011

** Branch linked: lp:~cjwatson/ubuntu/lucid/openssh/lucid-proposed

** Branch linked: lp:~cjwatson/ubuntu/maverick/openssh/maverick-proposed

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is a direct subscriber.
https://bugs.launchpad.net/bugs/687535

Title:
  upstart loses track of ssh daemon after reload ssh

Status in Upstart:
  New
Status in “openssh” package in Ubuntu:
  Fix Released
Status in “openssh” source package in Lucid:
  In Progress
Status in “openssh” source package in Maverick:
  In Progress

Bug description:
  When sshd gets a signal 1 for reload, it forks a new process and ditches the old. This causes upstart to believe that ssh has crashed, and loses track of it. A second reload (or any other initctl operation on ssh) will thus say:

reload: Unknown instance:

There would be 2 ways to fix this:
1. Don't have ssh fork on reload, but keep the same pid
2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid file? Just tracking children of the exited ssh won't work, since it might accidentally track a particular session rather than the master if somebody happens to log in close to reload time.
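
To illustrate why a supervisor that tracks a single pid loses the daemon, here is a minimal Python sketch. It is not upstart or sshd code: `daemon_main` and `supervise` are hypothetical names, and the "daemon" only mimics the fork-and-exit behaviour described above.

```python
import os
import signal
import time


def daemon_main(report_fd):
    """Stand-in for a daemon that handles reload by forking a successor."""
    successor = os.fork()
    if successor == 0:
        # Successor: the "new" daemon, which keeps running on its own.
        os.close(report_fd)
        time.sleep(30)
        os._exit(0)
    # Original process: report the successor's pid, then exit.
    os.write(report_fd, str(successor).encode())
    os._exit(0)


def supervise():
    """Track one child pid, as a simple supervisor would."""
    read_fd, write_fd = os.pipe()
    tracked = os.fork()
    if tracked == 0:
        os.close(read_fd)
        daemon_main(write_fd)
    os.close(write_fd)

    # The supervisor only sees the pid it started, and that pid exits.
    _, status = os.waitpid(tracked, 0)

    # The successor is not our child; we only know its pid via the pipe.
    successor = int(os.read(read_fd, 32).decode())
    os.close(read_fd)

    # Signal 0 checks that the pid exists without delivering a signal.
    try:
        os.kill(successor, 0)
        successor_alive = True
    except ProcessLookupError:
        successor_alive = False

    os.kill(successor, signal.SIGKILL)  # clean up the demo process
    return os.WEXITSTATUS(status), successor_alive
```

Calling `supervise()` returns the exit status of the tracked pid together with a flag showing that the successor was still running when the supervisor had already seen its job "end".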

openssh-server  1:5.3p1-3ubuntu4
upstart         0.6.5-7

==== Info for Maverick, Lucid SRU ====
IMPACT: if sshd gets a HUP signal, it forks a new process; upstart thinks the process died and loses track of it, so the user/admin loses the ability to stop/start/reload the daemon through upstart.
The problem is fixed in Natty 5.6p1-2ubuntu3. See attached patches for Maverick and Lucid.

TEST CASE:

- install openssh-server
- send a HUP signal to sshd
- the daemon is restarted, but upstart thinks that it crashed (/var/log/daemon.log):

Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
Dec 28 20:59:57 utest-lls32 init: ssh main process (1451) terminated with status 255
Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
Dec 28 20:59:57 utest-lls32 init: ssh main process (1455) terminated with status 255
Dec 28 20:59:57 utest-lls32 init: ssh respawning too fast, stopped

- after this, upstart won't know about sshd, despite the daemon running just fine:

root@utest-lls32:~# reload ssh
reload: Unknown instance:

With the fix applied, the correct behavior is:

- send a HUP signal to sshd
  ps ax |grep sshd
  kill -HUP <pid of the master sshd process>
- the daemon reloads (/var/log/auth.log):

Dec 28 21:37:01 utest-lls32 sshd[742]: Received SIGHUP; restarting.
Dec 28 21:37:01 utest-lls32 sshd[742]: Server listening on 0.0.0.0 port 22.
Dec 28 21:37:01 utest-lls32 sshd[742]: Server listening on :: port 22.

- reloading with upstart gives the same result, and NOT an error message.

REGRESSION POTENTIAL:

There is a small race in sshd between the moment it forks and the moment it starts listening for incoming connections. The fix widens this window by a very small amount, because sshd is now considered started as soon as it has been executed rather than when it forks. This will only affect jobs that use 'start on started ssh' and immediately connect to it. It is unlikely to cause problems in any real-world scenario, given that most such programs would also have to fork, exec, and open a socket, which is more work than sshd will be doing in that time.
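
For a job that does connect immediately after 'started ssh', one way to tolerate the window is to retry the connection briefly. The following Python sketch assumes a hypothetical helper named `wait_for_port`; the timeout and interval values are arbitrary and not taken from the bug report.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.1):
    """Return True once a TCP connection to host:port succeeds,
    or False if it still fails after roughly `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connection refused here means nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A dependent job could call, for example, `wait_for_port("127.0.0.1", 22)` before its first real connection attempt.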