[Bug 1788643] Re: zombies pile up, system becomes unresponsive
Launchpad Bug Tracker
1788643 at bugs.launchpad.net
Thu Nov 29 18:33:43 UTC 2018
Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: systemd (Ubuntu)
Status: New => Confirmed
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1788643
Title:
zombies pile up, system becomes unresponsive
Status in systemd package in Ubuntu:
Confirmed
Bug description:
Description: Ubuntu 16.04.5 LTS
Release: 16.04
systemd:
  Installed: 229-4ubuntu21.4
  Candidate: 229-4ubuntu21.4
  Version table:
 *** 229-4ubuntu21.4 500
        500 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     229-4ubuntu21.1 500
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
     229-4ubuntu4 500
        500 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
This problem occurs on Azure, and we are seeing it on several different
systems. Worker nodes (Ubuntu 16.04) in a Hadoop cluster start piling
up zombie processes and become unresponsive. The syslog and the kernel
logs don't provide much information.
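For anyone trying to confirm the same symptom on their own nodes, here is a
minimal sketch (not part of the original report) that tallies zombie
processes per parent PID by reading /proc/<pid>/stat. It assumes the Linux
stat layout, where the process state and the parent PID are the first two
fields after the parenthesised command name. The function names parse_stat
and zombies_by_parent are illustrative only.

```python
import os
from collections import Counter


def parse_stat(stat_line):
    """Return (state, ppid) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split after the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return rest[0], int(rest[1])


def zombies_by_parent(proc_root="/proc"):
    """Count processes in state 'Z' (zombie), grouped by parent PID."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                state, ppid = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the stat line was unreadable
        if state == "Z":
            counts[ppid] += 1
    return counts


if __name__ == "__main__":
    for ppid, n in zombies_by_parent().most_common():
        print(f"parent {ppid}: {n} zombie(s)")
```

Grouping by parent may help show which process is failing to reap its
children, since a zombie stays in the process table until its parent
waits on it.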
The only error we could correlate with what we are seeing was in the
audit logs. See the "Connection timed out" and the "Cannot create
session: Already running in a session" messages at the end of this
message.
Our first suspect was memory pressure on the machines. We added
logging and settings to reboot on out of memory, but all of these
turned out to be red herrings.
Aug 18 19:11:08 wn2-d3ncsp su[112600]: Successful su for root by root
Aug 18 19:11:08 wn2-d3ncsp su[112600]: + ??? root:root
Aug 18 19:11:08 wn2-d3ncsp su[112600]: pam_unix(su:session): session opened for user root by (uid=0)
Aug 18 19:11:08 wn2-d3ncsp systemd-logind[1486]: New session c8 of user root.
Aug 18 19:11:26 wn2-d3ncsp sshd[112690]: Did not receive identification string from 10.84.93.35
Aug 18 19:11:34 wn2-d3ncsp su[112600]: pam_systemd(su:session): Failed to create session: Connection timed out
Aug 18 19:11:34 wn2-d3ncsp su[112600]: pam_unix(su:session): session closed for user root
Aug 18 19:11:34 wn2-d3ncsp systemd-logind[1486]: Removed session c8.
Aug 18 19:12:03 wn2-d3ncsp sudo: ehiadmin : TTY=pts/1 ; PWD=/home/ehiadmin ; USER=root ; COMMAND=/bin/su -
Aug 18 19:12:03 wn2-d3ncsp sudo: pam_unix(sudo:session): session opened for user root by ehiadmin(uid=0)
Aug 18 19:12:03 wn2-d3ncsp su[113085]: Successful su for root by root
Aug 18 19:12:03 wn2-d3ncsp su[113085]: + /dev/pts/1 root:root
Aug 18 19:12:03 wn2-d3ncsp su[113085]: pam_unix(su:session): session opened for user root by ehiadmin(uid=0)
Aug 18 19:12:03 wn2-d3ncsp su[113085]: pam_systemd(su:session): Cannot create session: Already running in a session
Aug 18 19:12:42 wn2-d3ncsp sshd[113274]: Did not receive identification string from 10.84.93.42
Aug 18 19:13:37 wn2-d3ncsp su[113085]: pam_unix(su:session): session closed for user root
Aug 18 19:13:37 wn2-d3ncsp sudo: pam_unix(sudo:session): session closed for user root
Aug 18 19:13:37 wn2-d3ncsp sshd[112285]: pam_unix(sshd:session): session closed for user ehiadmin
Aug 18 19:13:37 wn2-d3ncsp systemd-logind[1486]: Removed session 1291.
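To pick the two pam_systemd errors out of a longer log, something like the
following sketch could be used. It is not part of the original report; it
assumes the traditional syslog line format shown above, and the names
LINE_RE, PAM_SYSTEMD_ERR and pam_systemd_failures are illustrative.

```python
import re

# Traditional syslog prefix: "Mon DD HH:MM:SS host prog[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<prog>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)

# The two pam_systemd session errors quoted in this report.
PAM_SYSTEMD_ERR = re.compile(
    r"pam_systemd\(.*?\): "
    r"(Failed to create session|Cannot create session): (.*)"
)


def pam_systemd_failures(lines):
    """Return (timestamp, pid, error, reason) for each pam_systemd failure."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog-style line
        err = PAM_SYSTEMD_ERR.search(m.group("msg"))
        if err:
            found.append(
                (m.group("ts"), m.group("pid"), err.group(1), err.group(2))
            )
    return found


if __name__ == "__main__":
    import sys

    # Reads log lines from standard input.
    for ts, pid, error, reason in pam_systemd_failures(sys.stdin):
        print(f"{ts} pid={pid} {error}: {reason}")
```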
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1788643/+subscriptions