[Bug 1788643] Re: zombies pile up, system becomes unresponsive

Launchpad Bug Tracker 1788643 at bugs.launchpad.net
Thu Nov 29 18:33:43 UTC 2018


Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: systemd (Ubuntu)
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1788643

Title:
  zombies pile up, system becomes unresponsive

Status in systemd package in Ubuntu:
  Confirmed

Bug description:
  Description:    Ubuntu 16.04.5 LTS
  Release:        16.04

  systemd:
    Installed: 229-4ubuntu21.4
    Candidate: 229-4ubuntu21.4
    Version table:
   *** 229-4ubuntu21.4 500
          500 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
          100 /var/lib/dpkg/status
       229-4ubuntu21.1 500
          500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
       229-4ubuntu4 500
          500 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

  This problem occurs in Azure, and we are seeing it on several
  different systems. Worker nodes (Ubuntu 16.04) in a Hadoop cluster
  start piling up zombie processes and become unresponsive. The syslog
  and the kernel logs don't provide much information.
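
  For anyone trying to confirm the same symptom, here is a minimal
  sketch (not part of our tooling) that counts zombie processes per
  parent PID by reading /proc/<pid>/stat. The function names
  parse_stat and zombies_by_parent are made up for illustration, and
  the sketch assumes a Linux /proc layout.

```python
import os
from collections import Counter


def parse_stat(text):
    """Return (pid, comm, state, ppid) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    fields = text[rparen + 1:].split()
    # After comm, the first field is the state and the second is the parent PID.
    return pid, comm, fields[0], int(fields[1])


def zombies_by_parent(proc_root="/proc"):
    """Count processes in state 'Z' (zombie), grouped by parent PID."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                _, _, state, ppid = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were scanning, or unparsable
        if state == "Z":
            counts[ppid] += 1
    return counts
```

  Calling zombies_by_parent().most_common(10) on an affected node
  should show which parent processes are failing to reap their
  children, which may help narrow down where the zombies come from.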

  The only errors we could correlate with what we are seeing were in
  the audit logs. See the "Connection timed out" and the "Cannot create
  session: Already running in a session" messages in the log excerpts
  at the end of this message.

  Our first suspect was memory pressure on the machines. We added
  logging and settings to reboot on out of memory, but all of these
  turned out to be red herrings.

  Aug 18 19:11:08 wn2-d3ncsp su[112600]: Successful su for root by root
  Aug 18 19:11:08 wn2-d3ncsp su[112600]: + ??? root:root
  Aug 18 19:11:08 wn2-d3ncsp su[112600]: pam_unix(su:session): session opened for user root by (uid=0)
  Aug 18 19:11:08 wn2-d3ncsp systemd-logind[1486]: New session c8 of user root.
  Aug 18 19:11:26 wn2-d3ncsp sshd[112690]: Did not receive identification string from 10.84.93.35
  Aug 18 19:11:34 wn2-d3ncsp su[112600]: pam_systemd(su:session): Failed to create session: Connection timed out
  Aug 18 19:11:34 wn2-d3ncsp su[112600]: pam_unix(su:session): session closed for user root
  Aug 18 19:11:34 wn2-d3ncsp systemd-logind[1486]: Removed session c8.

   
  Aug 18 19:12:03 wn2-d3ncsp sudo: ehiadmin : TTY=pts/1 ; PWD=/home/ehiadmin ; USER=root ; COMMAND=/bin/su -
  Aug 18 19:12:03 wn2-d3ncsp sudo: pam_unix(sudo:session): session opened for user root by ehiadmin(uid=0)
  Aug 18 19:12:03 wn2-d3ncsp su[113085]: Successful su for root by root
  Aug 18 19:12:03 wn2-d3ncsp su[113085]: + /dev/pts/1 root:root
  Aug 18 19:12:03 wn2-d3ncsp su[113085]: pam_unix(su:session): session opened for user root by ehiadmin(uid=0)
  Aug 18 19:12:03 wn2-d3ncsp su[113085]: pam_systemd(su:session): Cannot create session: Already running in a session
  Aug 18 19:12:42 wn2-d3ncsp sshd[113274]: Did not receive identification string from 10.84.93.42
  Aug 18 19:13:37 wn2-d3ncsp su[113085]: pam_unix(su:session): session closed for user root
  Aug 18 19:13:37 wn2-d3ncsp sudo: pam_unix(sudo:session): session closed for user root
  Aug 18 19:13:37 wn2-d3ncsp sshd[112285]: pam_unix(sshd:session): session closed for user ehiadmin
  Aug 18 19:13:37 wn2-d3ncsp systemd-logind[1486]: Removed session 1291.
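
  To look for the same pattern in other logs, the following is a rough
  sketch that pulls out pam_systemd session failures and reports how
  long each one took, measured from the first log line of the same
  process. The regular expression, the function name
  pam_systemd_failures, and the year default are assumptions made for
  this example (the log lines above carry no year); it is not an
  official tool.

```python
import re
from datetime import datetime

# Matches syslog-style lines such as:
#   Aug 18 19:11:34 wn2-d3ncsp su[112600]: pam_systemd(su:session): ...
LINE_RE = re.compile(
    r"^\s*(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<prog>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)

FAILURE_MARKERS = ("Failed to create session", "Cannot create session")


def pam_systemd_failures(lines, year=2018):
    """Return (pid, message, seconds since that process's first log line)."""
    first_seen = {}
    failures = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["pid"] is None:
            continue  # skip lines without a PID, e.g. plain "sudo:" entries
        # The year is assumed, since syslog timestamps omit it.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        key = (m["prog"], int(m["pid"]))
        first_seen.setdefault(key, ts)
        msg = m["msg"]
        if msg.startswith("pam_systemd(") and any(s in msg for s in FAILURE_MARKERS):
            elapsed = (ts - first_seen[key]).total_seconds()
            failures.append((key[1], msg, elapsed))
    return failures
```

  Run against the excerpts above, this reports the "Connection timed
  out" failure for su[112600] 26 seconds after that process first
  logged, while the "Already running in a session" message for
  su[113085] appears in the same second as the su itself.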

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1788643/+subscriptions


