[Bug 1610499] [NEW] hadoop crash: /bin/kill in ubuntu16.04 has bug in killing process group

Launchpad Bug Tracker 1610499 at bugs.launchpad.net
Fri Oct 7 11:16:13 UTC 2016


You have been subscribed to a public bug:

When I run Hadoop on Ubuntu 16.04, the ssh session exits and all
processes belonging to the hadoop user are killed. Through debugging, I
found that /bin/kill in Ubuntu 16.04 has a bug when killing a process
group.

Ubuntu version is:

Description:    Ubuntu 16.04.1 LTS
Release:        16.04

(1) How to reproduce the bug
The bug is easy to reproduce: on Ubuntu 16.04, run "/bin/kill -15 -12345", or any command of the form "/bin/kill -15 -1xxxx". It kills all processes that the caller is permitted to signal.

(2)Cause analysis
/bin/kill in Ubuntu 16.04 comes from procps-3.3.10. When I run "/bin/kill -15 -1xxxx", it actually sends signal 15 to pid -1, and a pid of -1 means the signal goes to every process the caller is allowed to signal.

(3) The bug is in procps-3.3.10/skill.c. I think the code
"pid = (long)('0' - optopt)" is not right: it builds the pid from only
the first digit after the dash and ignores the rest.

static void __attribute__ ((__noreturn__)) kill_main(int argc, char **argv)
{
        /* ... */
                case '?':
                        if (!isdigit(optopt)) {
                                xwarnx(_("invalid argument %c"), optopt);
                                kill_usage(stderr);
                        } else {
                                /* Special case for signal digit negative
                                 * PIDs */
                                pid = (long)('0' - optopt);

                                if (kill((pid_t)pid, signo) != 0)
                                        exitvalue = EXIT_FAILURE;
                                exit(exitvalue);
                        }
                        loop=0;
        /* ... */
}
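One possible direction for a fix is to parse the whole argument instead of a single digit. The sketch below is an assumption for illustration only, not the upstream procps change; parse_pid_arg is a hypothetical helper that relies on the standard strtol.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not the upstream fix: parse the whole argument
 * (for example "-12345") as a signed decimal number.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
static int parse_pid_arg(const char *arg, long *out)
{
    char *end = NULL;

    if (arg == NULL || *arg == '\0')
        return -1;

    errno = 0;
    long value = strtol(arg, &end, 10);

    /* Reject overflow, empty conversions, and trailing junk. */
    if (errno != 0 || end == arg || *end != '\0')
        return -1;

    *out = value;
    return 0;
}
```

Parsed this way, "-12345" yields -12345, which kill(2) treats as process group 12345 rather than as "every process".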

(4) Why Hadoop triggers it
Sometimes, when resources are tight or a Hadoop container loses its connection for a while, the NodeManager kills that container by sending a signal to its JVM process. It is normal behavior for Hadoop to kill a task and then re-execute it. With this kill bug, however, the signal reaches all processes belonging to the hadoop user.
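For reference, signalling a process group does not have to go through /bin/kill: the kill(2) system call accepts a negative pid directly. The following is a minimal sketch of that call with a guard against the dangerous values; signal_process_group is a hypothetical helper and is not how Hadoop itself issues the signal.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: send signo to every process in group pgid.
 * Refuses pgid <= 1, because kill(-1, ...) targets every process the
 * caller may signal and kill(0, ...) targets the caller's own group. */
static int signal_process_group(pid_t pgid, int signo)
{
    if (pgid <= 1) {
        errno = EINVAL;
        return -1;
    }
    return kill(-pgid, signo);   /* negative pid selects a process group */
}
```

The guard means that a mis-parsed group id of 1 fails with EINVAL instead of signalling everything.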

(5) Workaround
I copied /bin/kill from Ubuntu 14.04 over /bin/kill on Ubuntu 16.04, and that works. I also think it would be better to ask the procps-3.3.10 maintainers to fix the bug, but I don't know how to contact them.

** Affects: procps (Ubuntu)
     Importance: Undecided
         Status: Confirmed

-- 
hadoop crash: /bin/kill in ubuntu16.04 has bug in killing process group
https://bugs.launchpad.net/bugs/1610499
You received this bug notification because you are a member of Ubuntu Foundations Bugs, which is subscribed to procps in Ubuntu.


