[Bug 1368737] [NEW] Pacemaker can seg fault on crm node online/standy

Launchpad Bug Tracker 1368737 at bugs.launchpad.net
Fri Oct 31 03:13:19 UTC 2014


You have been subscribed to a public bug by Rafael David Tinoco (inaddy):

The following situation was brought to my attention:

"""
[Issue] 

lrmd process crashed when repeating "crm node standby" and "crm node
online"

---------------- 
# grep pacemakerd ha-log.k1pm101 | grep core 
Aug 27 17:47:06 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 49275 (lrmd) dumped core 
Aug 27 17:47:06 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=49275, core=1) 
Aug 27 18:27:14 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 1471 (lrmd) dumped core 
Aug 27 18:27:14 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=1471, core=1) 
Aug 27 18:56:41 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 35771 (lrmd) dumped core 
Aug 27 18:56:41 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=35771, core=1) 
Aug 27 19:44:09 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 60709 (lrmd) dumped core 
Aug 27 19:44:09 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=60709, core=1) 
Aug 27 20:00:53 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 35838 (lrmd) dumped core 
Aug 27 20:00:53 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=35838, core=1) 
Aug 27 21:33:52 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 49249 (lrmd) dumped core 
Aug 27 21:33:52 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=49249, core=1) 
Aug 27 22:01:16 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 65358 (lrmd) dumped core 
Aug 27 22:01:16 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=65358, core=1) 
Aug 27 22:28:02 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 22693 (lrmd) dumped core 
Aug 27 22:28:02 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=22693, core=1) 
---------------- 

---------------- 
# grep pacemakerd ha-log.k1pm102 | grep core 
Aug 27 15:32:48 k1pm102 pacemakerd[5808]: error: child_waitpid: Managed process 5812 (lrmd) dumped core 
Aug 27 15:32:48 k1pm102 pacemakerd[5808]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=5812, core=1) 
Aug 27 15:52:52 k1pm102 pacemakerd[5808]: error: child_waitpid: Managed process 35781 (lrmd) dumped core 
Aug 27 15:52:52 k1pm102 pacemakerd[5808]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=35781, core=1) 
Aug 27 16:02:54 k1pm102 pacemakerd[5808]: error: child_waitpid: Managed process 51984 (lrmd) dumped core 
Aug 27 16:02:54 k1pm102 pacemakerd[5808]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=51984, core=1) 
"""

Analyzing the core file with debug symbols (dbgsym), I could see that the
crash occurs in this frame:

#0  0x00007f7184a45983 in services_action_sync (op=0x7f7185b605d0) at services.c:434
434	        crm_trace(" >  stdout: %s", op->stdout_data);

This line is responsible for the core dump.

I've checked upstream code and there might be 2 important commits that
could be cherry-picked to fix this behavior:

commit f2a637cc553cb7aec59bdcf05c5e1d077173419f
Author: Andrew Beekhof <andrew at beekhof.net>
Date:   Fri Sep 20 12:20:36 2013 +1000

    Fix: services: Prevent use-of-NULL when executing service actions
	
commit 11473a5a8c88eb17d5e8d6cd1d99dc497e817aac
Author: Gao,Yan <ygao at suse.com>
Date:   Sun Sep 29 12:40:18 2013 +0800

    Fix: services: Fix the executing of synchronous actions

The core dump can be caused by things such as this missing check:

    if (op == NULL) {
        crm_trace("No operation to execute");
        return FALSE;
    }

at the beginning of the "services_action_sync(svc_action_t * op)" function.

The behavior is further improved by commit 11473a5.
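
For illustration only, the kind of guard discussed above can be sketched in
isolation. This is not the Pacemaker source and not the upstream patch: the
type sketch_action_t and the function sketch_action_sync are hypothetical
stand-ins, fprintf replaces crm_trace, and only the stdout_data field seen in
the backtrace is modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for svc_action_t; only the field that appears
 * in the backtrace is modeled. */
typedef struct {
    const char *stdout_data;
} sketch_action_t;

/* Hypothetical sketch of a synchronous-action entry point that refuses
 * to run when no operation is supplied. */
static bool sketch_action_sync(sketch_action_t *op)
{
    if (op == NULL) {
        /* Without this check, op->stdout_data below would dereference NULL. */
        fprintf(stderr, "No operation to execute\n");
        return false;
    }

    /* Passing a NULL pointer to %s is undefined behavior in C,
     * so substitute a marker string when there is no output. */
    fprintf(stderr, " >  stdout: %s\n",
            op->stdout_data != NULL ? op->stdout_data : "(none)");
    return true;
}
```

Calling sketch_action_sync(NULL) returns false instead of crashing, while a
valid pointer proceeds to the trace line. Whether a NULL op is the actual
trigger in the reported cores is not established here.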

** Affects: pacemaker (Ubuntu)
     Importance: Undecided
     Assignee: Rafael David Tinoco (inaddy)
         Status: Confirmed


** Tags: cts
-- 
Pacemaker can seg fault on crm node online/standy
https://bugs.launchpad.net/bugs/1368737
You received this bug notification because you are a member of Ubuntu Sponsors Team, which is subscribed to the bug report.


