[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
1890491 at bugs.launchpad.net
Wed Apr 7 14:33:04 UTC 2021
Sorry for taking too long to get to this bug. I have some comments about
the proposed debdiff:
1- The version needs to be updated to 1.1.18-0ubuntu1.4. The .3 version
was already released to bionic-updates.
2- The patches need some DEP-3 headers. I see you are just backporting
the upstream patches but it would be good to also add some headers after
the original commit message, such as Origin, Bug-Ubuntu, Reviewed-By.
3- The patches 0001-Fix-libpe_status-don-t-order-implied-stops-relative-.patch
and 0002-Fix-scheduler-remote-state-is-failed-if-node-is-shut.patch are
in the debdiff but they are mentioned in neither debian/patches/series
nor debian/changelog. Should they be removed, or added to d/p/series and
d/changelog?
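Points 2 and 3 can be checked mechanically before the next debdiff. The following is only a rough sketch, not an existing Ubuntu or Debian tool: the helper names unlisted_patches and missing_headers are made up for illustration, and the required header list simply mirrors the headers named in point 2. It assumes a quilt-style series file (one patch name per line, "#" comments) and that the headers sit before the first "diff " line of each patch.

```python
from pathlib import Path


def unlisted_patches(patches_dir):
    """Return *.patch files in patches_dir that the series file does not list."""
    patches_dir = Path(patches_dir)
    series = patches_dir / "series"
    listed = set()
    if series.exists():
        for line in series.read_text().splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments and blanks
            if line:
                listed.add(line.split()[0])  # first token is the patch name
    return sorted(p.name for p in patches_dir.glob("*.patch")
                  if p.name not in listed)


def missing_headers(patch_path,
                    required=("Origin", "Bug-Ubuntu", "Reviewed-by")):
    """Return the required header names absent from the patch preamble."""
    present = set()
    for line in Path(patch_path).read_text(errors="replace").splitlines():
        if line.startswith(("diff ", "--- ", "Index: ")):
            break  # the diff body starts here; headers must come earlier
        name, sep, _ = line.partition(":")
        if sep:
            present.add(name.strip().lower())  # compare case-insensitively
    return [h for h in required if h.lower() not in present]


if __name__ == "__main__":
    for name in unlisted_patches("debian/patches"):
        print("not in series:", name)
    for patch in sorted(Path("debian/patches").glob("*.patch")):
        for header in missing_headers(patch):
            print(f"{patch.name}: missing {header}")
```

Run from the top of the unpacked source tree, this would list the two patches from point 3 until they are either removed or added to d/p/series.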
The proposed debdiff as-is built fine for me locally. We need to address
the comments above to be able to upload this package. In parallel, we
can update the bug description to add the SRU template (impact, test
plan, where problems could occur). Are you willing to do that, @Jorge?
Thanks for the work you have done so far!
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
A pacemaker node fails monitor (probe) and stop /start operations on a
resource because it returns "rc=189
Status in pacemaker package in Ubuntu:
Status in pacemaker source package in Bionic:
Status in pacemaker source package in Focal:
Status in pacemaker source package in Groovy:
Cause: Pacemaker implicitly ordered all stops needed on a Pacemaker
Remote node before the stop of the node's Pacemaker Remote connection,
including stops that were implied by fencing of the node. Also,
Pacemaker scheduled actions on Pacemaker Remote nodes with a failed
connection so that the actions could be done once the connection is
recovered, even if the connection wasn't being recovered (for example,
if the node was shutting down when the failure occurred).
Consequence: If a Pacemaker Remote node needed to be fenced while it
was in the process of shutting down, once the fencing completed
pacemaker scheduled probes on the node. The probes fail because the
connection is not actually active. Due to the failed probe, a stop is
scheduled which also fails, leading to fencing of the node again, and
the situation repeats itself indefinitely.
Fix: Pacemaker Remote connection stops are no longer ordered after
implied stops, and actions are not scheduled on Pacemaker Remote nodes
when the connection is failed and not being started again.
Result: A Pacemaker Remote node that needs to be fenced while it is in
the process of shutting down is fenced once, without repeating
indefinitely.
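The loop described under Consequence, and how the Fix breaks it, can be illustrated with a toy model. This is not Pacemaker's scheduler code: the function name fencing_rounds, its flag, and the max_rounds cap (an artificial stand-in for "indefinitely") are all assumptions made for the sketch.

```python
def fencing_rounds(skip_unrecovered_connection: bool, max_rounds: int = 5) -> int:
    """Count how often a toy remote node is fenced while it is shutting down.

    skip_unrecovered_connection=True models the fixed behaviour: no actions
    are scheduled when the connection is failed and not being started again.
    """
    connection_failed = True   # the node was shutting down when it failed
    being_recovered = False    # so the connection is not being restarted
    fenced = 0
    while fenced < max_rounds:
        fenced += 1  # the node is fenced
        if skip_unrecovered_connection and connection_failed and not being_recovered:
            break  # fixed: no probe is scheduled, so nothing fails again
        # Old behaviour: a probe is scheduled and fails because the
        # connection is not active; the resulting stop fails as well,
        # which requires fencing once more on the next iteration.
    return fenced


print(fencing_rounds(skip_unrecovered_connection=False))  # hits the cap
print(fencing_rounds(skip_unrecovered_connection=True))   # fenced once
```

Without the fix the count is limited only by the artificial cap; with it the node is fenced a single time.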
The issue seems to be fixed in pacemaker-1.1.21-1.el7.
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1704870