[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189

Lucas Kanashiro 1890491 at bugs.launchpad.net
Wed Apr 7 14:33:04 UTC 2021

Sorry for taking so long to get to this bug. I have some comments about
the proposed debdiff:

1- The version needs to be updated to 1.1.18-0ubuntu1.4. The .3 version
was already released to bionic-updates.

2- The patches need some DEP-3 headers. I see you are just backporting
the upstream patches but it would be good to also add some headers after
the original commit message, such as Origin, Bug-Ubuntu, Reviewed-By.

3- The patches
0001-Fix-libpe_status-don-t-order-implied-stops-relative-.patch and
0002-Fix-scheduler-remote-state-is-failed-if-node-is-shut.patch are in
the debdiff but are mentioned in neither debian/patches/series nor
debian/changelog. Should they be removed, or added to d/p/series and
d/changelog?
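
For comments 2 and 3, a small script along these lines could catch the
same problems before the next debdiff. This is only a rough sketch, not
part of any packaging tool: the function names are made up for
illustration, the field list is simply the one suggested above (DEP-3
itself defines more fields and requires only some of them), and the
header detection assumes the header ends at the first "---", "diff " or
"Index: " line.

```python
from pathlib import Path

# Fields suggested in this review, not the full DEP-3 field list.
SUGGESTED_FIELDS = ("Origin", "Bug-Ubuntu", "Reviewed-By")


def series_entries(series_text):
    """Return patch names listed in a quilt series file, ignoring comments."""
    entries = []
    for line in series_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            # A series line may carry quilt options after the file name.
            entries.append(line.split()[0])
    return entries


def unlisted_patches(patch_names, series_text):
    """Return patch files that exist but are not referenced in series."""
    listed = set(series_entries(series_text))
    return sorted(name for name in patch_names if name not in listed)


def missing_fields(patch_text, fields=SUGGESTED_FIELDS):
    """Return the suggested fields that do not appear in the patch header."""
    header = []
    for line in patch_text.splitlines():
        # Assumption: the header stops where the diff (or a "---" line) starts.
        if line.startswith(("---", "diff ", "Index: ")):
            break
        header.append(line)
    present = {l.split(":", 1)[0].strip().lower() for l in header if ":" in l}
    return [f for f in fields if f.lower() not in present]


def check_tree(patch_dir="debian/patches"):
    """Print unlisted patches and listed patches lacking suggested fields."""
    patch_dir = Path(patch_dir)
    series = (patch_dir / "series").read_text()
    names = [p.name for p in patch_dir.glob("*.patch")]
    for name in unlisted_patches(names, series):
        print(f"not in series: {name}")
    for name in series_entries(series):
        path = patch_dir / name
        if path.is_file():
            absent = missing_fields(path.read_text(errors="replace"))
            if absent:
                print(f"{name}: missing {', '.join(absent)}")
```

Calling check_tree() from the top of the unpacked source tree would
print one line per problem found. It does not look at debian/changelog,
so that part of comment 3 still needs a manual check.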

The proposed debdiff built fine for me locally as-is. We need to address
the comments above before this package can be uploaded. In parallel, the
bug description should be updated to add the SRU template (impact, test
plan, where problems could occur). Are you willing to do that, @Jorge?

Thanks for the work you have done so far!

You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.

  A pacemaker node fails monitor (probe) and stop /start operations on a
  resource because it returns "rc=189

Status in pacemaker package in Ubuntu:
  Fix Released
Status in pacemaker source package in Bionic:
  In Progress
Status in pacemaker source package in Focal:
  Fix Released
Status in pacemaker source package in Groovy:
  Fix Released

Bug description:
  Cause: Pacemaker implicitly ordered all stops needed on a Pacemaker
  Remote node before the stop of the node's Pacemaker Remote connection,
  including stops that were implied by fencing of the node. Also,
  Pacemaker scheduled actions on Pacemaker Remote nodes with a failed
  connection so that the actions could be done once the connection is
  recovered, even if the connection wasn't being recovered (for example,
  if the node was shutting down when the failure occurred).

  Consequence: If a Pacemaker Remote node needed to be fenced while it
  was in the process of shutting down, once the fencing completed
  pacemaker scheduled probes on the node. The probes fail because the
  connection is not actually active. Due to the failed probe, a stop is
  scheduled which also fails, leading to fencing of the node again, and
  the situation repeats itself indefinitely.

  Fix: Pacemaker Remote connection stops are no longer ordered after
  implied stops, and actions are not scheduled on Pacemaker Remote nodes
  when the connection is failed and not being started again.

  Result: A Pacemaker Remote node that needs to be fenced while it is in
  the process of shutting down is fenced once, without repeating.

  The issue seems to be fixed in pacemaker-1.1.21-1.el7

  Related to https://bugzilla.redhat.com/show_bug.cgi?id=1704870
