[Bug 1789527] Re: Galera agent doesn't work when grastate.dat contains safe_to_bootstrap

Andreas Hasenack andreas at canonical.com
Mon Nov 12 16:49:44 UTC 2018


By doing graceful shutdowns I can get into a state where the last node
to die has "safe_to_bootstrap:1" in its grastate.dat file. But I
couldn't get that node running again, which was odd, as it should be the
*only* one that can be started. I had to use one of the other initscript
targets, restart-bootstrap, instead of just restart, or else it would
time out trying to reach "juju_cluster":

2018-11-09 18:54:58 14147 [ERROR] WSREP:
gcs/src/gcs.cpp:gcs_open():1478: Failed to open channel 'juju_cluster'
at 'gcomm://10.0.100.131,10.0.100.191': -110 (Connection timed out)

I see (at least) two options here:
a) We backport just what was called the workaround bit, since you say
   this is what you have been using for a long time now. That is the
   bit that handles the case where all nodes crashed, and thus
   "safe_to_bootstrap" is set to zero on all of them. Without the fix,
   no node will be able to start up in this case. The fix uses the same
   logic that was always used to determine the right node to start
   before "safe_to_bootstrap" existed, and once it finds that node, it
   just flips that flag to 1 to allow the service to be started.
b) We backport the full patch, which consists of part (a) above, plus
   skipping the logic to find the right node to start if it finds
   "safe_to_bootstrap" set to 1. This one will need more testing.
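
To make the difference between (a) and (b) concrete, here is a rough
Python sketch of the decision. It is not the resource agent's code (the
agent is a shell script). The function names are made up for
illustration, and it assumes each node's grastate.dat already carries a
usable seqno, which after a real crash is often -1 and has to be
recovered by other means first.

```python
import re


def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file, skipping comments."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def choose_bootstrap_node(states):
    """Pick the node to bootstrap from; states maps node name -> parsed state.

    Option (b): a node already marked safe_to_bootstrap: 1 is trusted and
    the search is skipped.
    Option (a): otherwise fall back to the older rule, the highest seqno.
    """
    for node, state in states.items():
        if state.get("safe_to_bootstrap") == "1":
            return node
    return max(states, key=lambda n: int(states[n].get("seqno", "-1")))


def mark_safe_to_bootstrap(text):
    """Return the file content with safe_to_bootstrap flipped from 0 to 1."""
    return re.sub(r"^(safe_to_bootstrap:[ \t]*)0[ \t]*$", r"\g<1>1",
                  text, flags=re.MULTILINE)


# Hypothetical "all nodes crashed" case: every flag is 0.
crashed = {
    "node1": parse_grastate("seqno: 10\nsafe_to_bootstrap: 0\n"),
    "node2": parse_grastate("seqno: 12\nsafe_to_bootstrap: 0\n"),
}
winner = choose_bootstrap_node(crashed)
```

With only (a), the flag check at the top of choose_bootstrap_node would
be absent and the seqno comparison would always run; the flip to 1 is
then applied to the chosen node's file before starting the service.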

-- 
You received this bug notification because you are a member of Ubuntu
Server, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1789527

Title:
  Galera agent doesn't work when grastate.dat contains safe_to_bootstrap



