[Maas-devel] RFC: "Serialising" power actions
Gavin Panella
gavin.panella at canonical.com
Tue Sep 16 10:22:08 UTC 2014
On 16 September 2014 00:40, Julian Edwards <julian.edwards at canonical.com> wrote:
...
> We need a way to *recover* operations in the case of a pserv and/or region
> failure, and to do this the database needs to store the *desired* state of the
> power in addition to its current state. As I have previously said, the pserv
> needs to issue a "recovery" call to the region when it restarts so it can
> converge on the desired state; for power the region would send back a list of
> outstanding power ops on nodes and the desired state for each.
A way to recover operations in the cluster when it fails or is restarted
is desirable, but it's out of scope for this cycle (and never was in
scope). It's something we can do next cycle.
Celery with RabbitMQ gave us /an/ ability to recover from a restart or
a crash, assuming Celery uses AMQP settlement correctly (which I assume
it probably does).
I don't think it gave us a way to prevent 100s of power-on and power-off
tasks from being queued. It also didn't give us a way to make immediate
queries of clusters, and it didn't give clusters a way to talk to the
region.
So, with the move to RPC we've lost some of that ability for a cluster
to pick up where it left off, but we have gained a lot more control over
MAAS's infrastructure, and have a strong basis for layering more HA and
HA-like features on top.
I agree that putting "truth" into the database and designing code to
converge on that is the right approach.
(Aside: once a node has been deployed, MAAS can no longer have a desired
power state to converge on. The node belongs to the user at that point,
and they are free to turn it off and on as needed, including by
mechanisms other than MAAS.)
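
To make the "converge on the desired state" idea concrete, here is a
minimal sketch of how the region might work out the list of outstanding
power ops to send back to a cluster. It is illustrative only, not MAAS
code: NodePower, its fields, and outstanding_power_ops are hypothetical
names, and I'm assuming a desired state of None means MAAS has no
opinion (e.g. the node has been deployed).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NodePower:
    """Hypothetical record of a node's power state as stored in the database."""
    system_id: str
    current: str             # last observed state: "on" or "off"
    desired: Optional[str]   # None: no desired state (assumed for deployed nodes)


def outstanding_power_ops(nodes: List[NodePower]) -> List[Tuple[str, str]]:
    """Return (system_id, desired) for each node that still needs a power change."""
    return [
        (node.system_id, node.desired)
        for node in nodes
        if node.desired is not None and node.desired != node.current
    ]


nodes = [
    NodePower("node-a", current="off", desired="on"),   # needs powering on
    NodePower("node-b", current="on", desired="on"),    # already converged
    NodePower("node-c", current="off", desired=None),   # deployed: left alone
]
ops = outstanding_power_ops(nodes)
```

Because the list is derived from stored state rather than from a queue
of tasks, asking for it twice gives the same answer, so a cluster could
safely request it again after every restart.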