[Maas-devel] RFC: "Serialising" power actions

Wed Sep 17 23:27:25 UTC 2014

On Wednesday 17 Sep 2014 09:58:47 Gavin Panella wrote:
> On 17 September 2014 01:09, Julian Edwards <julian.edwards at canonical.com>
> wrote: ...
> 
> > I have severe reservations with the approach discussed, which boil
> > down to:
> >  * superseding power actions is undesirable
> 
> This may be undesirable because there's more work to do, but I don't
> think it's undesirable in general.

I already explained a few times why it's undesirable.  I'm not being funny, 
but do you have anything concrete to refute my points other than "I think it's 
undesirable"?

> 
> >  * you cannot rely on cancellation of an outstanding operation (in
> >    what state would it leave the machine?)
> 
> Only cancel an in-progress task when there's something to supersede it,
> and when the final desired state of the superseder differs from that of
> the in-progress task.

As I explained in the previous email, cancelling can miss important state 
changes that need to happen and you have no idea if it really cancelled or 
not.

> 
> >  * Storing state in the pserv without a means to recover it is a
> >    recipe for disaster
> 
> I guess you mean that a crash or restart in pserv would mean that
> in-progress power commands wouldn't be resumed. That's true, but it's
> not a disaster. It means that for nodes in all states but DEPLOYED we
> need to wait for the periodic power monitor to notice and reissue a
> command (see later; it doesn't do this yet).

So this is a means of recovery that we need, right.  I think it's overloading 
the original intent of the monitor, but it will work.

> For DEPLOYED nodes, sure,
> the command will currently be lost, but these nodes are, one assumes,
> under active management, and some process outside of MAAS will notice,
> be that a human or a Juju or something else.

I think that's a dreadful user experience.  We should not be knowingly 
throwing away user requests.

> 
> > Here's my counter proposal again, which I think is a lot simpler:
> >  1 Already implemented: pserv is dumb and just issues power commands
> >    as requested, with a callback to the region for failure and success.
> > 
> >  2 We do not allow concurrent power operations while an outstanding
> >    one is in progress (ie wait for the callback), although you could
> >    detect a request that is the same as the outstanding one and respond
> >    without an error.
> > 
> >  3 We add a new column to Node to indicate the desired power state (if
> >    it's different from the current one it indicates an outstanding
> >    operation). This has the bonus of being something you can display in
> >    the UI.
> 
> We can infer the desired power state from statuses:
> 
>     NEW = off
>     COMMISSIONING = on
>     FAILED_COMMISSIONING = off
>     MISSING = ? (unused status, afaik)
>     READY = off
>     RESERVED = off
>     ALLOCATED = off
>     RETIRED = ? (unused status, afaik)
>     BROKEN = off
>     DEPLOYING = on
>     DEPLOYED = not our business
>     FAILED_DEPLOYMENT = off
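
The status table above amounts to a lookup from node status to an inferred target power state. A minimal Python sketch, assuming plain strings stand in for MAAS's status enumeration; `INFERRED_POWER_STATE` and `inferred_power_state` are hypothetical names:

```python
from typing import Dict, Optional

# Desired power state inferred from node status, transcribed from the table
# above. None covers both "unknown" (MISSING, RETIRED) and "not our
# business" (DEPLOYED).
INFERRED_POWER_STATE: Dict[str, Optional[str]] = {
    "NEW": "off",
    "COMMISSIONING": "on",
    "FAILED_COMMISSIONING": "off",
    "MISSING": None,
    "READY": "off",
    "RESERVED": "off",
    "ALLOCATED": "off",
    "RETIRED": None,
    "BROKEN": "off",
    "DEPLOYING": "on",
    "DEPLOYED": None,
    "FAILED_DEPLOYMENT": "off",
}


def inferred_power_state(status: str) -> Optional[str]:
    """Return "on"/"off" for statuses with an inferred target, else None."""
    return INFERRED_POWER_STATE.get(status)
```

Note that the lookup cannot distinguish "unknown" from "deliberately left alone"; both come back as `None`.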

DEPLOYED really is our business though.

You cannot infer the desired power state from the statuses in a sane manner;
doing so overloads the meaning of status.  For example, some of these states
can arise through failures, and your scheme implies that we need to turn those
nodes off, which is not always going to be desirable (what if someone needs to
do live debugging?)

IOW, why imply it when you can be explicit about it?
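
Points 1 to 3 of the counter-proposal quoted above can be sketched in Python as follows. `Node`, `desired_power_state`, `request_power`, `on_power_callback` and `PowerActionInProgress` are hypothetical names for illustration, not existing MAAS code. How a failed action is handled is an assumption here, since the proposal leaves it open.

```python
from dataclasses import dataclass
from typing import Optional


class PowerActionInProgress(Exception):
    """Raised when a different power action is requested while one is outstanding."""


@dataclass
class Node:
    # Hypothetical model: power_state is the last known state and
    # desired_power_state is the proposed new column. They differ while
    # an operation is outstanding.
    system_id: str
    power_state: str = "off"
    desired_power_state: Optional[str] = None

    def has_outstanding_action(self) -> bool:
        return (self.desired_power_state is not None
                and self.desired_power_state != self.power_state)


def request_power(node: Node, target: str) -> bool:
    """Record a power request; return True if a command should go to the pserv."""
    if node.has_outstanding_action():
        if target == node.desired_power_state:
            return False  # Same as the outstanding request: no error, no new command.
        raise PowerActionInProgress(node.system_id)  # No concurrent operations.
    node.desired_power_state = target
    return target != node.power_state


def on_power_callback(node: Node, succeeded: bool, observed_state: str) -> None:
    """Handle the pserv's success/failure callback for a node."""
    node.power_state = observed_state
    if not succeeded:
        # Assumption: a failed action is abandoned (and surfaced to the user)
        # rather than retried, so it no longer blocks new requests.
        node.desired_power_state = observed_state
```

Because the desired state is an explicit field, the UI can show it directly, and an outstanding operation is simply "desired differs from actual".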

> 
> >  4 If the pserv (or its link) goes down, when it comes back up we need
> >    to either re-issue the outstanding power requests or request the
> >    current state and correct it as necessary. This is potentially work
> >    that can be deferred for now, but it cannot be left out altogether.
> 
> We can infer this from the table above; the periodic power monitoring
> job can be enhanced to enforce this.
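
The second option in point 4 (request the current state and correct it) is essentially what an enhanced periodic monitor would do, whichever way the desired state is obtained. A minimal Python sketch, assuming hypothetical `query_power` and `issue_power` callables that stand in for whatever calls actually reach the pserv:

```python
from typing import Callable, Iterable, List, Optional, Tuple


def reconcile(
    nodes: Iterable[Tuple[str, Optional[str]]],
    query_power: Callable[[str], str],
    issue_power: Callable[[str, str], None],
) -> List[str]:
    """Re-issue power commands for nodes whose actual state differs from the desired one.

    `nodes` yields (system_id, desired_state) pairs; a desired_state of None
    means "leave this node alone". Returns the ids that were acted on.
    """
    corrected = []
    for system_id, desired in nodes:
        if desired is None:
            continue  # No target state: do not touch the node.
        if query_power(system_id) != desired:
            issue_power(system_id, desired)
            corrected.append(system_id)
    return corrected
```

Running this after a pserv restart or reconnect recovers commands that were lost in flight, at the cost of one power query per node.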
> 
> > So in terms of work to do, it's quite easy and quick.
> > 
> > J