Hi all,<div><br></div><div>I'm handling the work to "serialise" power actions, or at least I'm getting started on it right now. I've spent some time looking at the problem and I wanted to bounce ideas off you all, preferably whilst I sleep :)</div><div><br></div>So, the problem:<br><br><div> When power actions are issued to a node (power on, power off, etc.), more than one can be in play for that node at once. We don't keep track of them once they've been fired, except for receiving a notification when they've succeeded or failed.</div><div><br></div><div>This means that it's possible to issue two conflicting commands (e.g. power on followed by power off) in quick succession, which can then leave the node in an odd state: it's theoretically possible that the node would stay powered on when MAAS expects it to be off, say if for some reason the power off command got executed first. This is even more likely with AMT BMCs, since there's a degree of did-I-cast-the-runes-right to get a command to work on those, at least when the moon is waning and the wind is from the east.</div><div><br></div><div>There are, so far as I can tell, three strategies for handling this problem properly. All of them require keeping track of the current power action for a node, and all assume that only one action can run at once:</div><div><br></div><div>1: The current power action blocks all others until it has completed. Other power actions will be queued and executed in turn. <br></div>- or -<br><div>2: Each power action supersedes any action that is currently executing: the existing action is cancelled and then the new action is run.</div><div>- or -</div><div>3: We track the current ("now") and "next" actions for the node, but drop every action that comes in once those two slots are full.</div><div><br></div>At first glance the second option is simpler: just cancel whatever's there and then do our thing. But I think that it's actually a bit deceptive. 
Consider:<div><br></div><div> - How do we "cancel" an action? </div><div> - How do we ensure that we're not going to end up in an inconsistent state if the node is already responding to action #1 when we cancel it?</div><br><div>The first option isn't without its problems either: having a queue of actions seems kind of awkward, and could lead to flip-flopping of a node's power state. But *not* having a queue could still lead to situations where several actions get issued in quick succession.</div><div><br></div><div>The third option seems to offer a happy medium. We can track the current and next power actions for a node and then ignore anything else that comes in whilst both of those slots are full. Each action must succeed or fail before the next one can be executed. This means we won't get potentially ridiculous amounts of flip-flopping, and we could build this pretty easily. We'd have to have some kind of UI feedback for "hey, it looks like you're repeatedly powering this node on and off; I'm going to ignore you for a while," but that doesn't seem all that onerous.</div><div><br></div>So as it stands I'm leaning towards option #3. Questions, thoughts and comments are welcome.<div><br></div>~gmb
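<div><br></div><div>For illustration only, here is a minimal sketch of the "now"/"next" bookkeeping that option #3 describes. The class name NodePowerSlots, its submit() and finish() methods, and the plain-string actions are all hypothetical and not existing MAAS code; the sketch also ignores locking and persistence, which a real implementation would presumably need.</div>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodePowerSlots:
    """Hypothetical per-node tracker: one running action, one waiting action."""

    current: Optional[str] = None  # the "now" slot (assumed name)
    pending: Optional[str] = None  # the "next" slot (assumed name)

    def submit(self, action: str) -> str:
        """Offer a new power action; report what happened to it."""
        if self.current is None:
            # Nothing is running, so this action can start straight away.
            self.current = action
            return "started"
        if self.pending is None:
            # One action is running; hold this one until it finishes.
            self.pending = action
            return "queued"
        # Both slots are full, so the action is ignored.
        return "dropped"

    def finish(self) -> Optional[str]:
        """Call once the running action has succeeded or failed.

        Promotes the waiting action (if any) and returns it so the
        caller can execute it; returns None when the node is idle.
        """
        self.current, self.pending = self.pending, None
        return self.current


slots = NodePowerSlots()
results = [slots.submit("on"), slots.submit("off"), slots.submit("on")]
promoted = slots.finish()    # "off" moves from "next" to "now"
idle = slots.finish()        # nothing left to run
```

<div>A "dropped" result is the point at which the UI feedback mentioned above would be triggered.</div>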