Feature Request: -about-to-depart hook

Mario Splivalo mario.splivalo at canonical.com
Wed Jan 28 14:03:00 UTC 2015


On 01/27/2015 09:52 AM, Stuart Bishop wrote:
>> Ignoring the (most likely wrong) nomenclature of the proposed hook,
>> what are your opinions on the matter?
> 
> I've been working on similar issues.
> 
> When the peer relation-departed hook is fired, the unit running it
> knows that $REMOTE_UNIT is leaving the cluster. $REMOTE_UNIT may not
> be alive - we may be removing a failed unit from the service.
> $REMOTE_UNIT may be alive but uncontactable - some form of network
> partition has occurred.

$REMOTE_UNIT doesn't have to be the one leaving the cluster. If I have
a 3-unit cluster (mongodb/0, mongodb/1, mongodb/2) and I run
'juju remove-unit mongodb/1', the relation-departed hook will fire on
all three units. Moreover, it will fire twice on mongodb/1. So, from
mongodb/2's perspective, $REMOTE_UNIT does indeed point to mongodb/1,
which is, in this case, leaving the relation. But if we observe the
same scenario on mongodb/1, $REMOTE_UNIT there will point to mongodb/0
(and then mongodb/2), and those units are NOT leaving the cluster.
There is no way to know whether the hook is running on the unit that's
leaving or on a unit that's staying.
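To make this concrete, here is a minimal, purely illustrative
relation-departed hook (a Python sketch that only logs what juju
exposes in the hook environment):

    #!/usr/bin/env python
    # relation-departed (sketch): each unit only sees its own name and
    # the remote unit's name; neither tells you which unit is the one
    # actually being removed.
    import os

    local_unit = os.environ['JUJU_UNIT_NAME']     # e.g. mongodb/2
    remote_unit = os.environ['JUJU_REMOTE_UNIT']  # e.g. mongodb/1

    print('%s sees %s departing the relation' % (local_unit, remote_unit))
    # On mongodb/1 itself this hook fires too, with JUJU_REMOTE_UNIT set
    # to mongodb/0 and then mongodb/2 - units that are NOT leaving.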

> When the peer relation-broken hook is fired, the unit running it knows
> that it is leaving the cluster and decommissions itself. However, this
> hook may never be run if the unit has failed. Or it may be impossible
> to complete successfully (eg. corrupted filesystem).

Precisely! So any cleanup work that eventually needs to be done can't
be run, because -broken might never fire at all.



> An extra hook as you suggest would help me to solve this issue. But
> what would also solve my issue is juju leadership (currently in
> development). When the lead unit runs its peer relation-departed hook,
> it connects to the departing unit and runs the decommissioning process
> on its behalf. If it is unable to connect, it assumes the node is
> failed and cleans up. It can even notify the remaining non-leader
> units that the removed unit has been removed from the cluster, giving
> them a chance to update their configuration if necessary. You can't
> really do this without the leadership feature, as you can't coordinate
> which of the remaining units is responsible for decommissioning the
> departing unit (and they would trip over each other if they all
> attempted to decommission the departing node).
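For reference, a rough sketch of that leader-driven flow - with
is_leader(), can_connect(), decommission_remote(), force_remove() and
notify_peers() as purely hypothetical helpers, not juju APIs - might
look like this:

    # Sketch of the leader-driven decommissioning described above.
    # All helper functions are hypothetical.
    import os

    def on_peer_relation_departed():
        departing = os.environ['JUJU_REMOTE_UNIT']
        if not is_leader():
            return                          # only the leader coordinates
        if can_connect(departing):
            decommission_remote(departing)  # clean shutdown on its behalf
        else:
            force_remove(departing)         # assume the unit has failed
        notify_peers(departing)             # let the others reconfigure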

Let me explain a use case from the mongodb charm and the workaround I
had to 'undertake'.
For mongodb, leadership is not going to help me - mongodb units talk to
each other all the time and can change state without juju knowing. So
basically, when doing any administrative work with juju, the first
thing that is checked is whether the unit is the primary unit (as in
MongoDB you can't run administration commands on non-primary units).
If it's not, no action takes place. If it is, whatever needs to be done
is, well, done :) So when a new unit joins the relation,
relation-joined will fire across all the peered units, but only the
unit which is primary will actually issue 'rs.add()'.
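Roughly, the primary check and the add look like this (a pymongo
sketch; hostnames, auth and error handling are simplified compared to
what the charm actually does):

    # Only the PRIMARY reacts to relation-joined; the others bail out.
    from pymongo import MongoClient

    client = MongoClient('localhost:27017')

    def is_primary():
        return client.admin.command('isMaster')['ismaster']

    def add_member(new_host):
        if not is_primary():
            return                    # non-primary units take no action
        # Equivalent of rs.add(): append the member and reconfigure.
        config = client.local.system.replset.find_one()
        config['version'] += 1
        next_id = max(m['_id'] for m in config['members']) + 1
        config['members'].append({'_id': next_id, 'host': new_host})
        client.admin.command('replSetReconfig', config)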

When removing a unit from the replica set, the procedure is the same.
The edge case here is when the operator tries to remove the unit which
is PRIMARY. As MongoDB doesn't support that, that unit first needs to
step down (which forces the election of a new primary). It then finds
the new primary and asks it to remove itself from the replica set.
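That dance looks roughly like this (again a pymongo sketch with
simplified error handling; a real hook also has to wait for the
election to settle before it can find the new primary):

    from pymongo import MongoClient
    from pymongo.errors import AutoReconnect

    client = MongoClient('localhost:27017')

    def remove_self(my_host):
        if client.admin.command('isMaster')['ismaster']:
            try:
                # Forces an election; the server drops connections, so
                # the AutoReconnect raised here is expected, not an error.
                client.admin.command('replSetStepDown', 60)
            except AutoReconnect:
                pass
        # Ask the (new) primary to drop this member from the config.
        primary_host = client.admin.command('isMaster')['primary']
        primary = MongoClient(primary_host)
        config = primary.local.system.replset.find_one()
        config['version'] += 1
        config['members'] = [m for m in config['members']
                             if m['host'] != my_host]
        primary.admin.command('replSetReconfig', config)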

But, if that takes place in relation-departed, there is no way of
knowing if you need to do a stepdown, because you don't know whether
you are the unit being removed or the remote unit is. Therefore the
logic for removing nodes had to go into relation-broken. But, as you
explained, if the unit goes down catastrophically, relation-broken will
never be executed, and I end up with a cluster that needs manual
intervention to clean up.

> The edge case in my approach is of course if the departing unit is
> live, but for some reason the leader cannot connect to it. Maybe your
> inter DC links have gone down. However, there are similar issues with
> the extra hook. If your -before-departed hook fails to run, how long
> should juju wait until it gives up and triggers the -departed hooks?

This is a good point. I think it doesn't matter: the other hooks don't
need to care about that, and on the departing unit you can always
serialize the order of hook execution if you need to.

> Perhaps what is needed here is instead an extra hook run on the
> remaining units if the -broken hook could not be run successfully?
> Let's call it relation-failed. It could be fired when we know the vm is
> gone and the -broken hook was not successfully run.

I'm not sure this is possible... Once the unit has left the relation,
juju is no longer aware of it, so there is no way of knowing whether
-broken completed successfully or not. Or am I wrong here?

	Mario


