Feature Request: -about-to-depart hook

Tue Jan 27 08:52:49 UTC 2015

On 26 January 2015 at 20:54, Mario Splivalo
<mario.splivalo at canonical.com> wrote:
> Hello!
>
> Currently juju provides relation-departed hook, which will fire on all
> units that are part of relation, and relation-broken hook, which will
> fire on unit that just departed the relation.
>
> The problem arises when we have a multi-unit service peered. Consider
> MongoDB charm where we usually have replicaset formed with three or more
> units:
> When a unit is destroyed (with 'juju remove-unit'), first the
> relation-departed hook will fire between the departing unit and all the
> 'staying' units. Then, on the departed unit, the relation-broken hook
> is fired. But, if we
> need to do some work on the departing unit before it leaves the
> relation, there is no way to do so. When 'relation-departed' hook is
> called there is no way of telling (if we make observation from within
> the hook) if we are running on unit that is departing, or on unit that
> is 'staying' within the relation.
>
> A '-before-departed' hook would, I think, solve this. First a
> '-before-departed' hook will be fired on the departing unit. Then
> '-departed' hook will fire against departing and staying units. And,
> lastly, as it is now, the -broken hook will fire.
>
> Ignoring the, most likely, wrong nomenclature of the proposed hook, what
> are your opinions on the matter?

I've been working on similar issues.

When the peer relation-departed hook is fired, the unit running it
knows that $REMOTE_UNIT is leaving the cluster. $REMOTE_UNIT may not
be alive - we may be removing a failed unit from the service.
$REMOTE_UNIT may be alive but uncontactable - some form of network
partition has occurred.

When the peer relation-broken hook is fired, the unit running it knows
that it is leaving the cluster and decommissions itself. However, this
hook may never be run if the unit has failed. Or it may be impossible
to complete successfully (e.g. a corrupted filesystem).

I agree that this is not rich enough to remove units robustly. The
peer relation-departed hooks are not particularly useful to me, as
they cannot know in advance if the relation-broken hook will complete
successfully. It is the peer relation-broken hook that is responsible
for properly decoupling the unit from the service, and this works fine
if the unit is healthy. The problem is of course if the departing unit
*has* failed, because no subsequent hooks are called to repair the
damaged cluster.

As a concrete example, to remove a Cassandra node from a cluster:
 - First, run 'nodetool decommission' on the departing node. This
streams its partitions to the remaining nodes.
 - Second, if 'nodetool decommission' failed or could not be run, run
'nodetool removenode' on one of the other nodes. This removes the
failed node from the ring, and the remaining nodes will rebalance and
rebuild using redundant copies of the data. Data may be lost if stored
with a replication factor of 1 or if updates only waited for an
acknowledgement from 1 node.
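
The two steps above can be sketched roughly as follows. This is only
an illustration, not charm code: run_on_departing and run_on_survivor
are hypothetical callables standing in for "execute this command on
that node" (how the command reaches the node is left out), and
host_id is assumed to be the host ID that 'nodetool removenode'
expects.

```python
import subprocess


def remove_node(run_on_departing, run_on_survivor, host_id):
    """Try a graceful decommission; fall back to removenode.

    run_on_departing / run_on_survivor are hypothetical callables that
    take an argv list and raise on failure, in the style of
    subprocess.check_call.
    """
    try:
        # Step 1: the departing node streams its data to the others.
        run_on_departing(["nodetool", "decommission"])
        return "decommissioned"
    except (subprocess.SubprocessError, OSError):
        # Step 2: the node is dead or unreachable, so a surviving node
        # drops it from the ring and the cluster rebuilds from replicas.
        run_on_survivor(["nodetool", "removenode", host_id])
        return "removed"
```

If the fallback itself fails, the exception propagates, which seems
like the right behaviour for a hook: the error stays visible rather
than being swallowed.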

An extra hook as you suggest would help me to solve this issue. But
what would also solve my issue is juju leadership (currently in
development). When the lead unit runs its peer relation-departed hook,
it connects to the departing unit and runs the decommissioning process
on its behalf. If it is unable to connect, it assumes the node is
failed and cleans up. It can even notify the remaining non-leader
units that the departing unit has been removed from the cluster, giving
them a chance to update their configuration if necessary. You can't
really do this without the leadership feature, as you can't coordinate
which of the remaining units is responsible for decommissioning the
departing unit (and they would trip over each other if they all
attempted to decommission the departing node).
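
A minimal sketch of that flow, assuming the leadership feature ends up
giving the hook some way to ask "am I the leader?". Every name here is
hypothetical: is_leader is just a boolean the caller supplies, and
decommission_remote, force_remove and notify_peers stand in for
whatever the charm would really do.

```python
def on_peer_departed(is_leader, departing_unit,
                     decommission_remote, force_remove, notify_peers):
    """Peer relation-departed logic in which only the leader acts.

    All callables are hypothetical placeholders supplied by the charm.
    """
    if not is_leader:
        # Non-leaders do nothing, so the units cannot trip over
        # each other.
        return "skipped"
    try:
        # Run the decommissioning on behalf of the departing unit.
        decommission_remote(departing_unit)
        outcome = "decommissioned"
    except ConnectionError:
        # Cannot connect: assume the unit has failed and clean up.
        force_remove(departing_unit)
        outcome = "force-removed"
    # Give the remaining units a chance to update their configuration.
    notify_peers(departing_unit)
    return outcome
```

Treating ConnectionError as "the unit has failed" is exactly the
assumption that breaks down when the unit is live but unreachable.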

The edge case in my approach is of course if the departing unit is
live, but for some reason the leader cannot connect to it. Maybe your
inter-DC links have gone down. However, there are similar issues with
the extra hook. If your -before-departed hook fails to run, how long
should juju wait until it gives up and triggers the -departed hooks?

Perhaps what is needed here is instead an extra hook run on the
remaining units if the -broken hook could not be run successfully?
Let's call it relation-failed. It could be fired when we know the VM is
gone and the -broken hook was not successfully run.

-- 
Stuart Bishop <stuart.bishop at canonical.com>