Feature Request: show running relations in 'juju status'

William Reade william.reade at canonical.com
Wed Nov 19 12:17:41 UTC 2014


On Tue, Nov 18, 2014 at 7:19 AM, Kapil Thangavelu
<kapil.thangavelu at canonical.com> wrote:
>
> for clusters... its not a question of futures but being informed of known
> unit count to establish quorum. ie 1 to 3 or n+1. leader election helps, but
> actually knowing the unit count is critical to being able to establish a
> clear state without throwing away data (aka race on peer knowing quorum and
> leader) as adhoc leader election has to throw away data from non leaders who
> may already be serving clients due to lack of quorum knowledge.
>  ...
> status per future impl helps, as does explicitly marking units.. but pending
> cluster count is a missing and important property to properly establish
> quorum in a peer rel from one to n that is only resolved by knowing recorded
> units count for a svc.

Two things:

1) I'm not sure numbers are good enough in general compared to sets of units.

2) I'm not sure a single number|set gives us all the information we need.

Leaving (1) aside for now, I *think* that each of the following
numbers|sets is potentially relevant:

i) "goal": the units that juju expects to be part of the relation once
everything's converged (including those not yet running, and not
including those that are dying)
ii) "active": the units that are in scope for the relation (but might be dying)
iii) "current": the units that are *locally known to be* in scope for
the relation
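To make the three definitions concrete, here is a rough sketch of how they might look from a single unit's point of view. RelationView and its helper methods are hypothetical names for illustration only; nothing like this exists in juju today, and the sets exclude the local unit in the same way relation-list does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationView:
    """Hypothetical per-unit view of a peer relation (not a juju API)."""
    goal: frozenset     # units juju expects once everything has converged
    active: frozenset   # units in scope right now (may include dying units)
    current: frozenset  # units this unit has seen via joined/departed hooks

    def pending(self):
        """Units expected to join that have not entered scope yet."""
        return self.goal - self.active

    def leaving(self):
        """Units still in scope that are not part of the goal (e.g. dying)."""
        return self.active - self.goal

    def unseen(self):
        """Units in scope whose joined hooks have not yet run locally."""
        return self.active - self.current


# First unit of a planned three-unit cluster, before any peer has come up.
first_up = RelationView(
    goal=frozenset({"u/1", "u/2"}),
    active=frozenset(),
    current=frozenset(),
)
```

Here first_up.pending() would tell the unit that two peers are still on their way, which is exactly what "current" alone cannot say.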

Today, we only expose "current" -- i.e. relation-list returns the
"current" units, and it might be a complete lie, but it's a consistent
lie that is useful for many purposes, and we're not going to remove
it.

If we expose "active" but not "goal", we don't help anything very much
-- the first unit of a cluster to come up will still think it's alone
in the world, and we still have all the original problems.

If we expose "goal" but not "active", we create new problems when we
try to scale: going from 1 unit to 3 puts that first unit in an
apparent minority, and is thus likely to effectively take the whole
service down.
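A small sketch of the arithmetic behind the two failure modes, assuming the charm uses a simple strict-majority rule for quorum (that rule is my assumption here; real clustered services vary). has_majority is a hypothetical helper, and the counts include the local unit.

```python
def has_majority(reachable: int, cluster_size: int) -> bool:
    """True when `reachable` members form a strict majority of `cluster_size`."""
    return reachable > cluster_size // 2


# Only "active" exposed: the first unit of a planned 3-unit cluster sees a
# cluster of size 1, believes it has quorum, and starts serving on its own.
alone_at_bootstrap = has_majority(reachable=1, cluster_size=1)    # True

# Only "goal" exposed: a healthy 1-unit service is scaled to 3. The original
# unit now sizes the cluster at 3 while it can still only reach itself.
minority_on_scale_up = has_majority(reachable=1, cluster_size=3)  # False
```

With both sets available, the charm could at least distinguish "I am alone because nobody else is expected" from "I am alone because the others have not arrived yet".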

So: I think we definitely need to expose both "goal" and "active"
information. The interesting question is whether we can just expose
numbers, or whether we need to expose actual sets of units (as we do
for "current")... and I *think* we need sets, not just numbers,
because:

u/0:current=[u/1,u/2]
u/0:active=[u/3,u/4]
u/0:goal=[u/3,u/4]

...is legitimate, when u/3 and u/4 were created a while ago (and have
just come up, but u/0 has not yet run their joined hooks) and u/1 and
u/2 were *just* destroyed (and have themselves left scope, but u/0 has
not yet run their departed hooks)...

...and if all we expose is numbers, there's no way for the charm to
tell the difference between that state and a stable u/0,u/1,u/2
cluster (or any of the other combinations with sets of the given
sizes...) at least until more hooks fire.
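To illustrate, a minimal sketch comparing u/0's view in the transitional state above with its view of a stable three-unit cluster. The dict layout and the helper names counts and stale_peers are made up for this example.

```python
# u/0's view while u/1,u/2 are gone and u/3,u/4 have just come up.
transitional = {
    "current": {"u/1", "u/2"},
    "active": {"u/3", "u/4"},
    "goal": {"u/3", "u/4"},
}

# u/0's view of a stable cluster with peers u/1 and u/2.
stable = {
    "current": {"u/1", "u/2"},
    "active": {"u/1", "u/2"},
    "goal": {"u/1", "u/2"},
}


def counts(view):
    """What the charm would see if only numbers were exposed."""
    return {name: len(units) for name, units in view.items()}


def stale_peers(view):
    """Peers the unit still believes in that have already left scope."""
    return view["current"] - view["active"]


same_numbers = counts(transitional) == counts(stable)  # True
gone_already = stale_peers(transitional)               # {"u/1", "u/2"}
nothing_stale = stale_peers(stable)                    # set()
```

The counts are identical in both cases, so only the sets let u/0 notice that every peer it currently knows about has already gone.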

*Maybe* this doesn't matter, but I'm loath to assume that it *never*
matters. Thoughts?

Cheers
William


