Feature Request: show running relations in 'juju status'

Tue Nov 18 08:37:24 UTC 2014

On 18 November 2014 12:23, Ian Booth <ian.booth at canonical.com> wrote:

> On 17/11/14 15:47, Stuart Bishop wrote:
>> On 17 November 2014 07:13, Ian Booth <ian.booth at canonical.com> wrote:
>>
>>> The new Juju Status work planned for this cycle will hopefully address the main
>>> concern about knowing when a deployed charm is fully ready to do the work for
>>> which it was installed, i.e. the current situation whereby a unit is marked as
>>> Started but is not ready. Charms are able to mark themselves as Busy and also
>>> set a status message to indicate they are churning and not ready to run. Charms
>>> can also indicate that they are Blocked and require manual intervention (eg a
>>> service needs a database and no relation has been established yet to provide the
>>> database), or Waiting (the database on which the service relies is busy but will
>>> resolve automatically when the database is available again).
>>
>> As long as the 'ready' state is managed by juju and not the unit, I'll
>> stand happily corrected :-) The focus I'd seen had been on the unit
>> declaring its own status, and there is no way for a unit to know that it
>> is ready because it has no way of knowing that, for example, there are
>> another 10 peer units being provisioned that will need to be related.
>>
>
> You are correct that the initial scope of work is more about the unit, and less
> about the deployment as a whole. There are plans though to address the issue.
> We're throwing around the concept of a "goal state", which is conceptually akin
> to looking forward in time to be able to inform units what relations they will
> expect to participate in and what units will be deployed. They'd likely be
> something like a relation-goals hook tool (to complement relation-list and
> relation-ids), as well as hook(s) for when the goal state changes. There's
> ongoing work in the uniter by William to get the architecture right so this work
> can be considered. There's still a lot of value in the current Juju Status work,
> but as you point out, it's not the full story.

Ok. If there is a goal state, and I am able to wait until the goal
state is the actual state, then my needs (and amulet and juju-deployer
needs) will be met. It does seem a rather long-winded way of getting
there, though. The question I have always needed juju to
answer is 'are there any hooks running or are there any hooks queued
to run?'. I've always assumed that juju must already know this (or it
would be unable to function), but refuses to communicate this single
bit of information in any way.
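For illustration only, the single bit being asked for can be sketched as
below. This assumes a hypothetical per-unit record of the hook currently
running and the hooks queued behind it; juju does not expose such a
structure, and the names UnitHookState and system_is_steady are invented
for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UnitHookState:
    """Hypothetical view of one unit's hook activity (not a real juju API)."""
    running: Optional[str] = None                      # hook executing now, if any
    queued: List[str] = field(default_factory=list)    # hooks waiting to run


def system_is_steady(units: Dict[str, UnitHookState]) -> bool:
    """True only if no unit has a hook running and no unit has hooks queued."""
    return all(u.running is None and not u.queued for u in units.values())


units = {
    "cassandra/0": UnitHookState(),
    "cassandra/1": UnitHookState(queued=["cluster-relation-joined"]),
}
print(system_is_steady(units))  # False: cassandra/1 still has a queued hook
```

A tool like amulet or juju-deployer would only need to poll this answer
until it becomes true.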

>>> So although there are not currently plans to show the number of running hooks in
>>> the first phase of this work, mechanisms are being provided to allow charm
>>> authors to better communicate the state of their charms to give much clearer and
>>> more accurate feedback as to 1) when a charm is fully ready to do work, 2) if a
>>> charm is not ready to do work, why not.
>>
>> A charm declaring itself ready is part of the picture. What is more
>> important is when the system is ready. You don't want to start pumping
>> requests through your 'ready' webserver, only to have its database torn
>> away as a new block device is mounted when its storage-joined hook is
>> invoked, and only returned to the 'ready' state once the storage-changed
>> hook has completed successfully.
>>
>
> Also being thrown around is the concept of a new agent-state called "Idle",
> which would be used when there are no pending hooks to run. There are plans as

That would work too. If all units are in idle state, then the system
has reached a steady state and my question answered.

> well for the next phase of the Juju status work to allow collaborating services
> to notify when they are busy, and mark relationships as down. So if the database
> had its storage-attached hook invoked, it would mark itself as Busy, mark its
> relation to the webserver as Down, thus allowing the webserver to put itself
> into Waiting. Or, if we are talking about the initial install phase, the
> database would not initially mark itself as Running until its declared storage
> requirements were met, so the webserver would go from Installing to Waiting and
> then to Running once the database became Running.

I'm not entirely sure how useful this feature is, given the inherent
race conditions with serialized hooks. Right now, you need to write
charms that gracefully cope with dependent services that have gone
down without notice. With this feature, you will need to write charms
that gracefully cope with dependent services that have gone down and
the notification hasn't reached you yet, or whose outage has non-juju
causes such as a network partition. The window of time waiting
for hooks to bubble through could easily be minutes when you have a
simple chain of services (eg. postgresql -> pgbouncer -> django ->
haproxy -> apache seems common enough).

Your example with storage is particularly interesting, as I was just
dealing with this yesterday in my rewrite of the Cassandra charm. The
existing mechanism in the charm is broken. If you add a new unit to
the service, it runs its install and configure hooks and is READY. It
then joins the peer relation, and is still READY. The peer units start
spewing data at it as the replication ring is rebalanced. We now
have a race: will the storage hooks fire in time? The new unit is unaware
that storage is due to be attached, and does not know that, unless the
storage is attached and the data migrated from local disk soon, the
local disk will fill and the unit will fall over. To solve this with
the current storage-broker subordinate, I could require the operator
to set a 'wait_for_block_storage' boolean in the service
configuration before deploy. But requiring people to read and follow
the documentation is an error prone solution :-( I'm wondering if I
should simply not bother fixing this race, and trust that the block
storage broker hooks will be invoked and completed before local disk
is filled. I understand that work is underway to replace the block
storage broker, so it won't be an issue long term. Failing that, your
goal state would be useful here if a unit can ask questions like 'is
storage going to be attached?' or 'will peers be joining me?'.

-- 
Stuart Bishop <stuart.bishop at canonical.com>