relation-created and health checks

Mon Aug 4 16:41:42 UTC 2014

Hey all,

I could not find any documentation on this, so I'm sending it to the
list before the context gets lost and becomes unsearchable again.

We were talking today about health checks in the context of a unit,
and my proposal is to have three fundamental fields as part of the
unit state:

  1. Manual action required: yes/no
  2. Quality of service: 0-100%
  3. Summary: <human-oriented-message>
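
As a rough sketch, the three fields could be carried around as a small
record. The names here (UnitHealth, manual_action_required,
quality_of_service, summary) are made up for illustration and are not
an existing Juju API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitHealth:
    """Hypothetical container for the three proposed unit-state fields."""

    manual_action_required: bool  # field 1: yes/no
    quality_of_service: int       # field 2: percentage, 0-100
    summary: str                  # field 3: human-oriented message

    def __post_init__(self):
        # Reject values outside the 0-100% range proposed above.
        if not 0 <= self.quality_of_service <= 100:
            raise ValueError("quality_of_service must be between 0 and 100")


# A unit that is deployed and ready to use.
ready = UnitHealth(False, 100, "Ready.")
```

Keeping the record immutable is only one possible choice; it makes each
reported state a distinct value rather than something mutated in place.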

So, a few examples might be:

A. Unit is still being deployed:

  1 == No
  2 == 0%
  3 == "Machine being allocated" / "unit being created" / etc

B. Unit deployed, waiting for relation:

  1 == Yes
  2 == 0%
  3 == Waiting for database relation

C. Unit deployed, relation created, database being created:

  1 == No
  2 == 0%
  3 == Waiting for database to be available

D. Unit deployed and ready to use

  1 == No
  2 == 100%
  3 == Ready.

E. Unit ready, some HTTP requests to it failing with 500

  1 == No
  2 == 80%
  3 == 200 out of 1000 HTTP requests failed.
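
The proposal does not say how the percentage in example E is derived.
One plausible reading, assumed here, is the share of requests that
succeeded. The function names below are hypothetical:

```python
def quality_of_service(total_requests: int, failed_requests: int) -> int:
    """Return the percentage of successful requests, rounded down.

    Assumption: QoS is the success ratio. A window with no requests is
    treated as an error here, since the proposal does not define it.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    return (total_requests - failed_requests) * 100 // total_requests


def failure_summary(total_requests: int, failed_requests: int) -> str:
    """Build the human-oriented message used in example E."""
    return f"{failed_requests} out of {total_requests} HTTP requests failed."


qos = quality_of_service(1000, 200)   # 80, matching example E
message = failure_summary(1000, 200)
```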

Mark Ramm correctly points out that for use case C to work correctly,
we need to notify the unit immediately that a database relation was
created but not yet joined. I naively assumed that this was already in
place, but William corrected me, and after investigating the artifacts,
indeed the idea was never put in place, even in the prior Python code.
We do have its counterpart, though: relation-broken is called not when
the relation is departed, but when it is actually terminated. A
relation-created hook was envisioned early on, but we decided to wait
until it was actually useful, since we could not see a clear use case.
Now we do have one: it's a good point to say "I'm not waiting for
manual actions anymore" in the scheme suggested above, so I suggest
introducing it.
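
To make the transition concrete, here is a minimal sketch of how a
charm might move from example B to C and then to D. Everything in it is
an assumption for illustration: relation-created does not exist yet,
and UnitHealth, on_relation_created and on_database_available are
hypothetical names, not Juju API.

```python
from dataclasses import dataclass


@dataclass
class UnitHealth:
    """Hypothetical holder for the three proposed fields."""

    manual_action_required: bool
    quality_of_service: int  # percentage, 0-100
    summary: str


def on_relation_created(health: UnitHealth) -> None:
    """Hypothetical handler for the proposed relation-created hook.

    The relation exists but is not yet usable, so the unit stops asking
    for manual action while still reporting 0% service (B -> C).
    """
    health.manual_action_required = False
    health.summary = "Waiting for database to be available"


def on_database_available(health: UnitHealth) -> None:
    """Hypothetical handler run once the database can be used (C -> D)."""
    health.quality_of_service = 100
    health.summary = "Ready."


# Example B: deployed, but someone still has to add the relation.
health = UnitHealth(True, 0, "Waiting for database relation")
on_relation_created(health)
state_c = (health.manual_action_required, health.quality_of_service)
on_database_available(health)
```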

gustavo @ http://niemeyer.net