New modified hook semantics

Fri Mar 18 20:15:30 UTC 2011

Greetings Ensemblers,

A very good discussion came up today, first with Clint and then with
everyone else in our standup call.  It concerns the semantics of the
modified hook, which is our most important hook at the moment.

Initially, I tried to explain to Clint the reasons why it works the
way it does, and Clint agreed there was sound reasoning behind it,
but given some further consideration, I think we ought to reevaluate
this decision for a very simple reason: *everyone* who has written
formulas so far got the semantics wrong.  We can't expect any better
indication than this that these details should be revisited.

With that in mind, I present here an analysis of the problem, and a
proposal for an alternative version that more closely matches the gut
feeling everyone had of how it was working.  I also propose that we
start working on that right away, since we don't want formulas to
proliferate with the wrong semantics.

== Current approach ==

Right now, the relation-changed hook can be executed at three different moments:

- Unconditionally, when the relation is joined
- When the relation is modified after the hook was last executed
- When the relation is departed

We make the distinction between the three kinds of execution by
setting an environment variable named $ENSEMBLE_CHANGE which may be
set to "joined", "modified", or "departed".
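
As a rough illustration of this model, a hook written in Python might
branch on the variable as below.  The function name and the returned
action strings are hypothetical; only $ENSEMBLE_CHANGE and its three
values come from the description above.

```python
import os


def handle_relation_changed(environ=os.environ):
    """Pick an action based on what triggered the relation-changed hook."""
    change = environ.get("ENSEMBLE_CHANGE")
    if change in ("joined", "modified"):
        # Joined and modified can share logic: act on whatever data exists.
        return "configure"
    if change == "departed":
        return "cleanup"
    raise ValueError("unexpected ENSEMBLE_CHANGE: %r" % (change,))
```

Passing the environment in as a mapping is only a convenience for
trying the function out, e.g. handle_relation_changed({"ENSEMBLE_CHANGE":
"departed"}).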

There are a few different reasons why we selected this model.  First,
it enables one to more easily share logic between the three events.
Then, it also means the hook is executed only once per state of the
relation, and is guaranteed to execute again when it changes.  Also,
having both joined and modified within the same hook means one can
ignore the flag in certain cases, and simply do the same thing every
time, based on what data is available rather than what caused the
event.  This tends to be a nice approach because there's a possibility
that a relation is joined with pre-existing members, so this doesn't
have to be special-cased.

The confusion with this approach, surprisingly, comes from the fact
that joined and modified are very similar.  That's surprising because
one can easily imagine that having a single consistent logic would be
easier to understand than having two hooks with distinct behavior.

The real problem, though, is that many relations consist of one side
providing data and the other side consuming it.  In those cases, the
natural thinking goes like: "On the providing side, I do a
relation-set on joined.  On the consuming side, I *ignore* joined,
since I'm just consuming, and consume the data when it is modified."
This breaks the rules provided above, because the data can *already*
be available when the joined event runs, in which case no modified
event follows, since the data does not change any further.

This problem is worsened by the fact that within test environments it
may very well work fine, since the load is low.  In this case, the
joined event tends to execute so quickly on both sides that any
relation-set performed will cause a follow-up modified event to be run
on the remote side.  This, of course, creates the worst of problems:
people will notice broken things only when they use them for real.
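
The failure can be sketched with a small simulation.  This is not
Ensemble code: the event lists, the "host" key, and the function names
are all made up for illustration, and each event is simply a pair of
the change kind and the relation data visible at that time.

```python
def run_consumer(events, hook):
    """Feed (change, data) events to a hook and collect what it consumed."""
    consumed = []
    for change, data in events:
        result = hook(change, data)
        if result is not None:
            consumed.append(result)
    return consumed


def naive_consumer(change, data):
    # The common mistake: ignore "joined" and only consume on "modified".
    if change == "modified" and "host" in data:
        return data["host"]
    return None


def data_driven_consumer(change, data):
    # Act on the available data, whatever caused the hook to run.
    if change in ("joined", "modified") and "host" in data:
        return data["host"]
    return None


# Lightly loaded test setup: joined runs before the provider's
# relation-set, so a follow-up modified event delivers the data.
quick = [("joined", {}), ("modified", {"host": "10.0.0.1"})]

# Loaded deployment: the data is already there at join time, and since
# it does not change again, no modified event follows.
loaded = [("joined", {"host": "10.0.0.1"})]
```

With these inputs, naive_consumer picks up the host in the "quick"
sequence but consumes nothing in the "loaded" one, while
data_driven_consumer picks it up in both.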

== Proposal ==

To address these problems, I propose we change the mechanism of change
notification in the following way:

1) Split relation-changed into three hooks: relation-joined,
relation-changed, and relation-departed

2) relation-joined and relation-departed map exactly to the respective
events which used to happen with relation-changed.

3) The new relation-changed hook is called *unconditionally* right
after relation-joined, and then any time the data has changed since
the last execution (with the current behavior related to caching
preserved).

4) $ENSEMBLE_CHANGE is no longer passed to any hook.

This preserves the benefit and simplicity of the earlier approach, and
solves the problem for the following reason: it'll be very clear to
the formula author that relation-changed is called unconditionally
after joined, which means the data may or may not be available at that
time.  Also, there will be no way to *avoid* executing within
relation-changed by doing the equivalent of 'if $ENSEMBLE_CHANGE !=
"joined"', so the author is forced to take into account the
availability of the data.
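
To make that concrete, here is a minimal sketch of what a consuming
relation-changed hook could look like under the proposed semantics.
The "host" key and the returned strings are assumptions for the sake
of the example, and the relation data is passed in as a plain dict
rather than fetched from Ensemble.

```python
def relation_changed(data):
    """Consumer-side relation-changed logic under the proposed semantics."""
    host = data.get("host")
    if host is None:
        # Called right after relation-joined, before the provider has
        # published anything.  Nothing to do yet; the hook runs again
        # once the data changes.
        return None
    return "connect to %s" % host
```

The hook has no flag to test, so the only thing it can branch on is
whether the data it needs is there.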

== Implementation ==

It seems like the implementation is going to be very simple.  To
achieve the above, we'll have to perform the following fine-grained
tasks:

1) Remove the logic which passes $ENSEMBLE_CHANGE to the
<name>-relation-changed hook.

2) Introduce a <name>-relation-departed hook, and execute it when the
relation is departed (logic is copied from relation-changed).

3) Introduce a <name>-relation-joined hook, and execute it when the
relation is joined (logic copied from relation-changed too). Note that
this hook must be called *before* <name>-relation-changed.

4) Stop running the <name>-relation-changed hook on the *departed*
event.  No change is needed for joined, since it should still run.
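
The resulting mapping from events to hooks could be sketched as
follows.  This is only an illustration of the ordering described in
the steps above, not the actual implementation; hooks_for and the
event names it accepts are hypothetical.

```python
def hooks_for(name, event):
    """Return the hook names to run for one relation event, in order."""
    if event == "joined":
        # relation-joined must run before relation-changed.
        return [name + "-relation-joined", name + "-relation-changed"]
    if event == "modified":
        return [name + "-relation-changed"]
    if event == "departed":
        # relation-changed no longer runs on departure.
        return [name + "-relation-departed"]
    raise ValueError("unknown relation event: %r" % (event,))
```

For a relation named "db" (again, just an example name), the joined
event would run db-relation-joined followed by db-relation-changed.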

And that's it, really.

What do you think?

-- 
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/blog
http://niemeyer.net/twitter