agent upgrading

Mon Jun 11 15:34:06 UTC 2012

On Fri, Jun 8, 2012 at 2:07 PM, roger peppe <roger.peppe at canonical.com> wrote:
> Client:
>        - Push new version of tools.
>        - Set new global version number in state.
>
> Machine agent:
>        - Wait for global version to change.

That's slightly different from what we discussed in London. There's a
global setting which contains the recommended version, but the machine
agent is not monitoring it. It is monitoring a flag in its own
settings that tells it which version it is supposed to be running. That's
pretty much the same complexity in terms of implementation, but gives
us the freedom to upgrade individual machines independently for
testing purposes.
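
To make the distinction concrete, here is a minimal sketch of the
per-machine flag, not the actual implementation. State, wanted_version
and needs_restart are hypothetical names, the state is an in-memory
stand-in, and how a machine's flag gets set (for example, by a client
step that copies the recommended version over) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class State:
    """Hypothetical in-memory stand-in for the shared state."""

    # Global setting: the recommended version. Advisory only.
    recommended_version: str
    # Per-machine flag: the version each machine is supposed to run.
    machine_versions: Dict[str, str] = field(default_factory=dict)

    def set_recommended(self, version: str) -> None:
        self.recommended_version = version

    def set_machine_version(self, machine_id: str, version: str) -> None:
        self.machine_versions[machine_id] = version


def wanted_version(state: State, machine_id: str) -> Optional[str]:
    """Return the version this machine's own settings ask for, if any.

    The global recommended version is deliberately not consulted here.
    """
    return state.machine_versions.get(machine_id)


def needs_restart(state: State, machine_id: str, running: str) -> bool:
    """True when the machine's own flag names a different version."""
    wanted = wanted_version(state, machine_id)
    return wanted is not None and wanted != running
```

With this shape, calling set_recommended("1.1") alone restarts nobody,
while set_machine_version("1", "1.1") affects only machine 1, which is
what allows a single machine to be upgraded for testing.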

> I think we should be able to do better than this. The problem is with
> the "exit and let upstart start new version" step - that means that if
> we happen to upload a broken version, then everything instantly breaks
> and needs manually restoring.
>
> Here are some desirable features for an upgrade facility:
>
>        1. Uploading a broken tool shouldn't break anything.
>        2. ... even for a short while.
>        3. We shouldn't rely too heavily on upstart, given the possibility
>        of ports to systems without upstart.

That feels like pretty poor motivation. 1 and 2 are impossible to
achieve, and 3 is also being misrepresented. It's not about supporting
upstart; it's about simply dying and coming back up again, which
should be in place no matter what is responsible for starting the
process.

I'd like to bring back the two critical points that were written on
the whiteboard in London when we met a week ago.

These are our short term goals:

1) Permit agents to restart, so that we don't drive development making
silly assumptions about things that can't die

2) Drive development faster, by permitting developers to iterate over
a running environment

This is easy to achieve with the scheme we debated, and as far as I
can see the development of it is a fantastic step toward any further
improvements we end up needing down the road, even if that goes in
the direction that you described.

If that's the case, I suggest we move forward with simple steps
towards that. If that's not the case, can we please focus on why
that's not the case rather than proposing another solution from the
ground up?

gustavo @ http://niemeyer.net