agent upgrading

Sun Jun 10 23:54:44 UTC 2012

SGTM.

This is similar to (in fact better than) the nginx binary upgrade model, http://wiki.nginx.org/NginxCommandLine#Upgrading_To_a_New_Binary_On_The_Fly.

Be aware of http://code.google.com/p/go/issues/detail?id=1435

Cheers

Dave

On 09/06/2012, at 3:07 AM, roger peppe wrote:

> We'd like to be able to upgrade a running juju with new software.
> This is a scheme I originally came up with for upgrading
> minor (database-compatible) versions, somewhat modified after
> discussion with Gustavo:
> 
> Client:
> 	- Push new version of tools.
> 	- Set new global version number in state.
> 
> Machine agent:
> 	- Wait for global version to change.
> 	- Download new version.
> 	- Copy version to where the local agents can see it
> 	- Set new version in local agents' state.
> 	- Point "current" symlink to new version
> 	- Exit and let upstart start new version
> 
> Other agent (provisioning or unit agent):
> 	- Wait for agent's version to change.
> 	- Point "current" symlink to new version
> 	- Exit and let upstart start new version
> 
> I think we should be able to do better than this. The problem is with
> the "exit and let upstart start new version" step - that means that if
> we happen to upload a broken version, then everything instantly breaks
> and needs manually restoring.
> 
> Here are some desirable features for an upgrade facility:
> 
> 	1. Uploading a broken tool shouldn't break anything.
> 	2. ... even for a short while.
> 	3. We shouldn't rely too heavily on upstart, given the possibility
> 	of ports to systems without upstart.
> 
> Obviously point 1 is not entirely attainable - a tool can be broken in
> any number of subtle ways that are not quickly detectable.
> 
> However, if each tool does a set of checks at startup time (checking
> the version in the zk database and other dependencies) then I think that
> the likelihood of breakage can be drastically reduced.
> 
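
One way such a startup check might look, as a rough Go sketch. It assumes dotted version strings whose first component is the major (database) version, and that an agent may only run against a database with the same major version. The version format and the names majorOf and checkCompatible are assumptions for illustration, not part of the proposal.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf returns the leading numeric component of a dotted version.
func majorOf(v string) (int, error) {
	parts := strings.SplitN(v, ".", 2)
	n, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, fmt.Errorf("invalid version %q: %v", v, err)
	}
	return n, nil
}

// checkCompatible reports an error if the agent's major version differs
// from the major version recorded in the state (hypothetical rule).
func checkCompatible(agentVersion, stateVersion string) error {
	a, err := majorOf(agentVersion)
	if err != nil {
		return err
	}
	s, err := majorOf(stateVersion)
	if err != nil {
		return err
	}
	if a != s {
		return fmt.Errorf("agent major version %d does not match state major version %d", a, s)
	}
	return nil
}

func main() {
	// A new agent would run this before telling anyone it has started.
	if err := checkCompatible("2.0.0", "1.3.1"); err != nil {
		fmt.Println("refusing to start:", err)
	}
}
```
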
> Here is a proposal for a scheme that addresses the above three points. It
> remains the same as the original scheme, with the exception that, instead
> of letting upstart start the agents directly, we interpose an intermediary,
> say "upgrader". This tool would be designed to be small, well verified
> and with minimal dependencies - designed to need upgrading very seldom.
> 
> The final "exit and let upstart start new version" step is replaced with
> the following upgrade path.
> 
> 	- The agent asks the upgrader to run a new version of the agent
> 	by passing it the name of an executable and arguments.
> 
> 	- The upgrader starts the new agent.
> 
> 	- The new agent connects to the state, does whatever verification
> 	is necessary, and notifies the upgrader that it has successfully
> 	started (but doesn't actually *do* anything yet).
> 
> 	- The upgrader notifies the original agent that the upgrade has
> 	been successful.
> 
> 	- The original agent shuts down, notifies the upgrader that it
> 	has done so, and exits.
> 
> 	- The upgrader notifies the new agent that it can continue,
> 	picking up the work where the old agent stopped.
> 
> A particular advantage of this scheme is that an upgraded agent
> will cause no down time, even if the new agent hangs for a long
> time when starting.
> 
> If the agent exits without upgrading, then the upgrader tool
> will also exit, leaving upstart (or similar mechanism) to restart it;
> this provides a way to upgrade the upgrader itself.
> 
> One possible drawback is that the new and the old
> agent running side-by-side might be a problem in
> resource-constrained environments. I don't think this would
> be a problem in practice (most resources will probably be taken
> as the agent continues to run, rather than at initialisation time),
> and we can work around it if necessary, by having a way to
> tell the upgrader to run the programs sequentially, and
> back off to the previous version if the upgraded version fails.
> 
> I have a prototype of this proposal here (WIP, untested as yet):
> 
> 	https://codereview.appspot.com/6307061
> 
> The upgrader in this implementation uses stdin and stdout to talk to
> its child processes; another mechanism could be substituted if desired.
> 
> Major Version Upgrades (sketch)
> 
> Major version (database) upgrades can fit into the above scheme by
> providing an additional synchronisation step. The client could give the
> global version a "pending" tag and wait for all agents to indicate that
> they are halted. Then it would upgrade the database, untag the version
> and let the upgrade proceed as usual.
> 