Stabilizing how juju deploys itself.

Fri Mar 2 19:37:27 UTC 2012

In recent weeks, it has become clear to me that juju needs to take
a different approach to the way it deploys itself. Several of us
have started running juju for real services, and so have long-lived
environments, which has exposed some problems.

Currently, there are three methods that juju uses to install juju
onto machines that it provisions. These are all controlled by the
environments.yaml option 'juju-origin'. They are:

distro: This basically just adds juju to the list of packages to install
        on first boot.

ppa: This is like distro, but also makes sure the juju PPA (ppa:juju/pkgs)
     is enabled first.

branch: This takes a bzr branch url and extracts it, then installs juju
        in a "developer" mode.

Ironically, the most stable option is branch. This is because one can
branch juju at the version they want to use for repeated deployments,
and use that one, and not have to worry about incompatible updates to
the PPA or their client.

If one chooses distro, then this scenario is problematic:

* Client is on oneiric - juju r398
* default-series is precise - juju r457
* juju bootstrap is running r398 of juju, and tries to start the agents
  in an incompatible way. But because juju-origin: distro was used, r457
  is installed on the bootstrap node. FAIL.

The reverse can happen too. If the client is precise, and the default
series is oneiric, then r457 will be used to provision the first node,
but then r398 will be installed on all successive nodes.
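
The two distro mismatches above can be reproduced with a small model.
This is only a sketch: the revision table uses the numbers from the
scenario, and the function names are hypothetical, not part of juju.

```python
# Hypothetical model of juju-origin: distro. The revisions are the ones
# quoted in the scenario above; none of this is real juju code.
DISTRO_REVISION = {"oneiric": 398, "precise": 457}


def installed_revisions(client_series, default_series):
    """Return (client revision, revision installed on provisioned nodes)."""
    return DISTRO_REVISION[client_series], DISTRO_REVISION[default_series]


def is_mismatched(client_series, default_series):
    """True when the client and the nodes it provisions run different revisions."""
    client_rev, node_rev = installed_revisions(client_series, default_series)
    return client_rev != node_rev


if __name__ == "__main__":
    pairs = [("oneiric", "precise"), ("precise", "oneiric"), ("precise", "precise")]
    for client, series in pairs:
        client_rev, node_rev = installed_revisions(client, series)
        status = "MISMATCH" if is_mismatched(client, series) else "ok"
        print(f"client {client} r{client_rev} -> nodes {series} r{node_rev}: {status}")
```

Only the case where the client series and default-series match avoids
mixing revisions, which is not something distro can guarantee.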

If one chooses PPA, this scenario happens:

* today - bootstrap with r467
* today - deploy with r467
* tomorrow - juju is updated in the PPA with a ZooKeeper schema change
* next day - add-unit installs the new version of juju on the new node,
  with potential unintended consequences.

I know that in the past backward-incompatible changes have been waved
off as rare, but at this point, we need a *plan* for handling them. They
have happened a few times in the last year, so they seem to be a regular,
if unintended, part of life for the development of juju. To try any
harder than we already do to stop them is to stifle juju's development.

Also, the moving target that one gets from the PPA is really undesirable.
The distro target is also completely erratic because it deploys old and
new releases of juju into the same environment. That is going to end in
failure as more releases of Ubuntu come along with juju support and the
versions that are available diverge more and more.

So, here's my suggestion:

* Bootstrap stores a juju binary: This may have to be fixed when juju
  is no longer Architecture: all, but for now, this should work fine.
  For all of the origins, instead of inserting instructions on where
  to fetch the binary based on the origin, we fetch the binary then
  and there and store it in file storage just like we would a charm on
  deploy. The provisioning step then needs to have a file storage url
  to fetch the binary from and instructions how to install it, which
  should be just as straightforward as the current methods.

* A new sub-command, update-environment, is used to insert new versions
  of this binary into the environment. It would not do anything to the
  nodes. Provisioning agents would be left alone, so this new version
  would not roll out until machine 0 was upgraded. Update is used to
  match the apt paradigm that you're just updating the available version,
  not actually upgrading it.

* A new sub-command, upgrade-machine, is used to roll out new versions
  of juju to the components of the environment. Perhaps allow a '--all'
  flag to do mass upgrades when the admin is comfortable with that.
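
To make the intended flow concrete, here is a minimal sketch of the three
pieces working together. Everything in it is an assumption for
illustration: the Environment class, its method names, and revision 470
do not exist in juju, and it assumes file storage keeps older binaries
around so that machine 0 can keep handing out the version it runs.

```python
# Hypothetical sketch of the proposal; class and method names are
# illustrative only and are not juju APIs.
class Environment:
    def __init__(self, bootstrap_revision):
        # Bootstrap stores the juju binary in file storage, like a charm.
        self.stored_revisions = {bootstrap_revision}
        self.latest_revision = bootstrap_revision
        # Machine 0 runs the provisioning agent.
        self.machines = {0: bootstrap_revision}

    def add_machine(self):
        """Provision a machine with the revision machine 0 currently runs."""
        machine_id = max(self.machines) + 1
        self.machines[machine_id] = self.machines[0]
        return machine_id

    def update_environment(self, revision):
        """Make a new binary available in storage; no machine is touched."""
        self.stored_revisions.add(revision)
        self.latest_revision = revision

    def upgrade_machine(self, machine_id=None, all_machines=False):
        """Roll the latest stored revision out to one machine, or to all."""
        targets = list(self.machines) if all_machines else [machine_id]
        for target in targets:
            if target not in self.machines:
                raise KeyError(f"no such machine: {target}")
            self.machines[target] = self.latest_revision


if __name__ == "__main__":
    env = Environment(467)      # bootstrap with r467
    env.add_machine()           # machine 1 gets r467
    env.update_environment(470) # hypothetical newer revision, stored only
    env.add_machine()           # machine 2 still gets r467
    env.upgrade_machine(0)      # admin upgrades machine 0 deliberately
    env.add_machine()           # machine 3 gets r470
    print(env.machines)
```

The point of the split is that update-environment never changes what is
running; only an explicit upgrade-machine does, so the admin decides when
each component moves to a new version.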

This would also solve this bug:

https://bugs.launchpad.net/juju/+bug/926550

It would, because you'd be able to control which version of juju is used
for each component, allowing updates to be verified.

Comments? Should this be proposed as an official spec?