machine placement spec

William Reade william.reade at canonical.com
Wed Nov 9 16:24:38 UTC 2011


Hi all

Here's a first attempt at a spec for machine placement. I'd be very
grateful for your comments, and opinions on which (if any) of the
described enhancements is valuable enough to be promoted to the main
part of the spec.

It's available at docs/source/drafts/placement.rst in
lp:~fwereade/juju/placement-spec; pasted below for convenience.


Machine Placement
=================


Introduction
------------

At present, juju offers extremely limited tools to those who care what
sort of hardware their services are run on: only the EC2 provider can
interpret such requests, and they're of a very limited nature. In
addition, juju offers no mechanism for placing unbound services on the
same hardware, and this has been requested often.

This is an especially painful constraint for those running against
orchestra in a small data centre with non-homogeneous machines; at this
end of the spectrum, administrators are much more likely to want to
control service placement down to the level of individual machines,
which is impossible within juju. However, even on EC2 some services have
different requirements to others, and the existing
`default-instance-type` environment setting is almost entirely
inadequate.

This feature will need to cleanly expose different sets of capabilities
across different providers.


Constraints
-----------

We propose to introduce the concept of "machine constraints", which can
be used to encode an administrator's hardware requirements. Constraints
can be set for environments, services, and service units, with lookups
for each key falling back from more specific to more general settings
[0]_.
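
To make the fallback concrete, here is a minimal Python sketch of the
per-key lookup, using the values from the footnote's example. The name
`resolve_constraints` and the plain-dict representation are assumptions
made for illustration, not existing juju code; override constraints,
which do not inherit, are deliberately left out.

```python
# Juju defaults for the generic constraints, as given in this spec.
JUJU_DEFAULTS = {"cpu": "1", "ram": "512M", "storage": "4G"}


def resolve_constraints(environment, service, unit, defaults=JUJU_DEFAULTS):
    """Merge constraint layers; the most specific layer wins for each key."""
    resolved = dict(defaults)
    # Walk from most general to most specific so later layers override.
    for layer in (environment, service, unit):
        for key, value in layer.items():
            if value == "":
                # An empty value "escapes" a higher-level setting and
                # returns the key to its juju default (or to unset).
                if key in defaults:
                    resolved[key] = defaults[key]
                else:
                    resolved.pop(key, None)
            else:
                resolved[key] = value
    return resolved


# Hypothetical example data, matching the footnote.
env = {"ec2-zone": "a", "ram": "1G"}
svc = {"ram": "2G"}
unit = {"cpu": "64"}
result = resolve_constraints(env, svc, unit)
```

Here `result` holds `cpu=64` from the unit, `ram=2G` from the service,
`ec2-zone=a` from the environment, and the juju default `storage=4G`.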

Changes to constraints cannot affect any unit that's already been
deployed. Constraints can be set in the following circumstances:

* (environment) when editing `environments.yaml`
* (service) at `juju deploy` time
* (unit) at `juju add-unit` time
* additionally, service and environment values should be subsequently
editable with `juju set`, but I understand there's some work to do
before it'll work with environments.

When specified on the command line, each individual constraint is
signalled with `--constraint` or `-c` followed by a `key=value` pair.
When `value` is empty, the juju default value is used; this can serve to
"escape" unwanted constraints specified at a higher level.

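As a rough illustration of the command-line handling described above,
the following Python sketch collects repeated `--constraint`/`-c`
options into a dict. The function name `parse_constraints` and the
`prog` string are assumptions for the example; this is not the actual
juju argument parser.

```python
import argparse


def parse_constraints(argv):
    """Collect repeated -c/--constraint key=value options into a dict."""
    parser = argparse.ArgumentParser(prog="juju deploy")  # hypothetical prog
    parser.add_argument("-c", "--constraint", action="append", default=[],
                        metavar="KEY=VALUE")
    args = parser.parse_args(argv)
    constraints = {}
    for item in args.constraint:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % item)
        # An empty value is kept as "" so a later lookup can treat it
        # as "use the juju default".
        constraints[key] = value
    return constraints


parsed = parse_constraints(["-c", "ram=2G", "--constraint", "ec2-zone="])
```
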
The constraints will fall into three categories:

* Generic constraints: based on the minimal properties exposed by every
provider. These are:

  * `cpu`: The minimum desired processing power of the machine, measured
in `ECU
<http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud#Elastic_compute_units>`_,
defaulting to 1; any floating point number >= 0.0 is valid.
  * `ram`: The minimum desired memory for the machine, defaulting to
512MB; any floating point number >= 0.0 and suffixed with M, G or T is
valid.
  * `storage`: The minimum desired persistent disk space for the
machine, defaulting to 4GB. Valid inputs as for `ram`.

  In all cases above, the value `0` is treated specially, and used to
denote that the value doesn't matter at all. EC2 could, for example,
special-case `storage=0` to specify a non-EBS image, and `cpu=0` to
allow for deploying to a t1.micro without explicitly

* Provider constraints: based on the additional properties exposed by a
given provider, which are not expected to be directly portable to other
providers:

  * For EC2, we expect to expose:

    * `ec2-zone`: availability zone within the region (which is an
environment-level setting, and will remain as such for now). Defaults to
"a"; valid values depend on the EC2 region setting.
    * `ec2-instance-type`: instance type. Defaults to "m1.small"; valid
values are those machine types that EC2 has available in the given
region.

  * For orchestra, we expect to expose:

    * `orchestra-name`: to allow admins to request a specific instance
by name, if required. Unset by default; the valid values are the set of
instance names exposed by cobbler.
    * `orchestra-classes`: a space-separated list of cobbler
mgmt-classes to which the instance must belong. Empty by default; the
valid values are the existing cobbler mgmt-classes, excluding the
current values of `available-mgmt-class` and `acquired-mgmt-class` (from
the orchestra environment).

  Provider constraints are only valid when used with the appropriate
provider, and cause errors when specified with a different provider.

* Override constraints: for determining machine placement in terms of
existing juju components:

  * `place-in=<machine-id>`: On a separate container in the machine with
juju id `machine-id`. Only valid if the machine exists (in juju state),
and is not already holding a unit of the requested service. If
additional constraints are specified, the target machine will be checked
against those constraints; if it doesn't satisfy them, the operation
will fail. Inherited constraints from higher levels will not be taken
into account.

    This constraint cannot be set on a service or in an environment; if
it's specified at `juju deploy` time, all constraints will be applied
only to the automatic first unit. In this instance, any desired
service-level constraints must be specified later with `juju set`.

  * `place-with=<service-name>`: On a separate container in *any*
machine with a unit of `service-name` deployed. Only valid if at least
one such machine is not already running a unit of the requested service.
If additional constraints are specified, the possible target machines
will be filtered against those constraints; if no available machine
satisfies them, the operation will fail. Inherited constraints from
higher levels will not be taken into account; however, if a service has
specified `place-with` (and it is not unset at the unit level), the
natural combination of the service-level and unit-level constraints will
be used.

    This constraint cannot be set on an environment, but is meaningful
at both the service and unit levels.
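
The generic constraints and the `place-with` filtering described above
could be sketched in Python roughly as follows. Everything here is
illustrative: the function names, the machine dicts (with `cpu`, `ram`
and `storage` in ECU and megabytes, plus a `services` set), and the
choice of 1G = 1024M are assumptions, not decisions made by this spec.

```python
# Multipliers to megabytes; treating 1G as 1024M is an assumption.
_UNITS = {"M": 1, "G": 1024, "T": 1024 * 1024}


def parse_size(text):
    """Convert a `ram`/`storage` value such as '512M' or '1.5G' to megabytes."""
    suffix = text[-1:]
    if suffix in _UNITS:
        number = float(text[:-1]) * _UNITS[suffix]
    else:
        # Only the special value 0 is accepted without a suffix.
        number = float(text)
        if number != 0:
            raise ValueError("size needs an M, G or T suffix: %r" % text)
    if number < 0:
        raise ValueError("size must be >= 0: %r" % text)
    return number


def satisfies(machine, constraints):
    """True if the machine meets every generic minimum that was given."""
    # A missing key is treated like 0 here, i.e. "doesn't matter"; any
    # machine trivially meets a minimum of 0.
    wanted = {
        "cpu": float(constraints.get("cpu", "0")),
        "ram": parse_size(constraints.get("ram", "0")),
        "storage": parse_size(constraints.get("storage", "0")),
    }
    return all(machine[key] >= minimum for key, minimum in wanted.items())


def place_with_candidates(machines, service_name, requested, constraints):
    """Machines running `service_name` but not yet the requested service."""
    return [m for m in machines
            if service_name in m["services"]
            and requested not in m["services"]
            and satisfies(m, constraints)]
```

If `place_with_candidates` returns an empty list, the operation would
fail, as described for `place-with`. A `place-in` check would be the
same test applied to the single machine with the given id.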

Please note that, for the local LXC provider, given its nature and its
intended use as a development tool, no constraints will have any effect
whatsoever.
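
A minimal Python sketch of the per-provider validity rules, including
the local provider exception, might look like this. The provider names
`ec2`, `orchestra` and `local`, the function name `check_constraints`,
and the decision to silently drop everything for the local provider are
assumptions for the example.

```python
GENERIC_KEYS = {"cpu", "ram", "storage"}
OVERRIDE_KEYS = {"place-in", "place-with"}
# Provider-specific keys as listed in this spec.
PROVIDER_KEYS = {
    "ec2": {"ec2-zone", "ec2-instance-type"},
    "orchestra": {"orchestra-name", "orchestra-classes"},
    "local": set(),
}


def check_constraints(provider, constraints):
    """Return the constraints that apply, or raise on a foreign key."""
    if provider not in PROVIDER_KEYS:
        raise ValueError("unknown provider: %r" % provider)
    if provider == "local":
        # Constraints have no effect at all on the local LXC provider.
        return {}
    allowed = GENERIC_KEYS | OVERRIDE_KEYS | PROVIDER_KEYS[provider]
    unknown = sorted(set(constraints) - allowed)
    if unknown:
        raise ValueError("constraints not valid for %s: %s"
                         % (provider, ", ".join(unknown)))
    return dict(constraints)
```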


Dependencies
------------

* We need orchestra to expose `cpu`, `ram` and `storage`; and ideally,
in case we end up with megamachine orchestra deployments, an API which
will efficiently determine one of the (equal) least-fit machines for a
set of constraints and return its cobbler instance ID; we don't really
want to have to grab all the data ourselves and iterate. Having access
to the data is critically important; efficient access is not critical,
but would be extremely helpful.

* We need to extend `juju set` to allow for (1) environment changes,
which could be a moderately large change, and (2) service changes that
aren't actually part of the service config, and don't need to be
communicated to units. This could be ignored, and most of the spec could
still be implemented, but it would be rather inconvenient for users to
have immutable service-level or environment-level constraints.

* We need the ability to deploy units to separate LXC containers within
individual machines. If that isn't done, the override constraints
*could* still be developed, but only on the understanding that the
services could potentially interfere with one another, and that the
risks of this being a serious problem are low enough to justify the
feature's inclusion.
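
For the first dependency, the client-side fallback (grabbing all the
data and iterating) could look something like the Python sketch below.
The spec does not define "least-fit" precisely; this example assumes it
means the satisfying machine with the smallest total relative surplus,
and the function name, machine dicts and numeric minimums are all
hypothetical.

```python
def least_fit(machines, minimums):
    """Return the id of a machine that meets `minimums` with least surplus.

    `minimums` maps constraint keys to already-parsed numeric minimums.
    Returns None when no machine satisfies them.
    """
    candidates = [m for m in machines
                  if all(m[key] >= low for key, low in minimums.items())]
    if not candidates:
        return None

    def surplus(machine):
        # Fraction of each resource left unused, summed over all keys.
        return sum((machine[key] - low) / machine[key]
                   for key, low in minimums.items() if machine[key] > 0)

    # min() returns the first of any equally good machines.
    return min(candidates, key=surplus)["id"]
```

An orchestra-side API doing the equivalent work would spare juju from
fetching every machine record, which is the point of the dependency.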


Potential Enhancements
----------------------

The above spec is intended to be small and focused; several enhancements
are possible and may be desirable now or in the future. They include:

* An additional "generic" `gpu` constraint, defaulting to `0`, allowing
us to generically specify a cg1.4xlarge, and giving us the possibility
of extending orchestra to expose this as well. Not sure how we'd measure
GPU power.

* Additional provider constraints, including (surely non-exhaustive;
please contribute ideas):

    * `ec2-image-id`: image ID. Will need to be used with care; could
potentially silently override `storage`, and may have an unexpected OS
(which would normally be determined by the service's charm).
    * `orchestra-dns-name`: potentially useful for sysadmins who don't
primarily think of their systems by `orchestra-name`.

* Max constraints: allow generic constraints to also take the value
`max`, meaning "the best available". (If you specify `cpu=max` and
`storage=max`, the constraints cannot be satisfied unless the available
machine with the (equal) greatest amount of storage also has the (equal)
most processing power.)

* An additional override constraint:

  * `scale-with=<service-name>`: On a separate container in *every*
machine running a unit of `service-name`; henceforth every add or remove
of a unit of `service-name` will lead to the addition or removal of the
corresponding unit of the requested service. If `service-name` is
destroyed, the requested service will not be, but the only running units
will be any that were deployed separately from the scale-with request.
Only valid if the requested service is not already scaling with
`service-name`.

* `--with` and `--in` syntactic sugar for `place-with` and `place-in`
constraints.

* Provider->generic constraint translation: In the event of our
implementing stacks, it may be useful to be able to convert
provider-specific constraints into generic ones (where possible) to
facilitate creation of provider-independent stacks.

* Roles: named groups of constraints (better name than "roles"?). Useful
for OAOO-ness and reduced typing when scripting or running from the
command line; also useful for recording intended machine characteristics
when not otherwise translatable (for example, when serialising a
deployment as a stack, it would be thoughtful to include a
`fast-network` role to hold the `orchestra-classes=rack-c` constraint
which *you* know means "next to the switch" but is meaningless to people
outside your datacentre).
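
The `max` enhancement listed above has a slightly surprising
consequence when several keys are set to `max`, which this short Python
sketch tries to show. The function name and the machine dicts are
assumptions for illustration only.

```python
def max_candidates(machines, max_keys):
    """Machines that are (equal) best for every key in `max_keys`."""
    if not machines:
        return []
    # The best available value for each key, taken independently.
    best = {key: max(m[key] for m in machines) for key in max_keys}
    # A machine qualifies only if it matches the best value on all keys.
    return [m for m in machines
            if all(m[key] == best[key] for key in max_keys)]


# Hypothetical machines: one has the most cpu, the other the most storage.
machines = [
    {"id": "a", "cpu": 8.0, "storage": 100.0},
    {"id": "b", "cpu": 4.0, "storage": 500.0},
]
both = max_candidates(machines, ["cpu", "storage"])
cpu_only = max_candidates(machines, ["cpu"])
```

With these machines, `both` is empty because no single machine is best
on both keys, while `cpu_only` contains just machine `a`.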


.. [0] So: if an environment has specified `ec2-zone=a` and `ram=1G` and
a service has specified `ram=2G`, instances of that service in that
environment will inherit the `ec2-zone` setting of "a", and the juju
default `storage` of "4G"; and if a unit of that service additionally
specified `cpu=64`, it would inherit all aforementioned constraints as
well.




