machine placement spec

Gustavo Niemeyer gustavo.niemeyer at canonical.com
Thu Nov 10 15:18:48 UTC 2011


Thanks a lot for capturing this, William. Great work indeed.

Here are some follow-ups to help shape it up:

> This is an especially painful constraint for those running against
> orchestra in a small data centre with nonhomogenous machines; at this
> end of the spectrum, administrators are much more likely to want to
> control service placement down to the level of individual machines,
> which is impossible within juju. However, even on EC2 some services have

Right, I don't see orchestra as being special in that regard. Any real
deployment on EC2 will want to tweak the machines on which services
will land.

> different requirements to others, and the existing
> `default-instance-type` environment setting is almost entirely
> inadequate.

I'd not even mention that option. It was knowingly born as a hack that
would have to die, rather than as a planned option.

> We propose to introduce the concept of "machine constraints", which can
> be used to encode an administrator's hardware requirements. Constraints
> can be set for environments, services, and service units, with lookups

At the end of the sprint we agreed to keep unit-level constraints out
for the moment, so it'd be nice to leave them out of the spec as well
and avoid getting into details on that. No matter what happens, we
have to get service-level constraints right. If unit-level constraints
turn out to be important down the road, they should be introduced
without compromising the good behavior of service-level constraints.

> Changes to constraints cannot affect any unit that's already been
> deployed. Constraints can be set in the following circumstances:
>
> * (environment) when editing `environments.yaml`

That's not the right place. Environment settings should be in the
environment configuration, inside ZooKeeper. We've already gone too
far with the environments.yaml hacks, and are already seeing problems
because of it. This file is supposed to offer the identity of the
environment so that communication with it can take place, but nothing
else.

> * (unit) at `juju add-unit` time

Ditto regarding unit-level placement.

> * additionally, service and environment values should be subsequently
> editable with `juju set`, but I understand there's some work to do
> before it'll work with environments.

"juju set" changes configuration of services. Not sure if that's the
right place for that.

> When specified on the command line, each individual constraint is
> signalled with `--constraint` or `-c` followed by a `key=value` pair.

What about multiple constraints? Spaces?

> When `value` is empty, the juju default value is used; this can serve to
> "escape" unwanted constraints specified at a higher level.

s/escape/ignore/
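
To make the "empty value ignores a higher-level constraint" behavior
concrete, here is a minimal Python sketch. It assumes only two layers
(environment and service, since unit-level constraints are being left
out), and the names `JUJU_DEFAULTS` and `resolve` are hypothetical, not
part of juju. Only the 512MB `ram` default comes from the quoted spec.

```python
# Hypothetical defaults table; only the ram value appears in the quoted spec.
JUJU_DEFAULTS = {"ram": "512M"}


def resolve(environment, service):
    """Merge constraint layers in order: juju defaults, environment, service.

    An empty string at any layer means "ignore what the higher level said
    and fall back to the juju default for that key".
    """
    result = dict(JUJU_DEFAULTS)
    for layer in (environment, service):
        for key, value in layer.items():
            if value == "":
                if key in JUJU_DEFAULTS:
                    # Reset to the juju default.
                    result[key] = JUJU_DEFAULTS[key]
                else:
                    # No default known in this sketch: drop the constraint.
                    result.pop(key, None)
            else:
                result[key] = value
    return result
```

With this reading, a service deployed with `ram=` in an environment
that sets `ram=2G` would end up with the 512M default rather than 2G.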

>  * `cpu`: The minimum desired processing power of the machine, measured
> in `ECU

Cool.

>  * `ram`: The minimum desired memory for the machine, defaulting to
> 512MB; any floating point number >= 0.0 and suffixed with M, G or T is
> valid.

I don't think we need floats here.
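
As an illustration of an integer-only `ram` value, here is a small
Python sketch. The function name `parse_ram`, the choice of megabytes
as the internal unit, and the use of 1024 as the multiplier between
suffixes are all assumptions made for the example.

```python
import re

# Multipliers expressed in megabytes; powers of 1024 are an assumption.
_UNITS = {"M": 1, "G": 1024, "T": 1024 * 1024}
_RAM_PATTERN = re.compile(r"(\d+)([MGT])")


def parse_ram(text):
    """Return the requested memory in megabytes, accepting integers only."""
    match = _RAM_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("invalid ram constraint: %r" % (text,))
    return int(match.group(1)) * _UNITS[match.group(2)]
```

Under this sketch `2G` is accepted and `1.5G` is rejected; someone who
wants 1.5GB would write `1536M` instead.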

>  * `storage`: The minimum desired persistent disk space for the
> machine, defaulting to 4GB. Valid inputs as for `ram`.

I suggest keeping this out of the current iteration. This is a
nebulous area that interacts with the storage-specific features, and
it won't affect the final outcome of the key design in this document.

>  In all cases above, the value `0` is treated specially, and used to
> denote that the value doesn't matter at all. EC2 could, for example,
> special-case `storage=0` to specify a non-EBS image,

... and that's a good example of that point. This feels very magical
and unclear. Let's take it out so we don't have to debate this now.

> and `cpu=0` to
> allow for deploying to a t1.micro without explicitly

Sentence is unfinished, but I guess it sounds reasonable.

> * Provider constraints: based on the additional properties exposed by a
> given provider, which are not expected to be directly portable to other
> providers:
>
>  * For EC2, we expect to expose:
>
>    * `ec2-zone`: availability zone within the region (which is an
> environment-level setting, and will remain as such for now). Defaults to
> "a"; valid values depend on the EC2 region setting.

Why is it an environment-level setting?

>    * `orchestra-name`: to allow admins to specify a specific instance
> by name, if required. Unset by default; the valid values are the set of
> instance names exposed by cobbler.

Can we please keep that out? There's nothing that this can achieve
that is not doable more generically by orchestra-classes.

>    * `orchestra-classes`: a space-separated list of cobbler

Comma-separated?  How do we specify multiple constraints at once?

E.g. what does that mean: -c a=b,c d=e,f
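
One possible answer, sketched in Python with the standard `argparse`
module: each `-c` takes exactly one `key=value` pair, the flag is
repeated for multiple constraints, and commas are left inside the value
for list-valued keys to split themselves. This is only one reading of
the open question above, not an agreed syntax, and `parse_constraints`
and the `juju-sketch` program name are hypothetical.

```python
import argparse


def parse_constraints(argv):
    """Collect repeated -c/--constraint key=value pairs into a dict."""
    parser = argparse.ArgumentParser(prog="juju-sketch")
    parser.add_argument(
        "-c", "--constraint",
        action="append", default=[], metavar="KEY=VALUE",
    )
    args = parser.parse_args(argv)

    constraints = {}
    for item in args.constraint:
        key, sep, value = item.partition("=")
        if not sep or not key:
            parser.error("expected key=value, got %r" % (item,))
        # The raw value is kept as-is, so "b,c" stays a single string and
        # an empty value ("ram=") is preserved for the caller to interpret.
        constraints[key] = value
    return constraints
```

Under this reading `-c a=b,c -c d=e,f` gives `a` the value `b,c` and
`d` the value `e,f`, while `-c a=b,c d=e,f` is rejected because `d=e,f`
is not attached to any `-c`.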

>  Provider constraints are only valid when used with the appropriate
> provider, and cause errors when specified with a different provider.

That's not what we agreed to, I believe. We said we'd ignore them when
used with the wrong provider, so that scripts etc. won't break.
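
A minimal Python sketch of the "ignore rather than error" behavior
follows. It assumes provider constraints can be recognised by a
`<provider>-` prefix, as in the `ec2-zone` and `orchestra-classes`
examples from the quoted spec; the function name `applicable` and the
prefix table are hypothetical.

```python
# Prefixes taken from the constraint names in the quoted spec; treating the
# prefix as the provider marker is an assumption of this sketch.
PROVIDER_PREFIXES = ("ec2-", "orchestra-")


def applicable(constraints, provider):
    """Drop constraints that belong to a provider other than `provider`."""
    own_prefix = provider + "-"
    kept = {}
    for key, value in constraints.items():
        is_provider_specific = key.startswith(PROVIDER_PREFIXES)
        if is_provider_specific and not key.startswith(own_prefix):
            # Silently ignored, so the same script works on any provider.
            continue
        kept[key] = value
    return kept
```

Generic constraints such as `ram` pass through untouched, so a script
carrying both `ec2-zone` and `orchestra-classes` keeps working against
either provider.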

> * Override constraints: for determining machine placement in terms of
> existing juju components:

The term "override" didn't ring a bell for me in this context. None of
the word's definitions seems related to the meaning of the options
below.

>  * `place-in=<machine-id>`: On a separate container in the machine with
> juju id `machine-id`. Only valid if the machine exists (in juju state),
> and is not already holding a unit of the requested service. If

Can we please keep that out as well for now, or rather put it into a
Future Ideas section without detailing its semantics?  This is related
to multi-unit machines, which are not supported right now, and it is
worth some conversation on its own, as the number of exceptions you
list in its description clearly indicates.

>  * `place-with=<service-name>`: On a separate container in *any*

Same thing. Mentioning this is worthwhile for the list and for our
history, but I'd keep this in a Future Ideas section without detailed
semantics. We haven't agreed on proper semantics after much
discussion, and we don't have to agree on them right now since we
won't be implementing it just yet and it won't affect the outcome of
the rest of the feature, I believe.

> Dependencies
> ------------
>
> * We need orchestra to expose `cpu`, `ram` and `storage`; and ideally,
> in case we end up with megamachine orchestra deployments, an API which

Mega-machine orchestra deployments can easily be based on classes for
the moment. I'd drop the reference to a special API being a dependency
at this point.

> * We need to extend `juju set` to allow for (1) environment changes,
> which could be a moderately large change, and (2) service changes that

It's not clear to me that 'juju set' is the proper place for that. The
command has a completely different shape and outcome, and deserves its
own options and help text.

> aren't actually part of the service config, and don't need to be
> communicated to units. This could be ignored, and most of the spec could
> still be implemented, but it would be rather inconvenient for users to
> have immutable service-level or environment-level constraints.

Indeed.

> * We need the ability to deploy units to separate LXC containers within
> individual machines. If that isn't done, the override constraints
> *could* still be developed, but only on the understanding that the
> services could potentially interfere with one another, and that the
> risks of this being a serious problem are low enough to justify the
> feature's inclusion.

I'd keep these out entirely for now. There's a lot to do without them,
and their existence won't affect the shape of the other options, IMO.

> * An additional "generic" `gpu` constraint, defaulting to `0`, allowing
> us to generically specify a cg1.4xlarge, and giving us the possibility
> of extending orchestra to expose this as well. Not sure how we'd measure
> GPU power.

Also feels like a nebulous area. It isn't just about having a GPU, but
which GPU it is, etc.

> * Additional provider constraints, including (surely non-exhaustive;
> please contribute ideas):
>
>    * `ec2-image-id`: image ID. Will need to be used with care; could

This would be a significant mistake, IMO. Encouraging usage of custom
AMIs for charms will degenerate the charm's content and undermine the
overall design in ways we didn't really think through yet.

> * Max constraints: allow generic constraints to also take the value
> `max`, meaning "the best available". (If you specify `cpu=max` and
> `storage=max`, the constraints cannot be satisfied unless the available
> machine with the (equal) greatest amount of storage also has the (equal)
> most processing power.)

Feels dubious. Can't imagine good scenarios where an admin would care
to use the maximum available without knowing what it is, and it's even
harder to imagine he'd be willing to do nothing if the machine that
has 1.5GB has less CPU than the one with 1GB.

As a general guideline, we should try to keep our focus on relevant
use cases at this stage.

>  * `scale-with=<service-name>`: On a separate container in *every*
> machine running a unit of `service-name`; henceforth every add or remove

This is also worth debating further, but feels like a good idea in
principle. Worth keeping in Future Ideas, IMO.

> of a unit of `service-name` will lead to the addition or removal of the
> corresponding unit of the requested service. If `service-name` is
> destroyed, the requested service will not be, but the only running units
> will be any that were deployed separately from the scale-with request.

That said, let's not detail its semantics either to avoid having to
debate about them right now. I'm not sure it makes sense to destroy
service B's units on an explicit removal of service A in its entirety,
for instance. Sorting out these details can be done in a future
conversation.

> * Roles: named groups of constraints (better name than "roles"?). Useful
> for OAOO-ness and reduced typing when scripting or running from the
> command line; also useful for recording intended machine characteristics
> when not otherwise translatable (for example, when serialising a
> deployment as a stack, it would be thoughtful to include a
> `fast-network` role to hold the `orchestra-classes=rack-c` constraint

Even though I suggested that in the sprint, I'd keep that out of the
spec entirely. There are important shortcomings in that feature: roles
would be a flat, unorganized namespace in which multiple charms could
conflict without being aware of it, with strange interactions in the
case of multi-layered stacks. It's also not clear that there's much
benefit compared to the role concept we already have in place through
the existence of services as a model. This feels like a big gray area
to me, one that could easily look like a good idea and turn out to be
a big mistake.


Thanks a lot for the comprehensive spec, William. This is very helpful.

-- 
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/plus
http://niemeyer.net/twitter
http://niemeyer.net/blog

-- I'm not absolutely sure of anything.


