preliminary machine placement discussion

William Reade william.reade at
Wed Nov 9 10:11:18 UTC 2011

On Tue, 2011-11-08 at 12:00 -0500, Kapil Thangavelu wrote:
> What CPU means is a pretty rich topic unto itself. CPUs differ widely
> in architecture (SPARC, ARM, x86) and even by capability within the same
> product family name, i.e. a Penryn Xeon is a vastly different beast, both in
> terms of performance and power characteristics, than a Sandy Bridge Xeon. I
> think portability in this aspect will require some sort of normalization
> against a known quantity; in that regard the EC2 compute unit is probably a
> reasonable standard. As far as calculating such a value, I'd see it possibly
> as a binary that does some computation across a few runs and spits out a
> number along with gathering CPU info.

Sounds very sensible to me.
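A benchmark binary along those lines might look like the following sketch (purely illustrative, not a real juju tool; the workload and the reference timing used for normalization are invented constants):

```python
import time

# Assumed reference: seconds this workload takes on a machine we define
# as 1.0 "compute units" (made-up constant, for illustration only).
REFERENCE_SECONDS = 2.0

def workload():
    # A fixed, CPU-bound computation to time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def compute_units(runs=3):
    # Average timings across a few runs, then normalize against the
    # reference machine to get a portable compute-unit estimate.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    mean = sum(timings) / len(timings)
    return REFERENCE_SECONDS / mean

print("estimated compute units: %.2f" % compute_units())
```

The real tool would presumably also emit the raw CPU info (architecture, model) alongside the normalized number.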

> Storage is also a rich topic: a single 1TB disk is a different quantity than
> two 300GB disks or a 256GB SSD. We could probably simplify it as some
> available fs size. Ignoring that for a moment, the meaning of storage in a
> cloud environment is also different in that it touches upon volume management
> to fulfill allocation, i.e. storage is not necessarily an inherent quality but
> an allocated one. For now, though, a simple GB count sounds reasonable.

I *think* the fact that it's allocated is irrelevant -- where it can be,
the provider will have to know to interpret a storage request correctly,
and choose the machine purely on the basis of units+memory. Where it
can't, it's just another constraint. I can imagine situations where that
falls down, but I think it's good enough for now.
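That split could be sketched as a toy matcher: where the provider can allocate storage on demand, it ignores the storage key when picking a machine; where it can't, storage is just another minimum to satisfy. All names here are invented for illustration:

```python
def machine_satisfies(machine, constraints, provider_allocates_storage):
    # Return True if this machine meets every requested minimum.
    for key, wanted in constraints.items():
        if key == "storage-gb" and provider_allocates_storage:
            continue  # storage will be carved out after placement
        if machine.get(key, 0) < wanted:
            return False
    return True

machines = [
    {"compute-units": 4, "memory-gb": 8, "storage-gb": 100},
    {"compute-units": 2, "memory-gb": 4, "storage-gb": 2000},
]
want = {"compute-units": 2, "memory-gb": 4, "storage-gb": 500}

# With on-demand storage, both machines qualify; without it, only the
# machine with 2000GB of local disk does.
print([m for m in machines if machine_satisfies(m, want, True)])
print([m for m in machines if machine_satisfies(m, want, False)])
```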

> For deploy constraints, I think they apply generally to the service as
> deployment defaults. Unit level constraints automatically flow from the service
> level constraints, but can be overridden on a per service basis. Possibly, the
> service level deploy settings are an exposed mechanism for manipulation of the
> default; unclear though, since it has no activation against the existing units.

Assuming you mean "overridden on a per unit basis", I agree with the
first bit. I feel we probably *should* allow manipulation of service
defaults, and that if we did we could easily eliminate ambiguity by
calling it "new-instance-constraints" or something; but it's not
actually necessary for a minimally useful implementation.
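The flow-then-override behaviour amounts to a simple merge, sketched here with invented keys (not juju's real constraint names):

```python
def effective_constraints(service_defaults, unit_overrides=None):
    # Unit-level constraints start from the service defaults; any key
    # the unit specifies wins. Untouched defaults flow through.
    merged = dict(service_defaults)
    merged.update(unit_overrides or {})
    return merged

service = {"cores": 128, "ram": "64G"}
print(effective_constraints(service))                 # defaults apply
print(effective_constraints(service, {"cores": 32}))  # cores overridden
```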

> class is a rather generic term, arising I think from an orchestra impl detail,
> but the usage here would promote multiple inheritance.. oh wrong paradigm ;-)

Actually it wasn't -- it seemed to me to be the correct term, and the
orchestra mgmt-class was just a neat correspondence after the fact :).

> I think of these as provider-specific constraint vocabularies, i.e. location:
> foo, machine-type: m1.large, etc. We can extend to orchestra with some syntax
> for auto-created management classes from inventory.

That said, though, I think you're on the right track: where we can make
juju more aware of the groupings and their meanings, we should, and we
can have orchestra-specific vocabulary for mgmt-classes as part of that.

> Interesting. It's important to note that capturing deployment resource
> constraints isn't necessarily even meaningful in the same environment, based on
> intended usage (i.e. deploy time considerations) of the service, even ignoring
> resource availability. I.e. we need to make it easy to redeploy a given service
> set, but also to modify that deployment with respect to constraints.

Indeed: I'm in speculative mode here, and really thinking about how they
could/should apply to stacks in the end.

> I still think there is value in doing the reverse provider mapping to a generic
> constraint where possible, along with keeping the provider-specific vocabulary
> attached to it (at least by default for cloud providers).

I could probably be convinced that an imperfect-but-often-helpful
default is better than nothing, and so long as people can override it I
guess it's not a huge problem. But it still feels a bit wrong :).

I'm even more suspicious of keeping provider-specific vocab around in
generic stacks, but I guess we could prefix the keys with the provider
type (ec2-machine-type, for example) and ignore any that don't apply.
Actually, that feels like a very nice idea, I'll think about it some more.
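The prefix-and-ignore idea could work roughly like this sketch (provider names and key shapes are assumptions, not juju's actual scheme):

```python
def applicable_constraints(constraints, provider_type):
    # Keep generic keys as-is; keep provider-prefixed keys only when the
    # prefix matches the current provider, stripping the prefix so the
    # provider sees its native vocabulary. Other providers' keys are
    # silently ignored.
    known_providers = ("ec2", "orchestra")
    result = {}
    for key, value in constraints.items():
        prefix, _, rest = key.partition("-")
        if prefix in known_providers:
            if prefix == provider_type:
                result[rest] = value
        else:
            result[key] = value
    return result

stack = {"cores": 4, "ec2-machine-type": "m1.large",
         "orchestra-classes": "rack-c7"}
print(applicable_constraints(stack, "ec2"))
```

Whether the prefix should be stripped or preserved on the way through is a detail the provider interface would settle.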

> > So... what can we do? While I hate to introduce another new concept, I
> > think it's justified here: we want to be able to group constraints as
> > "roles". This has two notable advantages:
> > 
> > * We can simplify command lines -- considering all the other possible
> > options we already handle, it'll be quite convenient to be able to do
> > things like:
> > 
> >   juju set-role compute --constraint cores=128,ram=64G
> >   juju deploy nova-compute --role compute
> >   juju set-role compute-spike --constraint cores=32
> >   juju add-unit nova-compute --role compute-spike
> > 
> > ...or even:
> > 
> >   juju set-role compute --constraint cores=128,ram=64G
> >   juju set-role compute-spike --constraint cores=32
> >   juju deploy nova-compute --role compute
> >   ...
> >   juju set nova-compute --role compute-spike
> >   juju add-unit nova-compute
> >   juju add-unit nova-compute
> >   ...
> >   juju add-unit nova-compute
> > 
> > * More importantly, it gives us a mechanism for capturing the *intent*
> > of a set of constraint: so, even if we can't turn "rack-c7" into a
> > provider-independent constraint, we *can* encode the fact that we'd
> > prefer to deploy haproxy to a well-connected machine by specifying (say)
> > a "fat-pipe-proxy" role.
> hmm.. a purely semantic intent against an unstructured vocabulary?

Yes, tailor-made for interpretation by a massively-parallel pattern
matcher and inference engine, likely to be capable of translating that
semantic intent into relevant datacentre-specific constraints.
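Mechanically, a role is just a named set of constraints that deploy-time flags can override; a minimal sketch mirroring the set-role examples above (command names and storage shape are invented):

```python
# In-memory role store; a real implementation would persist this in
# the environment's state.
roles = {}

def set_role(name, **constraints):
    # "juju set-role NAME --constraint k=v,..." updates the named set.
    roles.setdefault(name, {}).update(constraints)

def deploy_constraints(role=None, **explicit):
    # The role's constraints provide the base; explicit deploy-time
    # flags win over the role.
    base = dict(roles.get(role, {}))
    base.update(explicit)
    return base

set_role("compute", cores=128, ram="64G")
set_role("compute-spike", cores=32)
print(deploy_constraints(role="compute"))
print(deploy_constraints(role="compute-spike", ram="8G"))
```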

> I don't really see the value of doing additional role management. It seems like
> you're just defining service level constraints again with a different management
> syntax. The value over just using the constraints seems a bit dubious. The
> posited reasons for the additional management layer are only around an
> abstraction for import/export. Yet the import usage itself means managing the
> values for this additional resource mapping, contextualized to the usage
> environment prior to the stack usage, which requires pre-definition mapping of
> roles for import, or just accepting/defining the roles as per what's captured
> in the stack.
> Really any of the deploy time resource constraints around an exported stack are 
> subject to manipulation when choosing to use a stack. 

Indeed so -- I see it as a valuable way of exposing a stack's
requirements at the human level so that, when deploying a new stack in
your own environment, you don't necessarily need to think about every
service. It's potentially somewhat niche -- it assumes that relatively
complex stacks are likely to have multiple services with similar
requirements, and that it will be useful to say things like "all these
services should just run on m1.smalls".

> The concept of roles doesn't seem to address the provider-specific
> vocabulary question; the additional semantic capture in an unstructured
> vocabulary label requires human inspection and interpretation, as well as
> management of the value. I'd like to just be able to deploy a 3rd-party stack
> directly.

Definitely: the intent is to allow you to *tweak* a stack's deployment
constraints at a convenient level. I agree that a stack should be
deployable in a sensible configuration *without* needing this feature,
but I also think named sets of constraints will be useful (if not
actually *necessary*) both for stacks and for ad-hoc use in normal

> There is potentially some value here in the abstract constraint definition for
> various smart stack scaling logic, but I don't see that as something that's
> really needed for an initial constraint/placement or even stack implementation.

Yeah: this bit of the discussion is not really relevant to the initial
feature, I just wanted to get it down somewhere.

> Thanks again for kicking off the discussion.

A pleasure :).


More information about the Juju mailing list