overlapping constraints (ec2-instance-type with arch/cpu/mem)

Fri Dec 16 16:13:42 UTC 2011

All the constraints we've specced are currently independent... *except*
for ec2-instance-type, which implies settings for arch, cpu, and mem.
[0]

-----

The spec said that "the tightest constraints win", but that's wrong
because:

env: [mem=16G]
service: [ec2-instance-type=m1.small]

...would get you an m2.xlarge (which definitely doesn't match intent).
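
To make the failure concrete, here is a rough Python sketch. It assumes that "tightest wins" means taking the larger value of each of cpu and mem across levels, after expanding the instance type into what it implies; that reading, the IMPLIED table, and the tightest() helper are all hypothetical, not anything from the spec or the code.

```python
# Hypothetical table: what an instance type implies (cpu in ECU, mem in GB).
IMPLIED = {"m1.small": {"cpu": 1.0, "mem": 1.7}}


def tightest(env, service):
    """Merge two constraint levels, keeping the larger value for each key."""
    merged = {}
    for level in (env, service):
        level = dict(level)
        instance_type = level.pop("ec2-instance-type", None)
        if instance_type is not None:
            # Expand the instance type into the cpu/mem it implies.
            level = {**IMPLIED[instance_type], **level}
        for key, value in level.items():
            merged[key] = max(merged.get(key, value), value)
    return merged


# The service asked for an m1.small, but the env-level mem=16G survives.
print(tightest({"mem": 16.0}, {"ec2-instance-type": "m1.small"}))
```

The merged result still demands 16G of memory, so the service's explicit request for an m1.small is silently lost.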

-----

It has also been suggested that the most specific constraint should win,
but IMO that's wrong too because:

env: [ec2-instance-type=m1.small]
service: [mem=16G]

...would get you an m1.small (which again differs noticeably from what
the user wanted).

-----

I feel, at the moment, that the least worst solution is to implicitly
convert ec2-instance-type constraints into cpu/mem constraints at each
level, so (for the purposes of constraint overriding, at least):

[ec2-instance-type=m1.large] == [cpu=4 mem=7.5G]

...and the fact that m1.large implies arch=x64 should be ignored.

Thus:

env: [ec2-instance-type=m1.small]
  == [cpu=1 mem=1.7G]
service: [mem=16G]

...evaluates to [cpu=1 mem=16G], which turns out to be an m2.xlarge;
while:

env: [mem=16G]
service: [ec2-instance-type=m1.small]
      == [cpu=1 mem=1.7G]

...evaluates to [cpu=1 mem=1.7G], which is exactly what we want.
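
A minimal Python sketch of this rule, for illustration only. INSTANCE_TYPES, expand() and override() are hypothetical names, the table only covers the types mentioned in this mail, and two things are assumed: that service-level settings replace env-level ones key by key, and that cpu/mem given explicitly alongside an instance type at the same level beat the implied values.

```python
# Hypothetical table: instance type -> implied constraints, arch ignored.
# cpu is in ECU, mem in GB.
INSTANCE_TYPES = {
    "m1.small": {"cpu": 1.0, "mem": 1.7},
    "m1.large": {"cpu": 4.0, "mem": 7.5},
}


def expand(constraints):
    """Replace ec2-instance-type with the cpu/mem constraints it implies."""
    expanded = dict(constraints)
    instance_type = expanded.pop("ec2-instance-type", None)
    if instance_type is None:
        return expanded
    implied = dict(INSTANCE_TYPES[instance_type])
    # Assumption: explicit cpu/mem at the same level beat the implied ones.
    implied.update(expanded)
    return implied


def override(env, service):
    """Expand each level, then let service values replace env values."""
    merged = expand(env)
    merged.update(expand(service))
    return merged


# env asks for m1.small, service overrides mem: cpu=1, mem=16
print(override({"ec2-instance-type": "m1.small"}, {"mem": 16.0}))
# env asks for mem=16G, service asks for m1.small: cpu=1, mem=1.7
print(override({"mem": 16.0}, {"ec2-instance-type": "m1.small"}))
```

Because the expansion happens at each level before overriding, whichever level is more specific (the service) always has the last word on every resource it mentions, directly or via an instance type.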

-----

Note that if arch weren't ignored:

env: [ec2-instance-type=m1.small]
  == [arch=386 cpu=1 mem=1.7G]
service: [mem=16G]

...evaluates to [arch=386 cpu=1 mem=16G], which is impossible to
satisfy.
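
The difference shows up when the merged constraints are matched against real instance types. The sketch below is hypothetical throughout: CATALOGUE and pick() are made-up names, the catalogue lists only three types, and it assumes the list is ordered cheapest first and that cpu/mem are minimums while arch must match exactly.

```python
# Hypothetical catalogue, assumed to be ordered cheapest first.
# cpu is in ECU, mem in GB.
CATALOGUE = [
    ("m1.small", {"arch": "386", "cpu": 1.0, "mem": 1.7}),
    ("m1.large", {"arch": "x64", "cpu": 4.0, "mem": 7.5}),
    ("m2.xlarge", {"arch": "x64", "cpu": 6.5, "mem": 17.1}),
]


def pick(constraints):
    """Return the first instance type meeting the constraints, or None."""
    for name, spec in CATALOGUE:
        if "arch" in constraints and spec["arch"] != constraints["arch"]:
            continue  # arch must match exactly when it is specified
        if spec["cpu"] < constraints.get("cpu", 0):
            continue  # cpu is treated as a minimum
        if spec["mem"] < constraints.get("mem", 0):
            continue  # mem is treated as a minimum
        return name
    return None


# With arch ignored, something satisfies the merged constraints.
print(pick({"cpu": 1.0, "mem": 16.0}))
# With arch carried over from m1.small, nothing in the catalogue fits.
print(pick({"arch": "386", "cpu": 1.0, "mem": 16.0}))
```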

-----

Please also note that t1.micro is rather hard to describe, because of
the bursty cpus; I'd be inclined to say something like [cpu=0.01
mem=613M], but I'd be happy to be corrected.

-----

Finally, I think that these are all actually edge cases: when a user
specifies [ec2-instance-type=m1.large], they're *thinking* in terms of
ec2-instance-type anyway, and they're *much* more likely to override
with [ec2-instance-type=m1.xlarge] than with [cpu=8].

Still, the consequences of accidentally firing up 50 cc2.8xlarges
instead of 50 t1.micros are more than somewhat serious, especially if
you leave them running for a couple of days (or weeks...), and I think
we should take the risk into account when choosing a solution.

Thoughts?
William

[0] orchestra-name and orchestra-classes overlap to a certain extent,
but the worst outcome from screwing those up is "no available machines",
which is easy to fix, while the worst outcome from an ec2-instance-type
screwup is a delayed "OMGIHAVENOMONEY", which I'd prefer not to
contribute to inflicting on anyone.