[Maas-devel] Scaling to 72k nodes

Tue Oct 16 14:24:16 UTC 2012

Thanks for the write up John!

On 16/10/2012, John Arbash Meinel <john at arbash-meinel.com> wrote:
>
> Earlier in the week I had 10 Cluster Controllers, each on a c1.medium,
> which has 2 cores of the same speed as the c1.xlarge. Today I added an
> additional 8 Cluster Controllers (because I'm currently limited to 20
> EC2 instances, and I need to figure out how to change that).

You might want to poke James Page on IRC, as he needed his limit bumped
up in the past. Unfortunately, I don't think the Amazon bureaucracy is
very easy to navigate.

> One interesting bit is the 'fairness' of the system. It appears that
> all the cluster controllers get the job request at approximately the
> same time, and most of them complete the job in about 3.5s, but then
> a second wave of them takes 13s to complete. My guess is that it has
> something to do with keep-alive. The Region controller currently has
> 12 'wsgi' workers (though I only ever see 8 of them with active CPU
> at any one time).

Was it clear at which stage the second batch was being delayed?
Waiting for the hardware details, or posting back the matching
nodes? Adjusting the number of workers does sound like it could
affect things.

Martin