Juju Scale Test

Sat Oct 4 06:45:55 UTC 2014

Wasn't the container testing on Power done using Juju? I thought they hit
something like 60k virtual machines.

I have scale tested Juju, and the number of agents it can handle, up to
effectively 5000 machines, though this was just agents interacting with the
API server, without much actual workload. I did do things like
"reconfigure all 5000 units at the same time", and things performed just
fine. I was also measuring "if you kill the API server, how long before the
system recovers", which was about 3 minutes at 2k "machines".

I've also done some testing for "if you change the charm so that it tries
to change its relation data as fast as it possibly can, how many can you
run at the same time", and there I think I got to about 500 or so before
the database bottlenecked.

I started a bit of scale testing in an HA scenario, and initial results
showed the performance was going to improve (not linearly, but better than
a single machine), but at the time I was testing it we ran into some other
bugs that have since been fixed.

It certainly is the sort of thing that we should be doing on a regular
basis (every 3 months?). I haven't done it since back in April. I have
started putting together some scripts to make it a little bit easier to
spin them up. (It currently takes about 20 minutes to spin up 15 real
machines and add 1000 units onto those machines, mostly because we don't
allow you to specify "-n" together with "--to", so each unit has to be
added with its own command.)
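
As a rough illustration of the kind of helper script meant here, the sketch
below builds one "juju add-unit ... --to N" command per unit and spreads
them round-robin over the machines. It is only a sketch under assumptions:
the service name "ubuntu", the machine ids 1 to 15, and the function name
add_unit_commands are hypothetical, and it prints the commands rather than
running them.

```python
from itertools import cycle, islice


def add_unit_commands(service, machine_ids, total_units):
    """Build one 'juju add-unit' command per unit.

    Units are assigned to machines round-robin, because "-n" cannot be
    combined with "--to" and each placement needs its own command.
    """
    targets = islice(cycle(machine_ids), total_units)
    return [["juju", "add-unit", service, "--to", str(m)] for m in targets]


if __name__ == "__main__":
    # Assumption: machines 1..15 already exist in the environment.
    # "ubuntu" is a placeholder service name.
    for cmd in add_unit_commands("ubuntu", range(1, 16), 1000):
        print(" ".join(cmd))
```

The printed lines can be reviewed first and then piped to a shell. Running
the commands one after another is still slow, which is why the 20 minute
figure is dominated by the number of separate add-unit calls.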

The latest bottleneck that I'm aware of is that we changed our logging
infrastructure so every agent tries to connect directly to rsyslogd to send
its log messages, and rsyslog doesn't really like 10,000 TLS connections.
(By default rsyslog restricts itself to 200 concurrent connections. Even
after bumping that up to 1000, I was maxing out 1 CPU for rsyslogd and was
still getting some connections rejected.)

John
=:->

On Thu, Oct 2, 2014 at 6:13 PM, Mark Ramm-Christensen (Canonical.com) <
mark.ramm-christensen at canonical.com> wrote:

> However, we did do similar testing on Go juju a while back.
>
> That said, we need to publish scale testing numbers again. Something to
> talk about next week!
>
> On Thu, Oct 2, 2014 at 10:03 AM, Kapil Thangavelu <
> kapil.thangavelu at canonical.com> wrote:
>
>> Unfortunately that's not very representative of the current
>> implementation, as it was based on pyjuju, while the current
>> implementation is in Go and uses MongoDB instead of ZooKeeper.
>>
>> -kapil
>>
>> On Thu, Oct 2, 2014 at 9:40 AM, Charles Butler <
>> charles.butler at canonical.com> wrote:
>>
>>> There's this article which was published a while ago:
>>>
>>>
>>> https://maas.ubuntu.com/2012/06/04/scaling-a-2000-node-hadoop-cluster-on-ec2ubuntu-with-juju/
>>>
>>> Hope this helps,
>>>
>>> Charles
>>>
>>> On Thu, Oct 2, 2014 at 9:02 AM, Mike Sam <mikesam460 at gmail.com> wrote:
>>>
>>>> I was wondering what is the largest vm count that has been provisioned
>>>> and deployed with juju in testing so far? In other words, what is the
>>>> demonstrated scale that juju has proven to handle well so far?
>>>>
>>>> Thanks,
>>>> Mike
>>>>
>>>> --
>>>> Juju-dev mailing list
>>>> Juju-dev at lists.ubuntu.com
>>>> Modify settings or unsubscribe at:
>>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>>>
>>>>