Scale test in Cape Town

John Meinel john at arbash-meinel.com
Wed Feb 26 06:56:11 UTC 2014


On Tue, Feb 25, 2014 at 4:52 PM, Kapil Thangavelu
<kapil.thangavelu at canonical.com> wrote:
>
> On Tue, Feb 25, 2014 at 12:41 AM, John Meinel <john at arbash-meinel.com> wrote:
>>
>> Hey guys, I did some scale testing back in Cape Town, and I realized I didn't send any results to be shared. Here's the brain dump of what I remember:
>
>
> Hi John,
>
> Thanks for sending this out.
>
>>
>> 1) juju deploy -n 15 fails to spawn a bunch of the machines with "Rate Limit Exceeded". I filed a bug for it. The instance poller showed up the most in the log files. A good improvement there would be to change our polling to query all machines in a batch (probably with a fast batch and a slow batch?). A bigger fix: if provisioning fails because of rate limiting, we should just try again later.
>
> Ideally we could move to mass-provisioning machines; every provider (minus azure) allows us to create n machines with a single api call. We currently specify userdata per individual machine, but we could move to generic userdata that calls back to a state server for machine-specific registration. Our security group model of one group per machine is also broken at scale, due to limits on the number of groups and on the number of rules within a group, and it exacerbates rate limiting because of the setup done before each instance creation. Per the ec2 docs, security group calls happen to have lower rate limits on api calls. Much of that pre-instance setup and per-machine group usage stems historically from aws non-vpc constraints, where groups could not be dynamically assigned to instances. That is no longer the case with ec2 if we move to vpc, and it was never the case with openstack. Moving to a group per service that's dynamically assigned would be ideal.

Spawning in batch would be interesting, but I think we can cut down
our API request rate without it in the immediate term.
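
To make that concrete, the shape I'm imagining for the poller is
something like the sketch below (the interface and names are invented
for illustration): one bulk Instances() call per tick, with machines
still provisioning on a fast tick and settled ones on a slow tick.

    package poller

    import "time"

    // Status is whatever the provider reports per instance
    // (made-up shape for the sketch).
    type Status struct {
        ID   string
        Addr string
    }

    // Provider stands in for the cloud API; EC2 and OpenStack can
    // both describe many instances in a single call.
    type Provider interface {
        Instances(ids []string) ([]Status, error)
    }

    // pollLoop resolves all machines with one request per tick
    // instead of one request per machine. The provisioner would run
    // two of these: a fast loop for machines still coming up, and a
    // slow one for the rest.
    func pollLoop(p Provider, ids []string, every time.Duration,
        out chan<- []Status) {
        for range time.Tick(every) {
            sts, err := p.Instances(ids)
            if err != nil {
                continue // e.g. rate limited: skip this tick, try the next
            }
            out <- sts
        }
    }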

I agree that all we really *need* is to get a machine agent up and
running on each machine talking to the API server to find out who it
is and what it should be doing. I'm not sure of the actual effort to
get there.

For security groups, a group per service would work well if we could
pull it off. AIUI we can deploy machines where we don't yet know what
service will be running (juju add-machine), you can put multiple
services *on* a machine, and at least non-VPC EC2 doesn't let you
change the security groups of a running machine.
However, if we can tease some of that apart (say, make the *default*
behavior be deploying a known service on a known machine, or gain the
ability to add security groups to a running machine, even if that
means "rebooting" it), we'd be in a really good default position.
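
Backing up Kapil's point that VPC lifts that last restriction: with
EC2-VPC you can replace the whole security-group set of a *running*
instance in one ModifyInstanceAttribute call. A sketch using the
aws-sdk-go bindings (purely illustrative, not the library we use, and
the IDs are made up):

    package main

    import (
        "fmt"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/ec2"
    )

    func main() {
        svc := ec2.New(session.Must(session.NewSession()))
        // Replace the security groups of a running VPC instance;
        // on non-VPC EC2 this is exactly what we *can't* do.
        _, err := svc.ModifyInstanceAttribute(
            &ec2.ModifyInstanceAttributeInput{
                InstanceId: aws.String("i-0123456789abcdef0"),
                Groups: aws.StringSlice(
                    []string{"sg-0123456789abcdef0"}),
            })
        if err != nil {
            fmt.Println("modify failed:", err)
        }
    }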

>
> Agreed re transient error auto retry (basically this bug https://bugs.launchpad.net/juju-core/+bug/1227450)
>
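
Right. Something as simple as backing off and retrying the
provisioning call when the provider tells us to slow down would cover
that bug. A rough sketch (the error classification here is faked):

    package provision

    import (
        "strings"
        "time"
    )

    // isRateLimited stands in for real provider error classification.
    func isRateLimited(err error) bool {
        return err != nil &&
            strings.Contains(err.Error(), "Rate Limit Exceeded")
    }

    // withRetry backs off and retries an operation that failed only
    // because we were going too fast, instead of marking the machine
    // as failed (lp:1227450).
    func withRetry(op func() error) error {
        delay := time.Second
        for i := 0; i < 5; i++ {
            err := op()
            if !isRateLimited(err) {
                return err // success, or a genuinely fatal error
            }
            time.Sleep(delay)
            delay *= 2
        }
        return op() // final attempt; give up after this
    }
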
>>
>> 2) After 1-2k units, restarting agent machine-0 could cause it to deadlock. From what I could tell we were filling the request socket to mongo and it just wasn't returning responses, even to the point of a 5-minute timeout trying to write to the socket. I added a "max 10 concurrent Login requests" semaphore, and I could see it activating and telling agents to come back later; I never deadlocked it again. Needs more testing, but I think it's worth adding. (10/100/whatever, the key seems to be avoiding having 10,000 units all trigger the same request at the same time.)
>
> Even on small environments, I've been hearing reports this last week (against 1.17.3) of issues with mongodb and state server mongodb connections that die, triggering mass reconnects and transient failures for agents. Roger thinks this might be an issue in mgo.
>
> Feb 24 13:48:41 juju-ci-oxide-machine-0 mongod.37017[5150]: Mon Feb 24 13:48:41.067 [conn2] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:49239]
>
> https://bugs.launchpad.net/juju-core/+bug/1284183
>
> Getting some finer-grained instrumentation on mongo would be nice. It might be useful to set up mms.mongodb.com metrics reporting on mongodb during a load test (albeit with the caveat that it could cause some perturbation).
>

That's an interesting thought. The only immediate caveat is that MMS
is the "cloud only" version, which makes me think it might need a
custom build of MongoDB? I haven't ever used it, but I'll see what I
can do the next time I'm doing a scale test.
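
Also, going back to the Login throttle from (2): what I hacked in was
roughly the following shape (from memory, not the exact patch). The
point is to shed load instead of queueing, and let agents retry later:

    package apiserver

    import "errors"

    // errTryAgain tells an agent to back off and retry; clients would
    // treat it as transient (hypothetical error, not the real one).
    var errTryAgain = errors.New("busy: try again later")

    // loginSem holds one token per allowed in-flight Login.
    var loginSem = make(chan struct{}, 10)

    // throttledLogin answers immediately when all 10 slots are busy,
    // so mongo never sees a thundering herd of 10,000 simultaneous
    // logins after a machine-0 restart.
    func throttledLogin(authenticate func() error) error {
        select {
        case loginSem <- struct{}{}:
            defer func() { <-loginSem }()
            return authenticate()
        default:
            return errTryAgain
        }
    }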


>>
>> 3) Things get weird if you try to deploy more unit agents than you have RAM for. I could only get about 800-900 agents in 7.5GB of RAM (and they don't have swap). I doubt this is ever an issue in practice, but it was odd to figure out.
>
> What sort of memory usage were you seeing on the state server per connection? I.e. any idea what causes this constraint? Is all the ram being consumed?

The memory consumption per connection actually wasn't too bad on
machine-0. Sorry I wasn't clear here: this is the memory consumption
on each of machines 1-15 that I had running, where each was running
800+ unit agents. Not something you'd do in practice.

Yeah, 100% of the 7.5GB of RAM was in use, so when the machine agent
tried to spin up another unit agent, one of the other agents would be
killed, and then upstart would restart it, and so on.


>
>>
>> 4) we still need to land the GOMAXPROCS code for the API servers.
>>
>> 5) Even with that, I don't think I saw more than 300% load on the API server and never more than 80% for mongo. I think mongo is often a bottleneck, but I don't know how to make it use more than 1 CPU. Maybe because we use only one connection?
>
> At the mongodb level, writes only use a single thread; reads can use multiple threads, but yeah, that's a consequence of concurrent connections.
>
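
On 4), the pending change is essentially calling
runtime.GOMAXPROCS(runtime.NumCPU()) at API server startup. On 5), if
the single shared connection is the culprit, copying the mgo session
per request would at least let reads fan out over multiple sockets. A
minimal sketch, assuming we hold one root *mgo.Session today:

    package db

    import (
        "labix.org/v2/mgo"
        "labix.org/v2/mgo/bson"
    )

    // unitCount runs its read on a copy of the root session. Each
    // Copy gets its own socket from mgo's pool, so concurrent reads
    // can spread across mongo's threads instead of serializing on a
    // single connection.
    func unitCount(root *mgo.Session) (int, error) {
        s := root.Copy()
        defer s.Close()
        return s.DB("juju").C("units").Find(bson.M{}).Count()
    }
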
>>
>> 6) We still have the CharmURL bug, but we knew that.
>
> A reference to all units watching the service doc for charm_url, while the doc increments each time the unit count changes. We had discussed partial watches briefly at Cape Town; not sure if that ever got to a viable strategy, outside of discussion of potentially moving the watch impl out of the db to the api layer and meshing across state servers. A simpler, reasonable alternative might be moving charm_url to the service settings docs.

So there are 2 bits here:
1) Move the reference count out of the Service doc (and I think there
is 1 other field that changes on each unit added). We couldn't move
the reference counts when we had direct DB access, but we should be
able to now (if not in 1.18, then in 1.20).
2) Change the Watcher implementation to support an in-memory model
that notices changes to individual fields and synchronizes those
changes outside of mongodb. We'll want that for other reasons, but it
is a much bigger project.
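
For 2), the shape I'm imagining is an in-memory hub keyed by field, so
that a unit-count bump on the service doc never wakes the charm URL
watchers at all. Purely a sketch; none of these names exist yet:

    package watcher

    import "sync"

    // hub fans out per-field change notifications in memory, instead
    // of every agent watching the whole service document in mongo.
    type hub struct {
        mu   sync.Mutex
        subs map[string][]chan string // field name -> subscribers
    }

    func newHub() *hub {
        return &hub{subs: make(map[string][]chan string)}
    }

    // Watch returns a channel that receives new values of one field,
    // e.g. "charmurl".
    func (h *hub) Watch(field string) <-chan string {
        ch := make(chan string, 1)
        h.mu.Lock()
        h.subs[field] = append(h.subs[field], ch)
        h.mu.Unlock()
        return ch
    }

    // Publish would be driven by whatever tails the txn log; only
    // watchers of the touched field get notified.
    func (h *hub) Publish(field, value string) {
        h.mu.Lock()
        defer h.mu.Unlock()
        for _, ch := range h.subs[field] {
            select {
            case ch <- value:
            default: // drop if the subscriber is slow (sketch only)
            }
        }
    }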


>>
>> 7) I think getting some charms that have basic relations and having them trigger at scale could be really useful.
>
> Definitely.
>
> cheers,
>
> Kapil
>

I just wish I had more cycles to play with this. :)

John
=:->


