Scale test in Capetown

Kapil Thangavelu kapil.thangavelu at canonical.com
Sat Mar 1 14:07:17 UTC 2014


On Wed, Feb 26, 2014 at 1:56 AM, John Meinel <john at arbash-meinel.com> wrote:

> On Tue, Feb 25, 2014 at 4:52 PM, Kapil Thangavelu
> <kapil.thangavelu at canonical.com> wrote:
> >
> >
> >
> >
> > On Tue, Feb 25, 2014 at 12:41 AM, John Meinel <john at arbash-meinel.com>
> wrote:
> >>
> >> Hey guys, I did some scale testing back in Capetown, and I realized I
> didn't send any results to be shared. Here's the brain dump of what I
> remember:
> >
> >
> > Hi John,
> >
> > Thanks for sending this out.
> >
> >>
> >> 1) juju deploy -n 15 fails to spawn a bunch of the machines with "Rate
> Limit Exceeded". I filed a bug for it. The instance poller showed up the
> most in the log files. A good improvement there is to change our polling to
> do all machines in batch (probably with a fast batch and a slow batch?). A
> bigger fix is that if provisioning fails because of rate limiting we should
> just try again later.
> >
> > Ideally we could move to mass provisioning machines; every provider
> (minus azure) allows us to create n machines with a single api call. We
> currently generate userdata specific to a single machine, but we can move
> to a generic userdata that calls back to a state server for machine
> specific registration. The security group model we have of per-machine
> group usage is also broken at scale due to limits on the number of groups
> and the number of rules within a group, and it exacerbates rate limits due
> to the setup done before each instance creation. Per the ec2 docs, security
> group calls happen to have lower rate limits than other api calls. Much of
> that pre-instance setup and per-machine group usage stems historically from
> aws non-vpc constraints where groups could not be dynamically assigned to
> instances. That is no longer the case with ec2 if we move to vpc, and was
> never the case with openstack. Moving to a group per service that's
> dynamically assigned would be ideal.
>
> Spawning in batch would be interesting, but I think we can fix some of
> our API request rate without it in the immediate term.
>

Agreed, part of what I'm describing is an ideal end state for provider
interaction that scales and minimizes round trips (bulk launch, bulk
describe, security group reuse). For the immediate term, fixing the instance
updater to use bulk describe on the cloud apis that support it would help;
something like the sketch below.
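
A minimal sketch of the batching idea, assuming a hypothetical provider
interface (the interface name, chunk size, and error handling are
illustrative, not juju's actual instance poller):

package instancepoller

import "fmt"

// instanceAPI is a stand-in for a provider client that supports bulk
// describe (e.g. ec2 DescribeInstances with many instance ids).
type instanceAPI interface {
    // DescribeInstances returns instance-id -> status for all requested
    // ids in a single provider round trip.
    DescribeInstances(ids []string) (map[string]string, error)
}

const describeChunk = 100 // assumed safe batch size; providers differ

// pollInstances fetches the status of every machine in a handful of bulk
// calls instead of one api call per machine per polling tick.
func pollInstances(api instanceAPI, ids []string) (map[string]string, error) {
    statuses := make(map[string]string, len(ids))
    for start := 0; start < len(ids); start += describeChunk {
        end := start + describeChunk
        if end > len(ids) {
            end = len(ids)
        }
        batch, err := api.DescribeInstances(ids[start:end])
        if err != nil {
            // A rate-limit error now affects one batch rather than
            // n independent calls, and can simply be retried later.
            return nil, fmt.Errorf("bulk describe failed: %v", err)
        }
        for id, status := range batch {
            statuses[id] = status
        }
    }
    return statuses, nil
}

The fast/slow split John mentioned could then just be two id lists polled on
different tickers.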


>
> I agree that all we really *need* is to get a machine agent up and
> running on each machine talking to the API server to find out who it
> is and what it should be doing. I'm not sure of the actual effort to
> get there.
>

I've had to do it in a partner integration (bulk external provisioning,
manual provider); I'm happy to share the code if it's of interest. Basically
it's just an http api endpoint on the state server, plus generic user data
that calls back to the state server for the machine-specific provisioning
script (i.e. the user data doesn't even need the machine agent up to do this
bulk usage, just a generic callback). Roughly the shape sketched below.
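
Purely illustrative (the endpoint path, query parameter, and renderScript
helper are made up, not the actual partner-integration code), but it gives
the flavour:

package provisioner

import (
    "fmt"
    "net/http"
)

// renderScript would look up the machine in state and emit the shell
// script that installs and starts its machine agent; placeholder here.
func renderScript(machineID string) (string, error) {
    return fmt.Sprintf("#!/bin/bash\necho provisioning machine %s\n", machineID), nil
}

// provisionHandler lets a freshly booted instance fetch its own
// provisioning script, so every instance can be launched with the same
// generic user data ("fetch /provision?machine-id=... and run it").
func provisionHandler(w http.ResponseWriter, r *http.Request) {
    machineID := r.URL.Query().Get("machine-id")
    if machineID == "" {
        http.Error(w, "missing machine-id", http.StatusBadRequest)
        return
    }
    script, err := renderScript(machineID)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "text/x-shellscript")
    fmt.Fprint(w, script)
}

func registerProvisionEndpoint(mux *http.ServeMux) {
    mux.HandleFunc("/provision", provisionHandler)
}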


>
> For security groups, a group per service would work well if we could
> pull it off. AIUI we can deploy machines where we don't know what
> service will be running (juju add-machine) and you can put multiple
> services *on* a machine, and at least basic EC2 doesn't let you change
> security groups of a running machine.
>

As per my original note, ec2 does let you modify the security groups of a
running instance when it's in a vpc (or the default vpc).


> However, if we can tease some of that apart, if we just make the
> *default* behavior be for us to be deploying a known service on a
> known machine (or the ability to add sec groups while running, even if
> that means "rebooting" machines), that seems like we'd be in a really
> good default position.
>

We can't assume a static mapping of service to machine at creation time; the
right way is to use the cloud apis correctly and dynamically assign the
groups at runtime, along the lines of the sketch below. There's no rebooting
needed. Alternatively, a fallback (and needed for maas anyway) is to do
iptables filtering on the instance.
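
For the vpc case it's a single api call per instance when placement changes.
A sketch using aws-sdk-go-v2 purely for illustration (the group-per-service
naming and the helper are assumptions, not juju's provider code):

package secgroups

import (
    "context"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/ec2"
)

// assignServiceGroups replaces the security groups attached to a running
// vpc instance with the groups for the services now placed on it. No
// reboot is involved.
func assignServiceGroups(ctx context.Context, client *ec2.Client, instanceID string, groupIDs []string) error {
    _, err := client.ModifyInstanceAttribute(ctx, &ec2.ModifyInstanceAttributeInput{
        InstanceId: aws.String(instanceID),
        Groups:     groupIDs, // replaces the current set of groups
    })
    return err
}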


> >
> > Agreed re transient error auto retry (basically this bug
> https://bugs.launchpad.net/juju-core/+bug/1227450)
> >
> >>
> >> 2) After 1-2k units restarting agent machine-0 could cause it to
> deadlock. From what I could tell we were filling the request socket to
> mongo and it just wasn't returning responses. Even to the point of getting
> a 5 minute timeout trying to write to the socket.  I added a "max 10
> concurrent Login requests" semaphore and I could see it activating and
> telling agents to come back later, and never deadlocked it again.  Need
> more testing here. I think it is worth adding. (10/100/whatever,  the key
> seems to be avoiding having 10,000 units all trigger the same request at
> the same time.)
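
(For reference, that kind of throttle is basically a buffered-channel
semaphore; a minimal sketch of the idea, with the limit of 10 and the error
text as placeholders rather than the actual patch:)

package apiserver

import "errors"

var errTryAgainLater = errors.New("too many concurrent logins, try again later")

// loginSlots caps the number of Login requests being processed at once.
var loginSlots = make(chan struct{}, 10)

// throttledLogin runs doLogin if a slot is free, otherwise it tells the
// agent to back off and retry instead of queueing forever.
func throttledLogin(doLogin func() error) error {
    select {
    case loginSlots <- struct{}{}:
        defer func() { <-loginSlots }()
        return doLogin()
    default:
        return errTryAgainLater
    }
}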
> >
> > Even on small environments, I've been hearing reports this last week
> (against 1.17.3) of issues with mongodb where state server connections die,
> triggering mass reconnects and transient failures for agents. Roger thinks
> this might be an issue in mgo.
> >
> > Feb 24 13:48:41 juju-ci-oxide-machine-0 mongod.37017[5150]: Mon Feb 24
> 13:48:41.067 [conn2] SocketException handling request, closing client
> connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:49239]
> >
> > https://bugs.launchpad.net/juju-core/+bug/1284183
> >
> > Getting some finer grained instrumentation on mongo would be nice. It
> might be useful to set up mms.mongodb.com metrics reporting on mongodb
> during a load test (albeit with the caveat that it could cause some
> perturbation).
> >
>
> That's an interesting thought. The only immediate caveat is that mms
> is the "cloud only" version, which makes me think it might need a
> custom version of Mongodb? I haven't ever used it, but I'll see what I
> can do the next time I'm doing a scale test.
>
>
mms works against standard versions of mongo; mongo has a good number of
metrics available, mms just provides easy setup and visualization.
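
If setting up mms isn't convenient, a lot of the same counters are available
straight from mongod via the serverStatus command; a minimal mgo sketch (the
address and the fields printed are just examples):

package main

import (
    "fmt"
    "log"

    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

func main() {
    session, err := mgo.Dial("localhost:37017")
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    var status bson.M
    if err := session.DB("admin").Run(bson.D{{"serverStatus", 1}}, &status); err != nil {
        log.Fatal(err)
    }
    // connections and opcounters are standard serverStatus sections.
    fmt.Println("connections:", status["connections"])
    fmt.Println("opcounters:", status["opcounters"])
}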



>
> >>
> >> 3) Things get weird if you try to deploy more unit agents than you have
> RAM for. I could only get about 800-900 agents in 7.5GB of RAM (and they
> don't have swap). I doubt this is ever an issue in practice but it was odd
> to figure out.
> >
> > What sort of memory usage were you seeing on the state server per
> connection? i.e. any idea what causes this constraint? Is all the ram being
> consumed?
>
> The memory consumption per connection actually wasn't too bad
> on machine-0. I'm sorry I wasn't clear here; this is the memory
> consumption on each of machines 1-15 that I had running, where
> each was running 800+ unit agents. Not something you'd do in practice.
>

I see: 800 * 15 ~ 12k, but the numbers quoted are against 1-2k units, so most
of that capacity isn't being used?



> Yeah, 100% of the 7.5GB of RAM was in use, so when the machine agent
> was trying to spin up another unit agent, one of the other agents
> would be killed, and then upstart would restart it, etc.
>
>
oom fun :-)


>
> >
> >>
> >> 4) we still need to land the GOMAXPROCS code for the API servers.
> >>
> >> 5) Even with that,  I don't think I saw more than 300% load on the API
> server and never more than 80% for mongo. I think mongo is often a
> bottleneck,  but I don't know how to make it use more than 1 CPU. Maybe
> because we use only one connection?
> >
> > at the mongodb level, writes only use a single thread; reads can be
> multi-threaded, but yeah, getting that parallelism is a consequence of
> having concurrent connections.
> >
> >>
> >> 6) We still have the CharmURL bug, but we knew that.
> >
> > A reference to all units watching the service doc for charm_url, while
> the doc changes each time the unit count is changed. We had discussed
> partial watches briefly at Capetown; not sure if that ever got to a
> viable strategy outside of discussion of potentially moving the watch impl
> out of the db to the api layer and meshing across state servers. A simpler,
> reasonable alternative might be moving charm_url to the service settings
> docs.
>
> So there are 2 bits here:
> 1) Move the reference count out of the Service (and I think there is 1
> other field that changes on each unit added). We couldn't move the
> reference counts when we had direct DB access, but we should be able
> to (if not in 1.18, in 1.20).
>

That sounds reasonable, but I'm curious what's wrong with my suggestion of
moving charm_url to the settings doc that units are already watching;
roughly the document split sketched below.
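
Purely to illustrate the split I have in mind; the field names here are
guesses at the shape rather than juju's actual schema:

package state

// Today (approximately): the charm url and the unit count live on the same
// service document, so every add-unit/remove-unit bumps a doc that all
// units are watching for charm changes.
type serviceDoc struct {
    Name      string `bson:"_id"`
    CharmURL  string `bson:"charmurl"`
    UnitCount int    `bson:"unitcount"`
}

// Alternative: leave the churn-prone counters on the service doc and move
// the charm url into the settings document that unit agents already watch,
// so unit count changes no longer wake every unit.
type settingsDoc struct {
    ServiceName string                 `bson:"_id"`
    CharmURL    string                 `bson:"charm-url"`
    Settings    map[string]interface{} `bson:"settings"`
}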


> 2) Changing the Watcher implementation to support an in-memory model
> that notices changes to individual fields and synchronizes those
> changes outside of mongodb. We'll want that for other reasons, but it
> is a much bigger project.
>
>
> >>
> >> 7) I think getting some charms that have basic relations and having
> them trigger at scale could be really useful.
> >
> > Definitely.
> >
> > cheers,
> >
> > Kapil
> >
>
> I just wish I had more cycles to just play with this. :)
>

Me too. I think there might be some space to explore using large instances
on ec2 (240 network addresses with eni and vpc) and creating containers per
unit to actually run some basic workloads.

-k