Notes from Scale testing

John Arbash Meinel john at arbash-meinel.com
Wed Oct 30 13:23:43 UTC 2013



I'm trying to put together a quick summary of what I've found out so
far with testing juju in an environment with thousands (5000+) agents.


1) I never ran into connection failures due to socket exhaustion. The
default upstart script we write for jujud has
"limit nofile 20000 20000", and we do seem to properly hold to 1 agent
== 1 connection (vs. the old model of 1 agent >= 2 mongodb connections).
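
Incidentally, the same limit could also be raised from inside the
process instead of relying on the upstart stanza. A minimal Go sketch
(not something jujud does today; raising the hard limit needs root,
which jujud has under upstart):

  package main

  import (
      "log"
      "syscall"
  )

  func main() {
      // Mirror "limit nofile 20000 20000" from the upstart job.
      lim := syscall.Rlimit{Cur: 20000, Max: 20000}
      if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
          log.Printf("cannot raise RLIMIT_NOFILE: %v", err)
      }
  }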


2) Agents seem to consume about 17MB resident according to 'top'. That
should mean we can run ~450 agents on an m1.large. Though in my
testing I was running ~450 and still had free memory, so I'm guessing
there might be some copy-on-write pages (17MB is very close to the
size of the jujud binary).
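
For reference, that ~450 is just the instance memory divided by the
per-agent RSS (assuming m1.large's 7.5 GiB of RAM):

  7.5 GiB ≈ 7680 MB
  7680 MB / 17 MB per agent ≈ 450 agents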


3) On the API server, with 5k active connections resident memory was
2.2G for jujud (about 400kB/conn), and only about 55MB for mongodb. DB
size on disk was about 650MB.

The log file could grow pretty big (up to 2.5GB once everything was up
and running, though it compresses to 200MB), but I'll come back to
that later.

Once all the agents are up and running, they actually are very quiet
(almost 0 log statements).


4) If I bring up the units one by one (for i in `seq 500`; do for j in
`seq 10`; do juju add-unit ubuntu --to $j & done; time wait; done), it
ends up triggering O(N^2) behavior in the system. Each unit agent
seems to have a watcher for the other units of the same service, so
when you add 1 unit, it wakes up all existing units to let them know
about it. In theory this is on a 5s rate limit (only 1 wakeup per 5
seconds). In practice it was taking >3s per add-unit call [even when
requesting them in parallel]. I think this was because of the load on
the API server from all the other units waking up and asking for
details at the same time.
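
To put a rough number on that O(N^2): if each added unit wakes every
unit that already exists, bringing up N units one at a time costs on
the order of

  0 + 1 + 2 + ... + (N-1) = N(N-1)/2 wakeups

which for N = 5000 is ~12.5 million wakeups, each of which turns into
more Service.Life/CharmURL traffic at the API server.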

From what I can tell, all units take out a watch on their service so
that they can monitor its Life and CharmURL. However, adding a unit to
a service triggers a change event on that service, even though Life
and CharmURL haven't changed. If we split watching the
units-on-a-service out from watching the lifetime and charm URL of the
service, we could avoid the thundering-herd N^2 problem while starting
up a bunch of units (a sketch follows below). UpgradeCharm is still
going to cause a thundering herd, though.
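
A rough sketch of the split I have in mind (hypothetical interfaces,
not the current state/api code):

  package sketch

  // ServiceConfigWatcher is what a unit agent needs for its normal
  // lifecycle: it fires only when the service's Life or CharmURL
  // actually change, so adding a unit would not wake the other units.
  type ServiceConfigWatcher interface {
      Changes() <-chan struct{}
  }

  // ServiceUnitsWatcher reports unit membership changes; only callers
  // that genuinely care about other units would take this watch out.
  type ServiceUnitsWatcher interface {
      Changes() <-chan []string
  }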

Response in log from last "AddServiceUnits" call:
http://paste.ubuntu.com/6329753/

Essentially it triggers 700 calls to Service.Life and CharmURL (I
think at this point one of the 10 machines wasn't responding, so there
were <1k units running).


5) On top of the load, we weren't caching the IP address of the API
machine, which caused us to read the provider-state file from object
storage and then ask EC2 for the IP address of that machine.
Log of 1 unit agent's connection: http://paste.ubuntu.com/6329661/

Eventually, while starting up, the unit agent makes a request for
APIAddresses (I believe it puts that information into the context for
the hooks that it runs). Occasionally that request gets rate-limited
by EC2. When it fails, it triggers us to stop:
  "WatchServiceRelations"
  "WatchConfigSettings"
  "Watch(unit-ubuntu-4073)" # itself
  "Watch(service-ubuntu)"   # the service it is running

It then seems to restart the unit agent, which goes through the steps
of making all the same requests again (get the Life of my unit, get
the Life of my service, get the UUID of this environment, etc.; there
are 41 requests before it gets back to APIAddresses).
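
An obvious mitigation, in the spirit of the rudimentary API caching in
my branch, is to remember the last known API address locally instead
of going back to provider-state and EC2 every time. A minimal sketch
(the cache path and lookup callback are made up for the example):

  package sketch

  import (
      "io/ioutil"
      "strings"
  )

  // cachedAPIAddr returns the last API address written to a local cache
  // file and only falls back to the slow provider-state/EC2 lookup when
  // the cache is empty, writing the result back for next time.
  func cachedAPIAddr(path string, lookup func() (string, error)) (string, error) {
      if data, err := ioutil.ReadFile(path); err == nil {
          if addr := strings.TrimSpace(string(data)); addr != "" {
              return addr, nil
          }
      }
      addr, err := lookup()
      if err != nil {
          return "", err
      }
      _ = ioutil.WriteFile(path, []byte(addr+"\n"), 0600)
      return addr, nil
  }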


6) If you restart jujud (say after an upgrade), all unit agents redo
those 41 startup requests. This seems to be bottlenecked by the jujud
process (up to 600% CPU) and a little bit by Mongo (almost 100% CPU).

It takes a while, but with enough horsepower and GOMAXPROCS set it
does seem to recover (IIRC it took about 20 minutes).
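
As a rough sanity check on that 20 minutes (assuming all ~5000 unit
agents redo the same 41 startup calls):

  5000 agents x 41 requests ≈ 205,000 API calls
  205,000 calls / ~1200 s ≈ 170 calls/s through the single API server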


7) If I run "juju deploy nrpe-external-master; juju add-relation ubuntu
nrpe-external-master", very shortly thereafter "juju status" reports
all agents (machine and unit agents) as "agent-state: down", even the
machine-0 agent. Given that I was already close to capacity even on
the unit machines, this could be any number of problems; I would like
to try another test where we have a bit more headroom.


8) We do end up CPU throttled fairly often (especially if we don't set
GOMAXPROCS). It is probably worth spending some time profiling what
jujud is doing. I have the feeling all of those calls to CharmURL are
triggering DB reads from Mongo, which is a bit inefficient.

I would be fine doing max(1, NumCPUs()-1) or something similar, and
I'd rather do it inside jujud than in the cloud-init script, because
computing NumCPUs is easier there. But we should have *a* way to scale
up the central node that isn't just scaling out to more API servers.
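
Concretely, the max(1, NumCPUs()-1) idea is only a few lines at jujud
startup (a sketch of the proposal, not what trunk does today):

  package main

  import "runtime"

  func main() {
      // max(1, NumCPU-1): use most of the cores but leave one free,
      // e.g. for mongod on the state server.
      n := runtime.NumCPU() - 1
      if n < 1 {
          n = 1
      }
      runtime.GOMAXPROCS(n)
      // ... the rest of jujud startup would follow ...
  }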

9) We also do seem to hit MongoDB limits. mongod ended up at 100% CPU,
and it never went above 100%. I didn't see any way to configure mongo
to use more CPU; I wonder if it is limited to 1 CPU per connection, or
if it is just always 1 CPU.

I certainly think we need a way to scale Mongo as well. If it is just
1 CPU per connection then scaling horizontally with API servers should
get us around that limit.

10) Allowing "juju add-unit -n 100 --to X" did make things a lot
easier to bring up, though the request itself still takes a while to
finish. It felt like the API call triggered work in the background,
which in turn made the call itself take longer to complete (as in,
minutes once we had >1000 units).

I generally went:
  juju deploy ubuntu -n 10
  # grow to 100
  for i in `seq 10`; do juju add-unit ubuntu -n 9 --to $i & done; time wait
  # grow to 1000
  for i in `seq 10`; do juju add-unit ubuntu -n 90 --to ...
  # grow to 5000
  for i in `seq 10`; do juju add-unit ubuntu -n 400 --to ...

The branch with my patches is available at:
  lp:~jameinel/juju-core/scale-testing

Not everything in there is worth landing in trunk (rudimentary API
caching, etc).

That's all I can think of for now, though I think there is more to be
explored.

John
=:->


