Notes from Scale testing
John Arbash Meinel
john at arbash-meinel.com
Wed Oct 30 15:08:22 UTC 2013
On 2013-10-30 18:11, Nate Finch wrote:
>
> On Wed, Oct 30, 2013 at 9:23 AM, John Arbash Meinel
> <john at arbash-meinel.com <mailto:john at arbash-meinel.com>> wrote:
>
> 2) Agents seem to consume about 17MB resident according to 'top'.
> That should mean we can run ~450 agents on an m1.large. Though in
> my testing I was running ~450 and still had free memory, so I'm
> guessing there might be some copy-on-write pages (17MB is very
> close to the size of the jujud binary).
>
>
> 17MB seems just fine for an agent. I don't think it's worth
> worrying much about that size, since it's fairly static and you
> generally aren't going to run 450 copies on the same machine :)
Yeah, I'm not worried about unit-agent memory size. Though it does
give some rough estimates of what we can get away with when doing this
sort of testing. (We need to have enough memory/bandwidth/cpu to be
able to not cause artificial bottlenecks when doing testing.)
AFAICT disk space and memory end up being the primary bottlenecks in
this testing. For disk space I hopefully have an answer (don't have
all units download the same object simultaneously and unpack it
concurrently).
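That download dedup idea could be a singleflight-style guard: the first unit to ask for an object fetches it, and concurrent requests for the same key block and share the result. A minimal sketch (hypothetical types, not actual Juju code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// group deduplicates concurrent fetches of the same key: the first
// caller performs the download, later callers block and share the
// result (same idea as golang.org/x/sync/singleflight).
type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

type call struct {
	done chan struct{}
	val  string
}

func (g *group) Do(key string, fetch func() string) string {
	g.mu.Lock()
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		<-c.done // another caller is already fetching; wait for it
		return c.val
	}
	c := &call{done: make(chan struct{})}
	g.calls[key] = c
	g.mu.Unlock()

	c.val = fetch() // only one goroutine runs the expensive fetch
	close(c.done)
	return c.val
}

func main() {
	g := &group{calls: make(map[string]*call)}
	var fetches int32
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("tools-1.16.tgz", func() string {
				atomic.AddInt32(&fetches, 1)
				return "payload"
			})
		}()
	}
	wg.Wait()
	fmt.Println("fetches:", fetches) // prints "fetches: 1"
}
```

With 450 units per machine, that turns 450 simultaneous downloads/unpacks of the same tools tarball into one.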
>
>
> 3) On the API server, with 5k active connections resident memory
> was 2.2G for jujud (about 400kB/conn), and only about 55MB for
> mongodb. DB size on disk was about 650MB.
>
>
> 400kB per connection seems atrocious. Goroutines take about 4k on
> their own. I have a feeling we're keeping copies of a lot of stuff
> in memory per connection that doesn't really need to be copied for
> each connection. It would be good to get some profiling on that,
> to see if we can get it down to something like 1/10th that size,
> which would be more along the lines of what I'd expect per
> connection.
Each unit agent triggers ~5 Watch objects in the API server, and the
API server doesn't do any pooling. We could probably shave a lot of
this off with something like one Watcher per object being watched.
The only caveat is that the server-side Watch object is what tracks
each client's pointer into the events that have happened but haven't
yet been reported.
Each Watch object is going to be at least 1 goroutine. So you're at
20k right there, without any actual data bookkeeping.
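A pooled watcher would basically be a fan-out hub: one goroutine watches the object and broadcasts events to N subscriber channels, instead of N watcher goroutines. A sketch of the shape (hypothetical, not Juju's actual watcher code; per-client pointers would still be needed on top of this):

```go
package main

import (
	"fmt"
	"sync"
)

// hub fans a single watch stream out to many subscribers, so N API
// connections share one watcher goroutine instead of one each.
type hub struct {
	mu   sync.Mutex
	subs []chan string
}

func (h *hub) Subscribe() <-chan string {
	h.mu.Lock()
	defer h.mu.Unlock()
	ch := make(chan string, 16) // buffered so one slow reader can't stall the hub
	h.subs = append(h.subs, ch)
	return ch
}

func (h *hub) Publish(event string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.subs {
		ch <- event
	}
}

func main() {
	h := &hub{}
	a, b := h.Subscribe(), h.Subscribe()
	h.Publish("unit-changed")
	fmt.Println(<-a, <-b) // prints "unit-changed unit-changed"
}
```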
Again, profiling is probably the best thing at this point (dump a
memory profile when 5000 units have reached steady state).
>
>
> The log file could grow pretty big (up to 2.5GB once everything was
> up and running though it does compress to 200MB), but I'll come
> back to that later.
>
>
> interesting question - are our log calls asynchronous, or are we
> waiting for them to get written to disk before continuing? Wonder
> if that might cause some slowdowns.
I'm pretty sure they are synchronous. I did see 50% of all cycles
consumed by VM time when testing on an m1.small (50% user, 50% VM). I
don't know whether that is I/O or something else.
John
=:->