Scale Testing: Now with profiling!
John Arbash Meinel
john at arbash-meinel.com
Fri Nov 1 12:07:41 UTC 2013
On 2013-10-31 11:11, John Arbash Meinel wrote:
> So I managed to instrument a jujud with both CPU and Mem profiling
> dumps. I then brought up 1000 units and did some poking around.
>
> The results were actually pretty enlightening.
>
>
> 1) Guess what the #1 CPU time was. I know I was surprised:
>
> Total: 25469 samples
>    14380  56.5%  56.5%    14404  56.6% crypto/sha512.block
>     1261   5.0%  61.4%     1261   5.0% crypto/hmac.(*hmac).tmpPad
>     1219   4.8%  66.2%    15737  61.8% crypto/sha512.(*digest).Write
>     1208   4.7%  70.9%     9548  37.5% crypto/sha512.(*digest).Sum
>      439   1.7%  72.7%    19046  74.8% launchpad.net/juju-core/thirdparty/pbkdf2
>
>
One observation I didn't report: when testing this out, machine-0's
agent would often work for a while, but eventually it would hit 100%
CPU and stop getting any other work done. I didn't notice it in top,
but it was actually spending all of that time in sys.
So I did some googling and found this:
http://grokbase.com/t/gg/golang-dev/1388yzq7yb/code-review-12183044-syscall-disable-cpu-profiling-around-fork-issue-12183044
Given that some of the hangs I saw happened while we were trying to
run "lxc-ls", it is possible that enabling CPU profiling causes fork
to hang.
https://groups.google.com/forum/#!topic/golang-bugs/9Gyeef14Zaw
http://code.google.com/p/go/issues/detail?id=5517
https://code.google.com/p/gperftools/issues/detail?id=278
So some of the hanging that I saw may not actually be a problem in
practice, but not being able to profile the process seems pretty
unfortunate.
It looks like the fix landed on August 13, which is the day after
Golang 1.1.2 was released:
http://code.google.com/p/go/source/detail?r=9eb1dd061b1f
At least that alleviates my fear that when jujud restarts we have a
high probability of hanging permanently. And I got enough profiling to
see that pbkdf2 appears to be the primary cause of slow startup. At
70ms per login, 10,000 agents need 700s (~12min) of CPU time, or about
3min of wall-clock time with 4 CPUs.
I'm still skeptical that we need pbkdf2 for Agent logins, though I do
like it for user logins. (We are generating 18 character passwords
because originally they were used by Mongo which "only" md5sum'd them.
We could use sha512 and 64-byte/128-hex tokens if we cared.)
John
=:->
More information about the Juju-dev
mailing list