Notes from Scale testing
John Arbash Meinel
john at arbash-meinel.com
Thu Oct 31 09:10:40 UTC 2013
On 2013-10-31 12:25, William Reade wrote:
> On Wed, Oct 30, 2013 at 8:15 AM, John Arbash Meinel
> <john at arbash-meinel.com> wrote:
>
>>> Just to be clear for other readers (wasn't clear to me without
>>> checking the src) this isn't the agent resolving the api
>>> server address from provider-state which would mean provider
>>> credentials available to each agent, but each agent
>>> periodically requesting via the api the address of the api
>>> servers. So the cache here is on the api server.
>
> The cache does need to be either in the DB or on the API server.
> The trigger is that running a hook includes the API Addresses in
> the hook context. So every hook triggers a call to API Addresses
> (not sure if hooks fired in sequence cache the state between
> calls).
>
>
> The unit doesn't cache that information, indeed; and it's
> information that *does* exist in the db, now, but watching it
> reliably is not currently possible. If we had a DB cache of state
> servers -- or even just a direct cache of their addresses -- it
> would be pretty simple to fix; and we'll need *that* for the HA
> work, so we should prefer that approach over hacking in something
> unreliable at the API level.
Yeah. Roger has indicated that he does have a patch to take the API
Addresses from the DB rather than the environment, but it failed to
land because of a failing test. It should land soon. I certainly
prefer caching it in the DB until we find the DB being a bottleneck.
(I have seen that sometimes, but not all that often.)
*Right now* the biggest problem I've seen is that occasionally the
machine-0 agent gets 'stuck' consuming 100% CPU, and I haven't gotten a
reliable indication of what it is doing. (SIGQUIT doesn't seem to give
the same traceback each time, and with >7k goroutines it is a bit hard
to track down.)
When this happens, it stops responding to API requests and all the
agents appear as down.
>
> And that triggers the API server to make a request from EC2.
>
>
> We should be a little bit wary of not caching simplestreams
> requests, too. The impact should be much lower than in the previous
> scheme, but I think it could still be a problem for both upgrader
> and provisioner in certain environments.
John
=:->