Effect of lease tweaks
John A Meinel
john.meinel at canonical.com
Wed Nov 1 14:43:19 UTC 2017
I wanted to know whether Andrew's changes in 2.3 are going to have a
noticeable effect on Leadership at scale. So I set up a test with HA
controllers and 10 machines, each with 3 containers, and then distributed
~500 applications, each with 3 units, across everything.
I started at commit 2e50e5cf4c3, which is just before Andrew's Lease patch
landed.
juju bootstrap aws/eu-west-2 --bootstrap-constraints instance-type=m4.xlarge \
    --config vpc-id=XXXX
juju enable-ha -n3
# Wait for things to stabilize
juju deploy -B cs:~jameinel/ubuntu-lite -n10 \
    --constraints instance-type=m4.xlarge
# wait
# set up the containers
for i in `seq 0 9`; do
    juju deploy -n3 -B cs:~jameinel/ubuntu-leader ul${i} --to lxd:${i},lxd:${i},lxd:${i}
done
# scale up. I actually did this in batches of a few at a time, but slowly
# grew all the way up.
for j in `seq 1 49`; do
    echo $j
    for i in `seq 0 9`; do
        juju deploy -B -n3 cs:~jameinel/ubuntu-leader ul${i}${j} --to ${i}/lxd/0,${i}/lxd/1,${i}/lxd/2 &
    done
    time wait
done
I let it go for a while, until "juju status" was happy that everything was
up and running. Note that this was 1500 units and 500 applications in a
single model.
"time juju status" took around 4-10s.
I was running 'mongotop' and watching 'top' the whole time this was going on.
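(For anyone who wants to watch the same thing: the controller's mongod
normally listens on port 37017 with TLS and auth enabled, and the machine
agent's mongo credentials live in its agent.conf, so something along these
lines should work -- this is just a sketch, adjust the user/password
handling for your controller.)
# on the controller machine
PW=$(sudo awk '/^statepassword:/ {print $2}' /var/lib/juju/agents/machine-0/agent.conf)
mongotop --host 127.0.0.1 --port 37017 \
    --ssl --sslAllowInvalidCertificates \
    --authenticationDatabase admin -u machine-0 -p "$PW" \
    5    # one sample every 5 seconds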
I then upgraded to the latest juju dev (c49dd0d88a).
Now, the controller immediately started thrashing because of bad lease
documents in the database, and eventually got to the point where it ran out
of open file descriptors. Theoretically, upgrading 2.2 => 2.3 won't have the
same problem, because the actual upgrade step should run.
However, once I did "db.leases.remove({})" it recovered.
I did end up having to restart mongo and jujud to get past the open file
handle problem, but after that everything came back.
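(Roughly what the cleanup looked like, using the same mongo credentials as
above; the service names are the standard ones on a systemd-based 2.x
controller, so treat this as a sketch.)
# wipe the bad lease documents
mongo --ssl --sslAllowInvalidCertificates --authenticationDatabase admin \
    -u machine-0 -p "$PW" --eval 'db.leases.remove({})' 127.0.0.1:37017/juju
# then bounce mongod and the controller agent to clear the open file handles
sudo systemctl restart juju-db
sudo systemctl restart jujud-machine-0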
At this point, I waited again for everything to look happy, and watched
mongotop and top again.
These aren't super careful results; to do it properly I would want to run
things for an hour or so each and check the load over that whole time.
Really, I should have set up prometheus monitoring, or at least logged the
raw samples over a full run -- something like the sketch below.
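(Same connection details as the mongotop invocation above; the file name
and the one-hour window are just illustrative.)
# log a sample every 5s for an hour, then post-process the file
date > mongotop.before.log
timeout 3600 mongotop --host 127.0.0.1 --port 37017 \
    --ssl --sslAllowInvalidCertificates \
    --authenticationDatabase admin -u machine-0 -p "$PW" \
    5 >> mongotop.before.log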
But as a quick check, these are the top values from mongotop before the
upgrade:
ns                          total   read  write
local.oplog.rs              181ms  181ms    0ms
juju.txns                   120ms   10ms  110ms
juju.leases                  80ms   34ms   46ms
juju.txns.log                24ms    4ms   19ms

ns                          total   read  write
local.oplog.rs              208ms  208ms    0ms
juju.txns                   140ms   12ms  128ms
juju.leases                  98ms   42ms   56ms
juju.charms                  43ms   43ms    0ms

ns                          total   read  write
local.oplog.rs              220ms  220ms    0ms
juju.txns                   161ms   14ms  146ms
juju.leases                 115ms   52ms   63ms
presence.presence.beings     69ms   68ms    0ms

ns                          total   read  write
local.oplog.rs              213ms  213ms    0ms
juju.txns                   164ms   15ms  149ms
juju.leases                  82ms   35ms   47ms
presence.presence.beings     79ms   78ms    0ms

ns                          total   read  write
local.oplog.rs              221ms  221ms    0ms
juju.txns                   168ms   13ms  154ms
juju.leases                  95ms   40ms   55ms
juju.statuses                33ms   16ms   17ms

totals:
local.oplog.rs             1043ms
juju.txns                   868ms
juju.leases                 470ms
and after the upgrade:
ns                          total   read  write
local.oplog.rs               95ms   95ms    0ms
juju.txns                    68ms    6ms   61ms
juju.leases                  33ms   13ms   19ms
juju.txns.log                13ms    3ms   10ms

ns                          total   read  write
local.oplog.rs              200ms  200ms    0ms
juju.txns                   160ms   10ms  150ms
juju.leases                  78ms   35ms   42ms
juju.txns.log                29ms    4ms   24ms

ns                          total   read  write
local.oplog.rs              151ms  151ms    0ms
juju.txns                   103ms    6ms   97ms
juju.leases                  45ms   20ms   25ms
juju.txns.log                21ms    6ms   15ms

ns                          total   read  write
local.oplog.rs              138ms  138ms    0ms
juju.txns                    98ms    6ms   91ms
juju.leases                  30ms   13ms   16ms
juju.txns.log                18ms    3ms   14ms

ns                          total   read  write
local.oplog.rs              218ms  218ms    0ms
juju.txns                   196ms   14ms  182ms
juju.leases                  81ms   36ms   44ms
juju.txns.log                34ms    5ms   29ms
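(The "totals" in the before block are just per-namespace sums of the
samples; given a captured log like mongotop.before.log above, something
like this awk snippet produces them, assuming the usual ns/total/read/write
column layout.)
awk '$1 ~ /^(juju|local|presence)\./ { sum[$1] += $2 }
     END { for (ns in sum) printf "%-26s %6dms\n", ns, sum[ns] }' mongotop.before.log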