Effect of lease tweaks
John A Meinel
john.meinel at canonical.com
Wed Nov 1 14:43:19 UTC 2017
I wanted to know whether Andrew's changes in 2.3 are going to have a
noticeable effect on Leadership at scale. So I set up a test with HA
controllers and 10 machines, each with 3 containers, and then distributed
~500 applications, each with 3 units, across everything.
I started at commit 2e50e5cf4c3, which is just before Andrew's Lease patch
landed.
juju bootstrap aws/eu-west-2 --bootstrap-constraints instance-type=m4.xlarge \
    --config vpc-id=XXXX
juju enable-ha -n3
# Wait for things to stabilize
juju deploy -B cs:~jameinel/ubuntu-lite -n10 \
    --constraints instance-type=m4.xlarge
# wait
# set up the containers
for i in `seq 0 9`; do
    juju deploy -n3 -B cs:~jameinel/ubuntu-leader ul${i} --to lxd:${i},lxd:${i},lxd:${i}
done
# scale up. I actually did this in batches of a few at a time, but slowly
# grew all the way up.
for j in `seq 1 49`; do
    echo $j
    for i in `seq 0 9`; do
        juju deploy -B -n3 cs:~jameinel/ubuntu-leader ul${i}${j} --to ${i}/lxd/0,${i}/lxd/1,${i}/lxd/2 &
    done
    time wait
done
I let it go for a while, until "juju status" was happy that everything was
up and running. Note that this was 1500 units and 500 applications in a
single model.
"time juju status" took around 4-10s.
I was running 'mongotop' and watching 'top' the whole time this was going on.
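(For anyone who wants to watch the same thing: the controller's mongod
normally listens on port 37017 with TLS and auth enabled, and the machine
agent's mongo credentials live in its agent.conf, so something along these
lines should work -- this is just a sketch, adjust the user/password
handling for your controller.)
# on the controller machine
PW=$(sudo awk '/^statepassword:/ {print $2}' /var/lib/juju/agents/machine-0/agent.conf)
mongotop --host 127.0.0.1 --port 37017 \
    --ssl --sslAllowInvalidCertificates \
    --authenticationDatabase admin -u machine-0 -p "$PW" \
    5    # one sample every 5 seconds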
I then upgraded to the latest juju dev (c49dd0d88a).
Now, the controller immediately started thrashing because of bad lease
documents in the database, and eventually got to the point where it ran out
of open file descriptors. Theoretically, upgrading 2.2 => 2.3 won't have the
same problem, because the actual upgrade step should run.
However, once I did "db.leases.remove({})" it recovered.
I did end up having to restart mongo and jujud to get past the open file
handle problem, but after that everything came back.
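(Roughly what the cleanup looked like, using the same mongo credentials as
above; the service names are the standard ones on a systemd-based 2.x
controller, so treat this as a sketch.)
# wipe the bad lease documents
mongo --ssl --sslAllowInvalidCertificates --authenticationDatabase admin \
    -u machine-0 -p "$PW" --eval 'db.leases.remove({})' 127.0.0.1:37017/juju
# then bounce mongod and the controller agent to clear the open file handles
sudo systemctl restart juju-db
sudo systemctl restart jujud-machine-0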
At this point, I waited again for everything to look happy, and watched
mongotop and top again.
These aren't super careful results; to do it properly I would want to run
things for an hour or so each and check the load over that whole time.
Really, I should have set up prometheus monitoring, or at least logged the
raw samples over a full run -- something like the sketch below.
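(Same connection details as the mongotop invocation above; the file name
and the one-hour window are just illustrative.)
# log a sample every 5s for an hour, then post-process the file
date > mongotop.before.log
timeout 3600 mongotop --host 127.0.0.1 --port 37017 \
    --ssl --sslAllowInvalidCertificates \
    --authenticationDatabase admin -u machine-0 -p "$PW" \
    5 >> mongotop.before.log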
But as a quick check, these are the top values from mongotop before the
upgrade:
ns                          total   read  write
local.oplog.rs              181ms  181ms    0ms
juju.txns                   120ms   10ms  110ms
juju.leases                  80ms   34ms   46ms
juju.txns.log                24ms    4ms   19ms

ns                          total   read  write
local.oplog.rs              208ms  208ms    0ms
juju.txns                   140ms   12ms  128ms
juju.leases                  98ms   42ms   56ms
juju.charms                  43ms   43ms    0ms

ns                          total   read  write
local.oplog.rs              220ms  220ms    0ms
juju.txns                   161ms   14ms  146ms
juju.leases                 115ms   52ms   63ms
presence.presence.beings     69ms   68ms    0ms

ns                          total   read  write
local.oplog.rs              213ms  213ms    0ms
juju.txns                   164ms   15ms  149ms
juju.leases                  82ms   35ms   47ms
presence.presence.beings     79ms   78ms    0ms

ns                          total   read  write
local.oplog.rs              221ms  221ms    0ms
juju.txns                   168ms   13ms  154ms
juju.leases                  95ms   40ms   55ms
juju.statuses                33ms   16ms   17ms

totals:
local.oplog.rs             1043ms
juju.txns                   868ms
juju.leases                 470ms
and after the upgrade:
ns                          total   read  write
local.oplog.rs               95ms   95ms    0ms
juju.txns                    68ms    6ms   61ms
juju.leases                  33ms   13ms   19ms
juju.txns.log                13ms    3ms   10ms

ns                          total   read  write
local.oplog.rs              200ms  200ms    0ms
juju.txns                   160ms   10ms  150ms
juju.leases                  78ms   35ms   42ms
juju.txns.log                29ms    4ms   24ms

ns                          total   read  write
local.oplog.rs              151ms  151ms    0ms
juju.txns                   103ms    6ms   97ms
juju.leases                  45ms   20ms   25ms
juju.txns.log                21ms    6ms   15ms

ns                          total   read  write
local.oplog.rs              138ms  138ms    0ms
juju.txns                    98ms    6ms   91ms
juju.leases                  30ms   13ms   16ms
juju.txns.log                18ms    3ms   14ms

ns                          total   read  write
local.oplog.rs              218ms  218ms    0ms
juju.txns                   196ms   14ms  182ms
juju.leases                  81ms   36ms   44ms
juju.txns.log                34ms    5ms   29ms
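(The "totals" in the before block are just per-namespace sums of the
samples; given a captured log like mongotop.before.log above, something
like this awk snippet produces them, assuming the usual ns/total/read/write
column layout.)
awk '$1 ~ /^(juju|local|presence)\./ { sum[$1] += $2 }
     END { for (ns in sum) printf "%-26s %6dms\n", ns, sum[ns] }' mongotop.before.log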