<div dir="ltr">(sent too soon)<div><br></div><div>Summary:</div><div>before:</div><div><div><font face="monospace, monospace">1043 <a href="http://local.oplog.rs">local.oplog.rs</a> </font></div><div><font face="monospace, monospace"> 868 juju.txns </font></div><div><font face="monospace, monospace"> 470 juju.leases</font></div></div><div><br></div><div>after:</div><div><font face="monospace, monospace"> 802 <a href="http://local.oplog.rs">local.oplog.rs</a></font></div><div><font face="monospace, monospace"> 625 juju.txns</font></div><div><font face="monospace, monospace"> 267 juju.leases</font></div><div><br></div><div>So there seems to be a fairly noticeable decrease in load on the system around leases: the juju.leases total drops from 470ms to 267ms (~43% lower), while local.oplog.rs and juju.txns end up at roughly 70-77% of their earlier totals. Again, not super scientific because I didn't measure over enough time, deal with variation, all that kind of stuff. But at least at a glance it looks pretty good.</div><div>As far as load around the global clock (total, read, write):</div><div><div><font face="monospace, monospace"> juju.globalclock 5ms 3ms 1ms</font></div></div><div><div><font face="monospace, monospace"> juju.globalclock 10ms 8ms 1ms</font></div></div><div>etc</div><div><br></div><div>So generally noticeable, but not specifically an issue.</div><div><br></div><div>Hopefully we'll see similar improvements in live systems. The main thing is to make sure the upgrade from 2.2 to 2.3 is smooth, since the lease issue caused a pretty major crash of the system.</div><div><br></div><div>John</div><div>=:-></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 1, 2017 at 6:43 PM, John A Meinel <span dir="ltr">&lt;<a href="mailto:john.meinel@canonical.com" target="_blank">john.meinel@canonical.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">So I wanted to know if Andrew's changes in 2.3 are going to have a noticeable effect at scale on Leadership. 
So I went and set up a test with HA controllers running 10 machines, each with 3 containers, and then distributed ~500 applications, each with 3 units, across everything.<div>I started at commit 2e50e5cf4c3, which is just before Andrew's Lease patch landed.<br><div><br></div><div>juju bootstrap aws/eu-west-2 --bootstrap-constraints instance-type=m4.xlarge --config vpc-id=XXXX</div><div>juju enable-ha -n3</div><div># Wait for things to stabilize</div><div>juju deploy -B cs:~jameinel/ubuntu-lite -n10 --constraints instance-type=m4.xlarge</div><div># wait</div><div><br></div><div># set up the containers</div><div><div><font face="monospace, monospace">for i in `seq 0 9`; do</font></div></div><div><font face="monospace, monospace"> juju deploy -n3 -B cs:~jameinel/ubuntu-leader ul${i} --to lxd:${i},lxd:${i},lxd:${i}</font></div><div><font face="monospace, monospace">done</font></div><div><br></div><div># scale up. I did this more in batches of a few at a time, but slowly grew all the way up</div><div><font face="monospace, monospace">for j in `seq 1 49`; do<br></font></div><div><font face="monospace, monospace"> echo $j</font></div><div><font face="monospace, monospace"> for i in `seq 0 9`; do</font></div><div><font face="monospace, monospace"> juju deploy -B -n3 cs:~jameinel/ubuntu-leader ul${i}${j} --to ${i}/lxd/0,${i}/lxd/1,${i}/lxd/2 &</font></div><div><font face="monospace, monospace"> done</font></div><div><font face="monospace, monospace"> time wait</font></div><div><font face="monospace, monospace">done</font></div><div><br></div><div>I let it go for a while until "juju status" was happy that everything was up and running. 
Note that this was 1500 units, 500 applications in a single model.</div><div>time juju status was around 4-10s.</div><div><br></div><div>I was running 'mongotop' and watching 'top' while it was running.</div><div><br>I then upgraded to the latest juju dev (c49dd0d88a).</div></div><div>Now, the controller immediately started thrashing, with bad lease documents in the database, and eventually got to the point that it ran out of open file descriptors. Theoretically upgrading 2.2 => 2.3 won't have the same problem because the actual upgrade step should run.</div><div>However, once I ran "db.leases.remove({})" it recovered.</div><div>I ended up having to restart mongo and jujud to recover from the open file handles, but it did eventually recover.</div><div><br></div><div>At this point, I waited again for everything to look happy, and watched mongotop and top again.</div><div><br>These aren't super careful results; ideally I would run things for like an hour each and check the load over that whole time. Really I should have set up Prometheus monitoring. 
But as a quick check, these are the top values for mongotop before:</div><div><font face="monospace, monospace"><br></font></div><div><div><font face="monospace, monospace"> ns total read write</font></div><div><font face="monospace, monospace"> <a href="http://local.oplog.rs" target="_blank">local.oplog.rs</a> 181ms 181ms 0ms</font></div><div><font face="monospace, monospace"> juju.txns 120ms 10ms 110ms</font></div><div><font face="monospace, monospace"> juju.leases 80ms 34ms 46ms</font></div><div><font face="monospace, monospace"> juju.txns.log 24ms 4ms 19ms</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> ns total read write</font></div><div><font face="monospace, monospace"> <a href="http://local.oplog.rs" target="_blank">local.oplog.rs</a> 208ms 208ms 0ms</font></div><div><font face="monospace, monospace"> juju.txns 140ms 12ms 128ms</font></div><div><font face="monospace, monospace"> juju.leases 98ms 42ms 56ms</font></div><div><font face="monospace, monospace"> juju.charms 43ms 43ms 0ms</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> ns total read write</font></div><div><font face="monospace, monospace"> <a href="http://local.oplog.rs" target="_blank">local.oplog.rs</a> 220ms 220ms 0ms</font></div><div><font face="monospace, monospace"> juju.txns 161ms 14ms 146ms</font></div><div><font face="monospace, monospace"> juju.leases 115ms 52ms 63ms</font></div><div><font face="monospace, monospace">presence.presence.beings 69ms 68ms 0ms</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> ns total read write</font></div><div><font face="monospace, monospace"> <a href="http://local.oplog.rs" target="_blank">local.oplog.rs</a> 213ms 213ms 0ms</font></div><div><font face="monospace, monospace"> juju.txns 164ms 15ms 149ms</font></div><div><font face="monospace, monospace"> juju.leases 82ms 35ms 
47ms</font></div><div><font face="monospace, monospace">presence.presence.beings 79ms 78ms 0ms</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> ns total read write</font></div><div><font face="monospace, monospace"> <a href="http://local.oplog.rs" target="_blank">local.oplog.rs</a> 221ms 221ms 0ms</font></div><div><font face="monospace, monospace"> juju.txns 168ms 13ms 154ms</font></div><div><font face="monospace, monospace"> juju.leases 95ms 40ms 55ms</font></div><div><font face="monospace, monospace"> juju.statuses 33ms 16ms 17ms</font></div></div><div><br></div><div>totals:</div><div>1043 <a href="http://local.oplog.rs" target="_blank">local.oplog.rs</a> </div><div>juju.txns 868</div><div>juju.leases 470</div><div><br></div><div>and after</div><div><br></div><div><div><font face="monospace, monospace"> ns total read write </font></div><div><font face="monospace, monospace"> <a href="http://local.oplog.rs" target="_blank">local.oplog.rs</a> 95ms 95ms 0ms</font></div><div><font face="monospace, monospace"> juju.txns 68ms 6ms 61ms</font></div><div><font face="monospace, monospace"> juju.leases 33ms 13ms 19ms</font></div><div><font face="monospace, monospace"> juju.txns.log 13ms 3ms 10ms</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> ns total read write</font></div><div><font face="monospace, monospace"> <a href="http://local.oplog.rs" target="_blank">local.oplog.rs</a> 200ms 200ms 0ms</font></div><div><font face="monospace, monospace"> juju.txns 160ms 10ms 150ms</font></div><div><font face="monospace, monospace"> juju.leases 78ms 35ms 42ms</font></div><div><font face="monospace, monospace"> juju.txns.log 29ms 4ms 24ms</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> ns total read write</font></div><div><font face="monospace, monospace"> <a href="http://local.oplog.rs" 
target="_blank">local.oplog.rs</a> 151ms 151ms 0ms</font></div><div><font face="monospace, monospace"> juju.txns 103ms 6ms 97ms</font></div><div><font face="monospace, monospace"> juju.leases 45ms 20ms 25ms</font></div><div><font face="monospace, monospace"> juju.txns.log 21ms 6ms 15ms</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> ns total read write</font></div><div><font face="monospace, monospace"> <a href="http://local.oplog.rs" target="_blank">local.oplog.rs</a> 138ms 138ms 0ms</font></div><div><font face="monospace, monospace"> juju.txns 98ms 6ms 91ms</font></div><div><font face="monospace, monospace"> juju.leases 30ms 13ms 16ms</font></div><div><font face="monospace, monospace"> juju.txns.log 18ms 3ms 14ms</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"> ns total read write</font></div><div><font face="monospace, monospace"> <a href="http://local.oplog.rs" target="_blank">local.oplog.rs</a> 218ms 218ms 0ms</font></div><div><font face="monospace, monospace"> juju.txns 196ms 14ms 182ms</font></div><div><font face="monospace, monospace"> juju.leases 81ms 36ms 44ms</font></div><div><font face="monospace, monospace"> juju.txns.log 34ms 5ms 29ms</font></div><div><br></div></div><div><br></div><div><br></div></div>
</blockquote></div><br></div>
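<div><br></div><div>For anyone wanting to redo the before/after comparison on their own samples, here is a minimal sketch of the percentage arithmetic, using the summary totals from this thread. The function name pct_drop is made up for illustration, and it assumes awk is available; it is not part of juju or mongotop.</div>

```shell
#!/bin/sh
# pct_drop BEFORE AFTER
# Hypothetical helper: prints how much lower AFTER is than BEFORE,
# as a percentage with one decimal place.
pct_drop() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Summed mongotop "total" column (ms) from the summary: before, then after.
printf '%s ' local.oplog.rs; pct_drop 1043 802
printf '%s ' juju.txns;      pct_drop 868 625
printf '%s ' juju.leases;    pct_drop 470 267
```

<div>With only five samples on each side this is still just a rough indicator, so treat the percentages as a sanity check rather than a benchmark.</div>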