Rejection of peer join.

Stuart Bishop stuart.bishop at canonical.com
Wed Sep 27 12:47:01 UTC 2017


On 27 September 2017 at 14:50, Michael Van Der Beek
<michael.van at antlabs.com> wrote:
> Hi Stuart,
>
> I think you misinterpreted what I was asking.
>
> Assume a pair of instances is already in a relationship.
> Let's assume we have DRBD running between these two instances; let's call them A and B.
> If Juju starts a third instance, C, I want to make sure it cannot join the pair, as DRBD is not supposed to have a third node (although in theory you can create a stacked node for a third or more backups).
>
> So when ha-relation-joined is triggered on either A or B, how do I tell C that it is rejected from the join, so that it doesn't screw up Juju? I suppose A or B could do a "relation-set" of something to tell C it is rejected.

I would have C use relation-list to get a list of all the peers. If
there are two or more units with a lower unit number, C knows it
should not join, sets its status to 'blocked' and exits 0. There is
no need to involve A or B in the conversation; unit C can make the
decision itself. Unit C could happily sit in the blocked state until
either A or B departs, at which point C would see only one unit with
a lower unit number and know it should join. You could even argue
that the blocked state is incorrect with this design, and that unit C
should actually set its status to active or maintenance, with a
message stating it is a hot spare.
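
For concreteness, a minimal sketch of that hook in Python, calling
the Juju hook tools directly (the 'ha' relation name comes from your
hook above; the status message is just an example):

    #!/usr/bin/env python3
    # Sketch of ha-relation-joined: stand aside if two lower-numbered
    # peers already exist (i.e. the DRBD pair is already formed).
    import os
    import subprocess
    import sys

    def unit_number(name):
        # 'thecharm/2' -> 2
        return int(name.split('/')[1])

    me = os.environ['JUJU_UNIT_NAME']
    peers = subprocess.check_output(
        ['relation-list'], universal_newlines=True).split()
    lower = [u for u in peers if unit_number(u) < unit_number(me)]
    if len(lower) >= 2:
        subprocess.check_call(
            ['status-set', 'blocked',
             'hot spare; DRBD pair already formed'])
        sys.exit(0)
    # ...otherwise carry on and join the pair...

Re-running the same logic from ha-relation-departed lets the hot
spare promote itself when A or B goes away.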

The trick is that if someone does 'juju deploy -n3 thecharm', there
is no guarantee of the order in which the units join the peer
relation. So unit C may think it should be active, and then a short
while later see other units join and need to become inactive. If you
need to avoid this situation, you are going to need to use Juju
leadership. There is only one leader at any point in time, so it can
decide which of the several units should be active without worrying
about race conditions. But that sounds overly complex for your use
case.
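
If you did go the leadership route, a hedged sketch (run from the
peer relation hooks, since relation-list needs a relation context;
the 'active-units' leader-settings key is invented for this example):

    # Leader-driven election: the leader publishes which two units
    # form the pair. 'active-units' is not a Juju built-in key.
    import json
    import os
    import subprocess

    def is_leader():
        out = subprocess.check_output(
            ['is-leader', '--format=json'], universal_newlines=True)
        return json.loads(out)

    def unit_number(name):
        return int(name.split('/')[1])

    if is_leader():
        me = os.environ['JUJU_UNIT_NAME']
        peers = subprocess.check_output(
            ['relation-list'], universal_newlines=True).split()
        pair = sorted(peers + [me], key=unit_number)[:2]
        # Publish the decision; every unit can see it via leader-get.
        subprocess.check_call(
            ['leader-set', 'active-units=' + json.dumps(pair)])

The non-leader units then read the decision with 'leader-get
active-units'; the leader-settings-changed hook fires on every unit
whenever it changes.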

> The issue for me is how to scale if you have specific data in a set of nodes. So you can use Ceph, or DRBD, or some cluster. Ceph will require three Ceph nodes, DRBD two nodes, and maybe a Galera cluster three nodes.
>
> So my idea is that there is already a load balancer to scale. Each time you want to scale, you would add one or more pairs (assuming DRBD) to an already existing set of pairs. The load balancer will just redirect data to specific pairs based on some logic (like a modulus of the last octet of the customer IP, which can give you 256 pairs). This is how we are doing it on physical machines. We haven't had a customer yet that requires more than 10,000 TPS for RADIUS or 5 million concurrent sessions. (Note I use "pairs" loosely here, as the "pair" is three nodes instead of two if running a Galera cluster.)
>
> I'm currently trying to figure out how to do this on OpenStack. Do you have any recommendations for me to read/view on how people deal with scaling for very high write IO to disk? Currently for RADIUS we are looking at nearly 95% writes and 5% reads. Nobody reads the data unless someone wants to know if user X is currently logged in. If the (R/W) IO requirements were the other way around, it would be much easier to scale.
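
As an aside, the routing rule you describe is easy to pin down
precisely; a throwaway sketch (names invented, purely illustrative):

    # Map a client IP to one of n_pairs backend pairs by its last
    # octet, so each client always lands on the same pair.
    def pair_for(client_ip, n_pairs=256):
        return int(client_ip.split('.')[-1]) % n_pairs

    # pair_for('10.0.3.77') == 77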

I'm generally deploying charms to bare metal and using local disk if
there are any sort of non-trivial IO requirements. I believe the
newer Juju storage features should allow you to mount whatever sort
of volumes you want from OpenStack (or any cloud provider), but I'm
not familiar with the OpenStack specifics.
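
If you do try the storage features, the rough shape (untested by me;
'thecharm' and the 'data' storage name are placeholders, and 'cinder'
is the OpenStack-backed storage pool) is a declaration in
metadata.yaml:

    storage:
      data:
        type: filesystem
        location: /srv/data

plus a mapping at deploy time:

    juju deploy thecharm --storage data=cinder,100G

but again, I haven't exercised the OpenStack specifics myself.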



-- 
Stuart Bishop <stuart.bishop at canonical.com>


