Testing Leader Election reconfiguration

Tim Van Steenburgh tim.van.steenburgh at canonical.com
Tue Mar 15 17:02:19 UTC 2016


On Tue, Mar 15, 2016 at 12:30 PM, Tom Barber <tom at analytical-labs.com>
wrote:

> Hi Tim,
>
> Why would I need to increase the timeout when the status says all the
> units are operational?
>

The default wait timeout is 300s, with an "idle threshold" of 30s, which
means it waits for everything to be idle for 30s before returning from the
wait. So with the default timeout, if the env doesn't settle within 4m30s,
the wait will time out. This may not be what's happening in your case, but
it's worth trying a longer timeout value to make sure.
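
To make the arithmetic concrete, here is a small self-contained model of
that behaviour. It is only a sketch of the semantics described above, not
amulet's actual implementation; wait_until_idle, is_idle_at and the
one-second polling step are made-up names and assumptions for illustration.

```python
def wait_until_idle(is_idle_at, timeout=300, idle_threshold=30, step=1):
    """Model of a wait that needs a continuous idle period before returning.

    is_idle_at(t) is a hypothetical callback reporting whether the
    environment is idle at second t. Returns the second at which the wait
    would return, or raises TimeoutError once `timeout` seconds have passed.
    """
    idle_since = None
    t = 0
    while t <= timeout:
        if is_idle_at(t):
            if idle_since is None:
                idle_since = t  # start of the current idle streak
            if t - idle_since >= idle_threshold:
                return t
        else:
            idle_since = None  # activity resets the idle streak
        t += step
    raise TimeoutError("environment did not stay idle within %ds" % timeout)


# Settling at exactly 4m30s (270s) just fits inside the default 300s.
print(wait_until_idle(lambda t: t >= 270))

# Settling one second later only succeeds with a longer timeout.
print(wait_until_idle(lambda t: t >= 271, timeout=1200))
```

In the real test the equivalent change is simply passing a larger value,
e.g. self.d.sentry.wait(timeout=1200).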


> The status dump came out of bundletester, which said that it failed on the
> first wait(); I assume the status dump arrived at the same time?
> Bugs are allowed: the test was hacked up from a previous one and doesn't
> do anything yet. I'm trying to make sure the logic works first.
>
> Tom
>
> --------------
>
> Director Meteorite.bi - Saiku Analytics Founder
> Tel: +44(0)5603641316
>
> (Thanks to the Saiku community we reached our Kickstart
> <http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
> goal, but you can always help by sponsoring the project
> <http://www.meteorite.bi/products/saiku/sponsorship>)
>
> On 15 March 2016 at 16:27, Tim Van Steenburgh <
> tim.van.steenburgh at canonical.com> wrote:
>
>> Hey Tom,
>>
>> 1. You can increase the wait time until it doesn't time out:
>> self.d.sentry.wait(timeout=1200)
>> 2. At what point in this sequence of commands was the status dump
>> captured?
>> 3. There is a bug here. You take a reference to the pdi/0 info dict on
>> line 1. It's the same object you use to get message2 and message3 later.
>> So, you'll get the same message that you got on line 1. You need `message3
>> = self.d.sentry['pdi'][0].info['workload-status'].get('message')`
>> instead.
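
The stale-reference problem in point 3 can be reproduced without a
deployment. The sketch below uses a hypothetical FakeUnitSentry whose info
property is assumed to build a fresh dict on each access; amulet's real
sentry objects may refresh differently, and the "leader:<ip>" message
format and the addresses are invented for illustration.

```python
class FakeUnitSentry:
    """Hypothetical stand-in for a unit sentry; not amulet's class."""

    def __init__(self, message):
        self._message = message

    def set_message(self, message):
        # Simulates the charm updating its workload status message.
        self._message = message

    @property
    def info(self):
        # Assumption: every access returns a new snapshot of the status.
        return {'workload-status': {'message': self._message}}


sentry = FakeUnitSentry('leader:10.0.0.1')

unit = sentry.info                       # snapshot taken once, up front
sentry.set_message('leader:10.0.0.2')    # leadership moves elsewhere

# Reading through the old reference still yields the original message.
stale = unit['workload-status'].get('message')
# Re-reading info picks up the current message.
fresh = sentry.info['workload-status'].get('message')

print(stale.split(':', 1)[-1])
print(fresh.split(':', 1)[-1])
```

The same idea applies in the test: look the status up again after each
wait() rather than reusing the dict captured at the start.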
>>
>> Hope this helps.
>>
>> On Tue, Mar 15, 2016 at 11:41 AM, Tom Barber <tom at analytical-labs.com>
>> wrote:
>>
>>> Okay back here again, so my nice leader election function looks like:
>>>
>>>    def test_leader_election_failover(self):
>>>         unit = self.d.sentry['pdi'][0].info
>>>         message = unit['workload-status'].get('message')
>>>         ip = message.split(':', 1)[-1]
>>>         self.d.add_unit('pdi', 2)
>>>         self.d.sentry.wait()
>>>         message2 = unit['workload-status'].get('message')
>>>         ip2 = message2.split(':', 1)[-1]
>>>         self.assertEqual(ip, ip2)
>>>         self.d.remove_unit('pdi/0')
>>>         self.d.sentry.wait()
>>>         message3 = unit['workload-status'].get('message')
>>>         ip3 = message3.split(':', 1)[-1]
>>>
>>>         self.assertNotEqual(ip3, ip2)
>>>
>>> I know there's no logic in there, but I need to make sure the stuff
>>> actually functions.
>>>
>>> So Tim says wait() should work, but when I tested this last night,
>>> I got a timeout error on the wait right after add_unit.
>>>
>>> https://gist.github.com/buggtb/c271dd79d782af57dea6
>>>
>>> Yet in the status dump you can see all 3 units sat there seemingly happy.
>>>
>>> Tom
>>>
>>> On 9 March 2016 at 18:31, Tom Barber <tom at analytical-labs.com> wrote:
>>>
>>>> Oh really?
>>>>
>>>> /me strokes his invisible beard.
>>>>
>>>>
>>>> Okay I'll go back and try again.
>>>>
>>>> Tom
>>>>
>>>> On 9 March 2016 at 16:56, Tim Van Steenburgh <
>>>> tim.van.steenburgh at canonical.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 9, 2016 at 6:31 AM, Tom Barber <tom at analytical-labs.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Stuart.
>>>>>>
>>>>>> I do put a note in my charm message indicating the leader IP address
>>>>>> so that users know which to connect to.
>>>>>>
>>>>>> So with juju wait, would I destroy a unit then execute juju wait? At
>>>>>> which point it will hang until the leader election stuff is over and all
>>>>>> becomes stable again?
>>>>>>
>>>>>>
>>>>> Since you're already using amulet, there's no need to use the
>>>>> juju-wait plugin, since d.sentry.wait() does the same thing. So yes,
>>>>> you would do d.remove_unit(...) and then call d.sentry.wait().
>>>>>
>>>>>
>>>>>> Also, will this work if I push it upstream to the charmers and the
>>>>>> automated tests up there?
>>>>>>
>>>>>>
>>>>> Yes.
>>>>>
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On 9 March 2016 at 11:00, Stuart Bishop <stuart.bishop at canonical.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On 9 March 2016 at 20:31, Tom Barber <tom at analytical-labs.com>
>>>>>>> wrote:
>>>>>>> > Morning all
>>>>>>> >
>>>>>>> > I'm trying to test for charm reconfiguration if the leader
>>>>>>> > goes AWOL.
>>>>>>>
>>>>>>> I put the role of the unit in its workload status, so it is easy for
>>>>>>> operators to see which unit is master. And this also makes it easy
>>>>>>> for tests to tell.
>>>>>>>
>>>>>>>
>>>>>>> > Adam suggested that I watch the status waiting for the next
>>>>>>> > leader election hook the wait on that and then check my service
>>>>>>> > configs.
>>>>>>>
>>>>>>> You are best off waiting for all the hooks to complete and a steady
>>>>>>> state, not just leader elected (since things will still be in flux
>>>>>>> when that hook fires, such as the leader-settings-changed hooks it
>>>>>>> will probably trigger and the relation changes those hooks will
>>>>>>> likely trigger). Use the juju-wait plugin, and maybe add support to
>>>>>>> https://bugs.launchpad.net/juju-core/+bug/1488777 to get this into
>>>>>>> core.
>>>>>>>
>>>>>>> --
>>>>>>> Stuart Bishop <stuart.bishop at canonical.com>
>>>>>>>