Friday evening handover, 1/6/2012

Gustavo Niemeyer gustavo.niemeyer at canonical.com
Mon Jun 4 22:12:34 UTC 2012


On Sun, Jun 3, 2012 at 7:41 PM, Dave Cheney <david.cheney at canonical.com> wrote:
>> This is actually my fault. We had already agreed a long time ago that
>> any non-stable issues from ZooKeeper should result in visible errors,
>> and we should fail, because there are edge cases that don't work so
>> well in case of a connection reestablishment, and because we need to
>> be able to handle harsh scenarios anyway.
>>
>> I'll fix gozk so it behaves in that way.

That's done: https://codereview.appspot.com/6292044
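
As a rough illustration of the idea (not the actual change, which is in the
review above), here is a minimal sketch that treats any non-connected session
event as an error instead of waiting silently for a reconnection. The
SessionState type, its constants, ErrUnstableSession and checkSession are all
hypothetical names made up for this example; they are not gozk's API.

```go
package main

import (
	"errors"
	"fmt"
)

// SessionState is a hypothetical stand-in for the connection states a
// ZooKeeper client reports. It is NOT gozk's actual type.
type SessionState int

const (
	StateConnected SessionState = iota
	StateConnecting
	StateExpired
	StateClosed
)

// ErrUnstableSession is a hypothetical error marking a session that has
// left the connected state.
var ErrUnstableSession = errors.New("zookeeper session is not stable")

// checkSession reads session events until the channel is closed and returns
// an error on the first event that is not StateConnected, so the caller
// sees the problem instead of hanging while the client retries.
func checkSession(events <-chan SessionState) error {
	for st := range events {
		if st != StateConnected {
			return fmt.Errorf("%w: state %d", ErrUnstableSession, st)
		}
	}
	return nil
}

func main() {
	events := make(chan SessionState, 3)
	events <- StateConnected
	events <- StateConnecting // connection dropped, client is retrying
	events <- StateConnected
	close(events)
	fmt.Println(checkSession(events))
}
```

The point of failing on the first unstable event is that the caller never has
to reason about what may have been missed while the connection was down.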

>>> 2. The timeout value passed to zookeeper.Dial() doesn't do anything,
>>> I believe this is a bug.
>>
>> I believe it does, but it's being ignored. See the open function in
>> state/open.go. The events in the session channel will tell you that
>> things are not going well.
>
> Fair enough, I'll have to take your word for it. I took to setting that value to 15e6, and half an hour later the PA was still sitting there waiting to connect.

Have you seen TestWatchOnReconnection in zk_test.go?


>> I actually think the correct thing to do is to take any unusual state
>> as fatal, clean up state properly, and then reestablish our knowledge
>> about the whole environment by synchronously stopping background
>> activity, closing the state, and redialing in.
>
> Yup, that should be fairly straightforward now: the Provisioner will exit with an error, and we can then close the old state connection, open a new one, build a NewProvisioner, and try again.

Will have a look, thanks.
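
For reference, the close-and-redial cycle described above might look roughly
like the sketch below. Everything in it is an assumption for illustration: the
State interface, runWithRedial, fakeState and errSessionExpired are
hypothetical names, and the open and run callbacks stand in for opening the
state and running a worker such as the Provisioner.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical stand-in for a state connection.
type State interface {
	Close() error
}

// errSessionExpired is a made-up error used by the demo worker below.
var errSessionExpired = errors.New("session expired")

// runWithRedial opens a state, runs a worker against it until the worker
// stops, then closes the state. If the worker failed, it dials again from
// scratch, making at most maxAttempts attempts in total.
func runWithRedial(open func() (State, error), run func(State) error, maxAttempts int) error {
	var lastErr error
	for i := 0; i < maxAttempts; i++ {
		st, err := open()
		if err != nil {
			lastErr = err
			continue
		}
		err = run(st) // blocks until the worker exits
		st.Close()    // drop everything tied to the old session
		if err == nil {
			return nil
		}
		lastErr = err
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

// fakeState counts how many times Close is called.
type fakeState struct{ closed *int }

func (s fakeState) Close() error { *s.closed++; return nil }

func main() {
	closed, runs := 0, 0
	open := func() (State, error) { return fakeState{&closed}, nil }
	run := func(State) error {
		runs++
		if runs < 3 {
			return errSessionExpired // the first two runs fail
		}
		return nil
	}
	err := runWithRedial(open, run, 5)
	fmt.Println(err, runs, closed)
}
```

Each attempt starts from a fresh connection, so nothing learned through the
old session is carried over into the new one.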


gustavo @ http://niemeyer.net


