Friday evening handover, 1/6/2012

Gustavo Niemeyer gustavo.niemeyer at
Mon Jun 4 22:12:34 UTC 2012

On Sun, Jun 3, 2012 at 7:41 PM, Dave Cheney <david.cheney at> wrote:
>> This is actually my fault. We had already agreed a long time ago that
>> any non-stable issues from ZooKeeper should result in visible errors,
>> and we should fail, because there are edge cases that don't work so
>> well in case of a connection reestablishment, and because we need to
>> be able to handle harsh scenarios anyway.
>> I'll fix gozk so it behaves in that way.

That's done:

>>> 2. The timeout value passed to zookeeper.Dial() doesn't do anything,
>>> I believe this is a bug.
>> I believe it does, but it's being ignored. See the open function in
>> state/open.go. The events in the session channel will tell that things
>> are not so well.
> Fair enough, I'll have to take your word for it. I tried setting that value to 15e6, and half an hour later the PA was still sitting there waiting to connect.

Have you seen TestWatchOnReconnection on zk_test.go?

>> I actually think the correct thing to do is to take any unusual state
>> as fatal, clean up state properly, and then reestablish our knowledge
>> about the whole environment by synchronously stopping background
>> activity, closing the state, and redialing in.
> Yup, that should be fairly straightforward now: the Provisioner will exit with an error, and we can then close the old state connection, open a new one, build a NewProvisioner, and try again.

Will have a look, thanks.
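
For reference, the cycle described above (worker exits with an error, close the old connection, redial, rebuild the worker, try again) could be sketched roughly as follows. This is only an illustration under assumptions: conn, dialFunc, workFunc, supervise and errSessionLost are hypothetical names made up for the example, not the juju state or gozk APIs, and a real loop would probably want a delay between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// errSessionLost is a hypothetical error standing in for any unusual
// ZooKeeper session state that the worker treats as fatal.
var errSessionLost = errors.New("session lost")

// conn is a hypothetical stand-in for a state connection.
type conn struct {
	id     int
	closed bool
}

// Close marks the connection as closed.
func (c *conn) Close() { c.closed = true }

// dialFunc opens a fresh connection.
type dialFunc func() (*conn, error)

// workFunc builds and runs a worker on the given connection. It returns
// nil on a clean finish and a non-nil error on any fatal condition.
type workFunc func(c *conn) error

// supervise dials, runs the worker, and on any error closes the old
// connection and starts over with a new one, up to maxAttempts times.
// It returns the number of attempts made and the last error, if any.
func supervise(dial dialFunc, work workFunc, maxAttempts int) (int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		c, err := dial()
		if err != nil {
			lastErr = err
			continue
		}
		err = work(c)
		// Always discard the old connection before redialing, so no
		// stale watches or background activity survive the restart.
		c.Close()
		if err == nil {
			return attempt, nil
		}
		lastErr = err
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %v", maxAttempts, lastErr)
}

func main() {
	next := 0
	dial := func() (*conn, error) {
		next++
		return &conn{id: next}, nil
	}
	runs := 0
	// Simulated worker: fails twice with a fatal session error, then
	// finishes cleanly on the third connection.
	work := func(c *conn) error {
		runs++
		if runs < 3 {
			return errSessionLost
		}
		return nil
	}
	attempts, err := supervise(dial, work, 5)
	fmt.Println(attempts, err)
}
```

The point of the shape is that the worker never tries to repair a connection in place; all recovery happens by throwing the connection away and rebuilding from scratch.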

gustavo @
