Juju/ZK watch re-establishment in presence of errors
torinsandall at gmail.com
Thu May 9 18:06:41 UTC 2013
I'm wondering if one of the developers familiar with Juju's use of ZK
watches can clarify something for me.
I've been running into some problems with Juju machine and unit agents
involving ZK-related errors (connection loss, operation timeout, etc.).
One thing I've noticed is that the Juju implementation makes no attempt to
re-establish watches if an error occurs while processing an event. In fact,
some of the low-level functions that register the watches even note this in
their docstrings. The problem with this approach is that a single failed ZK
call can leave an agent inoperable. I found the retry-backoff branch and the
related txzookeeper branch; however, I'd like to know whether re-establishing
the watches after errors is a valid approach.
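For illustration, here is a minimal Python sketch of what re-establishing a watch after an error could look like. ZooKeeper watches are one-shot triggers, so after each firing the client has to read the node again and register a new watch. The sketch retries that re-registration with exponential backoff instead of giving up after a single failed call. FakeZKClient, get_and_watch, TransientZKError and watch_forever are hypothetical names invented for this example; they are not the txzookeeper or Juju API, and the in-memory client only simulates ZK behaviour.

```python
import time
from typing import Callable, Optional


class TransientZKError(Exception):
    """Hypothetical stand-in for errors like connection loss or operation timeout."""


class FakeZKClient:
    """Hypothetical in-memory client (not txzookeeper).

    get_and_watch() returns the node data and registers a one-shot watch,
    mimicking ZooKeeper's one-time watch triggers. The `failures` counter
    makes the next N calls raise, to simulate transient ZK errors.
    """

    def __init__(self, data: str, failures: int = 0) -> None:
        self.data = data
        self.failures = failures
        self._watch: Optional[Callable[[], None]] = None

    def get_and_watch(self, watch: Callable[[], None]) -> str:
        if self.failures > 0:
            self.failures -= 1
            raise TransientZKError("connection loss")
        self._watch = watch
        return self.data

    def set(self, data: str) -> None:
        self.data = data
        # One-shot semantics: the watch is cleared as soon as it fires.
        watch, self._watch = self._watch, None
        if watch is not None:
            watch()


def watch_forever(client: FakeZKClient,
                  on_change: Callable[[str], None],
                  max_attempts: int = 5,
                  base_delay: float = 0.01) -> None:
    """Register a watch and re-register it after every firing.

    Transient errors during (re-)registration are retried with exponential
    backoff; only after max_attempts consecutive failures is the error raised.
    """

    def register() -> None:
        delay = base_delay
        for attempt in range(max_attempts):
            try:
                # Re-reading also picks up any change made while unwatched.
                on_change(client.get_and_watch(register))
                return
            except TransientZKError:
                if attempt == max_attempts - 1:
                    raise
                time.sleep(delay)
                delay *= 2

    register()


if __name__ == "__main__":
    seen = []
    client = FakeZKClient("v1", failures=2)  # first two calls fail
    watch_forever(client, seen.append)
    client.set("v2")
    client.failures = 1                       # next re-registration fails once
    client.set("v3")
    print(seen)  # ['v1', 'v2', 'v3']
```

Raising after the last attempt, rather than silently dropping the watch, means the caller at least learns that the agent has stopped watching the node.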
More information about the Juju