<div dir="ltr"><br><div class="gmail_extra">Hi Torin,<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 9, 2013 at 2:06 PM, Torin Sandall <span dir="ltr">&lt;<a href="mailto:torinsandall@gmail.com" target="_blank">torinsandall@gmail.com</a>&gt;</span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi All,<div><br></div><div>I'm wondering if one of the developers familiar with Juju's use of ZK watches can clarify something for me.</div> <div><br></div><div>I've been running into some problems with Juju machine and unit agents involving ZK-related errors (connection loss, operation timeout, etc.)</div> <div><br></div></div></blockquote><div><br></div><div>I went ahead and fixed up and merged the txzookeeper (txzk) branch that deals with connection errors and backoff. It's available from the Juju PPA (ppa:juju/pkgs).<br></div><div> <br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div dir="ltr"><div> </div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>One observation I have is that the Juju implementation makes no attempt to re-establish watches if there's an error while processing an event.</div> </div></blockquote><div><br></div><div>The watch handler should handle any exceptional conditions in the data it consumes; connectivity issues are a separate matter and are handled by txzk. If the watch handlers don't handle data appropriately, that's a bug.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div dir="ltr"><div> In fact some of the low level functions which register the watches even remark on this in their docstrings. The problem with this approach is that a single failed ZK call can render agents inoperable. 
I found the retry-backoff and related txzookeeper branch however I want to know if re-establishing the watches in case of errors is valid or not.</div> </div></blockquote><div><br></div><div>The watch re-establishment in the retry-backoff branch (merged today) addresses the connectivity issues and, on reconnect, triggers the watch callback/handler. The handlers themselves are set up to refetch current state and re-evaluate it against their known state, so watch re-establishment on reconnect amounts to the handlers firing again, which catches the agents up with current state (be it a zero delta or a significant one).<br> </div></div><br></div><div class="gmail_extra">Hope that helps,<br><br>Kapil<br></div></div>
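<div dir="ltr"><div class="gmail_extra">For illustration, here is a minimal sketch of the refetch-and-reconcile handler pattern described above. It is not Juju or txzookeeper code: <code>FakeStore</code>, <code>Agent</code> and <code>on_watch_fired</code> are hypothetical names, and the store is a plain in-memory stand-in for a ZooKeeper node. The point it tries to show is that a handler which ignores the event payload and always diffs current state against its known state can safely be fired again after a reconnect.<br></div></div>

```python
# Hypothetical sketch: none of these names come from Juju or txzookeeper.


class FakeStore:
    """In-memory stand-in for a ZooKeeper node holding a set of child names."""

    def __init__(self):
        self.children = set()

    def get_children(self):
        # Assumption: connection loss and retry are dealt with by a lower
        # layer, so the handler only ever sees a successful read.
        return set(self.children)


class Agent:
    """Keeps a locally known state and reconciles it on every watch firing."""

    def __init__(self, store):
        self.store = store
        self.known = set()
        self.log = []

    def on_watch_fired(self, event=None):
        # Ignore the event payload and refetch the full current state.
        current = self.store.get_children()
        added = current - self.known
        removed = self.known - current
        for name in sorted(added):
            self.log.append(("start", name))
        for name in sorted(removed):
            self.log.append(("stop", name))
        self.known = current
        return added, removed


store = FakeStore()
agent = Agent(store)

store.children = {"unit-0", "unit-1"}
agent.on_watch_fired()  # normal firing: two units appear

# Changes happen while the agent is disconnected and receives no events.
store.children = {"unit-1", "unit-2"}

agent.on_watch_fired()  # firing on reconnect picks up the missed delta
agent.on_watch_fired()  # an extra firing finds nothing left to do
```

<div dir="ltr"><div class="gmail_extra">Because the handler works from a diff rather than from individual events, the extra firing on reconnect costs one read and changes nothing when no state was missed.<br></div></div>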