<div dir="ltr"><div>I feel obliged to note that we also have axw's operation queue, used in storageprovisioner, and that it's the only one which doesn't make the assumption that the code being retried is the only important thing that could possibly be happening in the calling context.<br><br>All the other approaches discussed will leave us blocking 99 good machines behind a single failure -- and while that's harmless-ish in the current provisioner, because it has a strategy of waiting no more than 30s in total, it's going to bite us hard as soon as we switch to a strategy that actually backs off and keeps retrying long enough to be useful.<br><br></div>Andrew: as I recall, that code is actually pretty general. Is there a strong reason it's tucked away in an /internal/ package?<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 9, 2016 at 9:02 AM, roger peppe <span dir="ltr"><<a href="mailto:roger.peppe@canonical.com" target="_blank">roger.peppe@canonical.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 9 August 2016 at 07:28, roger peppe <<a href="mailto:roger.peppe@canonical.com">roger.peppe@canonical.com</a>> wrote:<br>
> On 9 August 2016 at 01:22, Katherine Cox-Buday<br>
> <<a href="mailto:katherine.cox-buday@canonical.com">katherine.cox-buday@<wbr>canonical.com</a>> wrote:<br>
>> Hey All,<br>
>><br>
>> We currently have 3 ways we're performing retries in Juju:<br>
>><br>
>> 1. Archaic, custom, intra-package retry code.<br>
>> 2. <a href="http://github.com/juju/utils::{Countdown,BackoffTimer}" rel="noreferrer" target="_blank">github.com/juju/utils::{<wbr>Countdown,BackoffTimer}</a><br>
>> 3. <a href="http://github.com/juju/retry" rel="noreferrer" target="_blank">github.com/juju/retry</a> (current best practice)<br>
><br>
> There's a fourth way that you haven't mentioned, which fits<br>
> somewhere in between 1 and 2 I think (it was the first<br>
> explicit general retry code in Juju AFAIK), which is<br>
> utils.Attempt.<br>
><br>
> I'd suggest that the way it's used matches the pattern<br>
> you'd like to write quite well:<br>
><br>
> for attempt := strategy.State(); attempt.Next(); {<br>
<br>
</span>Oops; s/State/Start/ of course.<br>
<div class="HOEnZb"><div class="h5"><br>
> Do something<br>
> }<br>
><br>
> AFAIR this pattern was arrived at after some discussion between myself<br>
> and Gustavo. At the time I was thinking of some kind of<br>
> callback-based scheme like the others schemes you mention, but<br>
> I think that leaving the caller in control and the loop explicit<br>
> is actually nicer - the logic is clear to the reader and more<br>
> composable with other primitives (it is also a good match<br>
> for the pattern you propose).<br>
><br>
> How about making AttemptStrategy a little more flexible?<br>
> It was always my intention to admit other kinds of strategy - if<br>
> a BackoffFunc field were added to AttemptStrategy and<br>
> a Stop argument added to AttemptStrategy.Start it would<br>
> become as general as retry.Call, while keeping the nice<br>
> usage pattern and general straightforwardness.<br>
><br>
> cheers,<br>
> rog.<br>
><br>
>> I have just touched some code which fits #1, and when I was attempting to retrofit it to #3, I found it a little obtuse. Here's why:<br>
>><br>
>> In my case (and I assume most), I wanted something like this:<br>
>><br>
>> for range retries { // Loop should break if our parent is cancelled<br>
>> // Do something<br>
>> // Success, retry, or fatal error<br>
>> }<br>
>><br>
>> To implement this, I do something like this:<br>
>><br>
>> args := retry.CallArgs{<br>
>> Func: func() error {<br>
>> // Body of my loop<br>
>> },<br>
>> BackoffFunc: func(delay time.Duration, attempt int) time.Duration {<br>
>> if attempt == 1 {<br>
>> return delay<br>
>> }<br>
>> return delay * factor<br>
>> },<br>
>> Stop: dying,<br>
>> // etc...<br>
>> }<br>
>><br>
>> Call(args)<br>
>><br>
>> We attempt to encapsulate every scenario for retries in <a href="http://github.com/juju/retry/::CallArgs" rel="noreferrer" target="_blank">github.com/juju/retry/::<wbr>CallArgs</a>. We have variables to determine if errors are fatal, to notify things, how to back off, etc. This feels very heavy-weight to me, and a bit like the monolith model rather than the unix model. I would prefer the logic be left to the caller (i.e. do I break out of this loop? do I notify something?), and the interface into a retry strategy be much simpler, say a channel like in time.Tick(time.Duration) <-chan time.Time.<br>
>><br>
>> How would folks feel about something like this?<br>
>><br>
>> func BackoffTick(done <-chan struct{}, clock clock.Cock, delay time.Duration, factor int64) <-chan time.Time {<br>
>> signalStream := make(chan time.Time)<br>
>> go func() {<br>
>> defer close(signalStream)<br>
>> for {<br>
>> select {<br>
>> case <-done:<br>
>> return<br>
>> case signalStream <- time.After(delay):<br>
>> delay = time.Duration(delay.<wbr>Nanoseconds() * factor)<br>
>> }<br>
>> }<br>
>> }()<br>
>> return signalStream<br>
>> }<br>
>><br>
>> With this, the above becomes:<br>
>><br>
>> for range BackoffTick(dying, myClock, 1*time.Second, 2) {<br>
>> // Determination of fatality and notifications happen here<br>
>> }<br>
>><br>
>> If you want a max retry concept, you do this:<br>
>><br>
>> for range Take(done, 3, BackoffTick(dying, myClock, 1*time.Second, 2)) {<br>
>> // Determination of fatality and notifications happen here<br>
>> }<br>
>><br>
>> If you want a max duration you do this:<br>
>><br>
>> done = Timeout(done, 5*time.Minute)<br>
>> for range Take(done, 3, BackoffTick(dying, myClock, 1*time.Second, 2)) {<br>
>> // Determination of fatality and notifications happen here<br>
>> }<br>
>><br>
>> Functionally, what's in juju/retry is fine; and I can stuff anything I want into the function. It just feels a little odd to use in that I must put the body of my loop in a function, and I dislike that the CallArgs struct attempts to encapsulate a bunch of different scenarios.<br>
>><br>
>> Thoughts?<br>
><br>
> I'm not keen on the channel-based approach because if you want to<br>
> terminate early, you need to shut down the goroutine - it's just<br>
> one more resource to clean up. Also, the goroutine that's sending<br>
> the events is one step removed from what's going on in the loop - the<br>
> delay is calculated one step before the current iteration is complete.<br>
><br>
> cheers,<br>
> rog.<br>
><br>
>><br>
>> Also, is it on the roadmap to address the inconsitant retries throughout Juju? I almost used utils.BackoffTimer until I started looking deeper. I'm going to submit a PR to flag it as deprecated.<br>
>><br>
>> --<br>
>> Katherine<br>
>><br>
>> --<br>
>> Juju-dev mailing list<br>
>> <a href="mailto:Juju-dev@lists.ubuntu.com">Juju-dev@lists.ubuntu.com</a><br>
>> Modify settings or unsubscribe at: <a href="https://lists.ubuntu.com/mailman/listinfo/juju-dev" rel="noreferrer" target="_blank">https://lists.ubuntu.com/<wbr>mailman/listinfo/juju-dev</a><br>
<br>
--<br>
Juju-dev mailing list<br>
<a href="mailto:Juju-dev@lists.ubuntu.com">Juju-dev@lists.ubuntu.com</a><br>
Modify settings or unsubscribe at: <a href="https://lists.ubuntu.com/mailman/listinfo/juju-dev" rel="noreferrer" target="_blank">https://lists.ubuntu.com/<wbr>mailman/listinfo/juju-dev</a><br>
</div></div></blockquote></div><br></div>