Let's talk retries

Tue Aug 9 07:02:34 UTC 2016

On 9 August 2016 at 07:28, roger peppe <roger.peppe at canonical.com> wrote:
> On 9 August 2016 at 01:22, Katherine Cox-Buday
> <katherine.cox-buday at canonical.com> wrote:
>> Hey All,
>>
>> We currently have 3 ways we're performing retries in Juju:
>>
>> 1. Archaic, custom, intra-package retry code.
>> 2. github.com/juju/utils::{Countdown,BackoffTimer}
>> 3. github.com/juju/retry (current best practice)
>
> There's a fourth way that you haven't mentioned, which fits
> somewhere in between 1 and 2 I think (it was the first
> explicit general retry code in Juju AFAIK), which is
> utils.Attempt.
>
> I'd suggest that the way it's used matches the pattern
> you'd like to write quite well:
>
>      for attempt := strategy.State(); attempt.Next(); {

Oops; s/State/Start/ of course.

>           Do something
>      }
>
> AFAIR this pattern was arrived at after some discussion between myself
> and Gustavo. At the time I was thinking of some kind of
> callback-based scheme like the others schemes you mention, but
> I think that leaving the caller in control and the loop explicit
> is actually nicer - the logic is clear to the reader and more
> composable with other primitives (it is also a good match
> for the pattern you propose).
>
> How about making AttemptStrategy a little more flexible?
> It was always my intention to admit other kinds of strategy - if
> a BackoffFunc field were added to AttemptStrategy and
> a Stop argument added to AttemptStrategy.Start it would
> become as general as retry.Call, while keeping the nice
> usage pattern and general straightforwardness.
>
>   cheers,
>     rog.
>
>> I have just touched some code which fits #1, and when I was attempting to retrofit it to #3, I found it a little obtuse. Here's why:
>>
>> In my case (and I assume most), I wanted something like this:
>>
>> for range retries { // Loop should break if our parent is cancelled
>>         // Do something
>>     // Success, retry, or fatal error
>> }
>>
>> To implement this, I do something like this:
>>
>> args := retry.CallArgs{
>>     Func: func() error {
>>         // Body of my loop
>>     },
>>     BackoffFunc: func(delay time.Duration, attempt int) time.Duration {
>>         if attempt == 1 {
>>             return delay
>>         }
>>         return delay * factor
>>     },
>>     Stop: dying,
>>     // etc...
>> }
>>
>> Call(args)
>>
>> We attempt to encapsulate every scenario for retries in github.com/juju/retry/::CallArgs. We have variables to determine if errors are fatal, to notify things, how to back off, etc. This feels very heavy-weight to me, and a bit like the monolith model rather than the unix model. I would prefer the logic be left to the caller (i.e. do I break out of this loop? do I notify something?), and the interface into a retry strategy be much simpler, say a channel like in time.Tick(time.Duration) <-chan time.Time.
>>
>> How would folks feel about something like this?
>>
>> func BackoffTick(done <-chan struct{}, clock clock.Cock, delay time.Duration, factor int64) <-chan time.Time {
>>         signalStream := make(chan time.Time)
>>         go func() {
>>                 defer close(signalStream)
>>                 for {
>>                         select {
>>                         case <-done:
>>                                 return
>>                         case signalStream <- time.After(delay):
>>                                 delay = time.Duration(delay.Nanoseconds() * factor)
>>                         }
>>                 }
>>         }()
>>         return signalStream
>> }
>>
>> With this, the above becomes:
>>
>> for range BackoffTick(dying, myClock, 1*time.Second, 2) {
>>     // Determination of fatality and notifications happen here
>> }
>>
>> If you want a max retry concept, you do this:
>>
>> for range Take(done, 3, BackoffTick(dying, myClock, 1*time.Second, 2)) {
>>     // Determination of fatality and notifications happen here
>> }
>>
>> If you want a max duration you do this:
>>
>> done = Timeout(done, 5*time.Minute)
>> for range Take(done, 3, BackoffTick(dying, myClock, 1*time.Second, 2)) {
>>     // Determination of fatality and notifications happen here
>> }
>>
>> Functionally, what's in juju/retry is fine; and I can stuff anything I want into the function. It just feels a little odd to use in that I must put the body of my loop in a function, and I dislike that the CallArgs struct attempts to encapsulate a bunch of different scenarios.
>>
>> Thoughts?
>
> I'm not keen on the channel-based approach because if you want to
> terminate early, you need to shut down the goroutine - it's just
> one more resource to clean up. Also, the goroutine that's sending
> the events is one step removed from what's going on in the loop - the
> delay is calculated one step before the current iteration is complete.
>
>   cheers,
>     rog.
>
>>
>> Also, is it on the roadmap to address the inconsitant retries throughout Juju? I almost used utils.BackoffTimer until I started looking deeper. I'm going to submit a PR to flag it as deprecated.
>>
>> --
>> Katherine
>>
>> --
>> Juju-dev mailing list
>> Juju-dev at lists.ubuntu.com
>> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev