Let's talk retries
Katherine Cox-Buday
katherine.cox-buday at canonical.com
Tue Aug 9 00:22:25 UTC 2016
Hey All,
We currently have 3 ways we're performing retries in Juju:
1. Archaic, custom, intra-package retry code.
2. github.com/juju/utils::{Countdown,BackoffTimer}
3. github.com/juju/retry (current best practice)
I have just touched some code which fits #1, and when I was attempting to retrofit it to #3, I found it a little obtuse. Here's why:
In my case (and I assume most), I wanted something like this:
for range retries { // Loop should break if our parent is cancelled
// Do something
// Success, retry, or fatal error
}
To implement this, I do something like this:
args := retry.CallArgs{
Func: func() error {
// Body of my loop
},
BackoffFunc: func(delay time.Duration, attempt int) time.Duration {
if attempt == 1 {
return delay
}
return delay * factor
},
Stop: dying,
// etc...
}
Call(args)
We attempt to encapsulate every scenario for retries in github.com/juju/retry/::CallArgs. We have variables to determine if errors are fatal, to notify things, how to back off, etc. This feels very heavy-weight to me, and a bit like the monolith model rather than the unix model. I would prefer the logic be left to the caller (i.e. do I break out of this loop? do I notify something?), and the interface into a retry strategy be much simpler, say a channel like in time.Tick(time.Duration) <-chan time.Time.
How would folks feel about something like this?
func BackoffTick(done <-chan struct{}, clock clock.Cock, delay time.Duration, factor int64) <-chan time.Time {
signalStream := make(chan time.Time)
go func() {
defer close(signalStream)
for {
select {
case <-done:
return
case signalStream <- time.After(delay):
delay = time.Duration(delay.Nanoseconds() * factor)
}
}
}()
return signalStream
}
With this, the above becomes:
for range BackoffTick(dying, myClock, 1*time.Second, 2) {
// Determination of fatality and notifications happen here
}
If you want a max retry concept, you do this:
for range Take(done, 3, BackoffTick(dying, myClock, 1*time.Second, 2)) {
// Determination of fatality and notifications happen here
}
If you want a max duration you do this:
done = Timeout(done, 5*time.Minute)
for range Take(done, 3, BackoffTick(dying, myClock, 1*time.Second, 2)) {
// Determination of fatality and notifications happen here
}
Functionally, what's in juju/retry is fine; and I can stuff anything I want into the function. It just feels a little odd to use in that I must put the body of my loop in a function, and I dislike that the CallArgs struct attempts to encapsulate a bunch of different scenarios.
Thoughts?
Also, is it on the roadmap to address the inconsitant retries throughout Juju? I almost used utils.BackoffTimer until I started looking deeper. I'm going to submit a PR to flag it as deprecated.
--
Katherine
More information about the Juju-dev
mailing list