Error conventions in state package

Gustavo Niemeyer gustavo.niemeyer at canonical.com
Mon Jun 11 17:58:18 UTC 2012


Just had a private conversation with Frank about the error conventions
on the state package. We're still evolving towards an idea of how to
handle them, but this may be useful insight at some point. Feel free
to ignore it for the moment in case you're not worried right now.

This is about this bug:

    https://bugs.launchpad.net/juju-core/+bug/1009213

And the first candidate branch was this one:

    https://codereview.appspot.com/6306067/

Log:


<TheMue> Did you already take a look into the changes?
<niemeyer> I'm looking at them as we speak
<TheMue> OK
<niemeyer> Why do we need all of those errors?
<niemeyer>    19         PersistencyError  ErrorCode = -4
<niemeyer>   20         DataFormatError   ErrorCode = -3
<niemeyer>   21         StateChangedError ErrorCode = -2
<niemeyer>   22         TopologyError     ErrorCode = -1
<niemeyer>   23         OK                ErrorCode = 0
<niemeyer>   24         MachineError      ErrorCode = 1
<niemeyer>   25         UnitError         ErrorCode = 2
<niemeyer>   26         RelationError     ErrorCode = 3
<TheMue> I've been inspired by your concept for gozk.
<niemeyer> Who's making use of them?
<niemeyer> gozk is far from a good example.. it wraps an error
convention which isn't ours
<niemeyer> We had no choice but to expose what the underlying library does
<TheMue> We are passing HOF to topology for changes. Here we have a
mix of logical errors inside those funcs, errors during
marshalling/unmarshalling and with ZooKeeper.
<TheMue> The types help to check where the error is coming from and if
it may contain infos like internal keys.
<niemeyer> Where are you using those types today?
<niemeyer> This isn't a usual convention in Go..
<TheMue> That's why I wrote that's a proposal for discussion. I also
thought about a number of individual error types.
<niemeyer> Yeah, and that's what concerns me.. we have a problem that
is unrelated to that number of error types or codes
<TheMue> Pure error string information makes the differentiation difficult.
<niemeyer> Our problem is poor messaging
<niemeyer> and this is not solving it
<niemeyer> This branch addresses the wrong side of the problem..
<niemeyer> 152 »       »       return stateError(DataFormatError, err,
"illegal content in conf
<niemeyer>      ig node")
<niemeyer> Who cares it's a DataFormatError?
<TheMue> It is a first step let the higher-level types provide better
messages without loosing lower-level informations.
<niemeyer> I can tell you that a lot of people will care that they
have no idea about what is actually blowing up, though!
<TheMue> That's why it's a lower level error.
<TheMue> t can be analyzed by the caller, but StateError.Error()
doesn't show it.
<niemeyer> Exactly!
<TheMue> It shows the higher-level message for the operation during
which this error has been occured.
<niemeyer> We have zero code today interested on the fact it's a
"DataFormatError" (public!), but all of us are interested in what is
actually blowing up
<TheMue> But an interested receiver of the error can retrieve the
error and e.g. log it.
<niemeyer> It fixes a problem we don't yet have, and doesn't solve the
problem we do have
<niemeyer> Every receiver of the error is interested.. and it will log
it as %v, err
<TheMue> OK, so in case of illegal YAML data e.g. in a ConfigNode,
where goyaml fails, what would you return.
<TheMue> ?
<niemeyer> fmt.Errorf("invalid YAML data in node %q", path)
<niemeyer> Next?
<TheMue> Who is interested in that we're using YAML in a node? That is
the same kind of low level information.
<niemeyer> The developers (*we*) are interested that the YAML
unmarshalling has failed
<TheMue> So what do you do with this string if you don't pass it to
the higher level?
<niemeyer> That's the problem we got, that's what must be reported..
no further types or codes needed today, since we're not introspecting
that
<niemeyer> We pass it to the higher level, because that's a bug
<niemeyer> That's not the same as, for example, `service "mysql" not found`
<TheMue> And when in topology something is not found by key, what do
you return then?
<niemeyer> We have to show that message rather  than `service
"service-00000042 not found`
<niemeyer> We return something similar, but we don't *let* it blow up,
because we check first
<niemeyer> if !topology.HasService(...) { return fmt.Errorf("Uh, oh..
whatever you asked for wasn't found dude")
<niemeyer> The idea is to be nice when we can
<TheMue> And how do you check it efficiently when you only get a bunch
of possible error strings back.
<niemeyer> Check efficiently?
<niemeyer> That's why I suggested going slowly, by the way.. I was
hoping to have that kind of brainstorm on reviews
<niemeyer> There isn't a single answer for everything
<TheMue> Yes, during retryTopologyChange() you get something back that
is "only" an error with Error() string. This string shows what has
happened. By goyaml, by gazk, by logic of the higher level API ("can't
find relation xyz").
<niemeyer> Yep
<TheMue> There is a first step of an answer. I right now try to get a
better view on what error I get returned.
<niemeyer> (not sure about what you mean by that.. we both know how
retryTopologyChange works..?)
<TheMue> Maybe my codes are not good yet. But they are only a first step.
<TheMue> But indeed, maybe having multiple types and then doing type
assertions is a better way.
<niemeyer> Please, avoid introducing types before you need them
<TheMue> Yes, I know it, and I know that there are multiple sources for errors.
<niemeyer> (or codes)
<TheMue> OK, that's reasonable.
<niemeyer> Super, thanks
<TheMue> My first idea has been only a topologyError to classify what
happens during topology operations.
<niemeyer> I find it hard to think in those terms
<niemeyer> You're considering a solution, but what is the problem?
<niemeyer> We already have special error codes out of the topology
when we need them
<TheMue> Very often in situations where we call retryTopologyChange()
to seperate between an in a HOF returned logical error (I can't find
what you want with key XYZ) and an internal technical error
(YAML/GOZK).
<niemeyer> I don't understand what you mean..
<niemeyer> Very often in situations where we call (...) to separate
between (...) and an internal technical error
<niemeyer> !?
<TheMue> E.g. inside a HOF I may return "relation r-1 not found". But
I can't just append this directly to the message that the caller
returns, the key would be exposed.
<niemeyer> That's fine
<niemeyer> Because you've checked that case before hand
<TheMue> On the other hand we may have a YAML error. That should be returned.
<niemeyer> See how the Python code works
<niemeyer> if !topology.HasService(service) { return
fmt.Errorf("service %q was not found") }
<TheMue> It uses exceptions, own TYPES for each error (like Go types
or error codes and only one type).
<niemeyer> if !topology.HasService(service) { return
fmt.Errorf("service %q was not found", name) }
<niemeyer> This is inside the changeFunc of retry topology
<niemeyer> So retry topology can add the local context:
<niemeyer> err := retry...
<niemeyer> if err != nil { return fmt.Errorf("can't add relation
between %s and %s: %v", foo, bar, err) }
<niemeyer> With that, if you get an error out of a topology method, it
can be popped up as-is, because the errors we did anticipate and
handled politely were all good beforehand
<TheMue> OK, will try it this way. Then it should be more easy with a
smaller branch. Just reject it.
<niemeyer> Thanks.. I'll mail these comments to the list to have
everyone aware of these ideas


gustavo @ http://niemeyer.net



More information about the Juju-dev mailing list