Thoughts about error definition and handling

Wed Jun 6 17:54:05 UTC 2012

Hi Frank,

> Today the creation of error informations is handled differently.
> I've tried to compare the pros and cons of the error creation.

We've discussed this topic back in January (conversation attached at
the bottom of this mail for reference), and unless you have some new
information on the issue, my feeling about it continues to be the
same. There isn't a single approach that will work for all cases, and
there are natural ways to decide which path to pick for a given case.

My main concern right now, though, is that the error messages being
returned by the state package are on the poor side, with internal
details leaking and with context-less messages, rather than whether
the message comes from a string or from a type.

> Additionally right now we have different shapes of error
> messages, some have prefixes (state: and charm:), most

Prefixes should be removed and the message improved to be meaningful
in its context because, as already presented in reviews, these
prefixes do not help the end message when they start to be
concatenated together.

> not, some start with a capital letter, most not. As Gustavo
> said with good reason this has to be cleaned. I'll doing it
> for the state package.

They should all start with a lowercase letter, because often they will
be added to some other context, And: This: Looks: Bad.

> Third part is the passing of low-level errors up to higher level
> callers. I don't know if it's a problem in other packages too,
> but in state I've often passed errors to the caller "as is".

There isn't a one-answer-for-all in this case. The main point is for
us to evaluate the context of the error and figure if the underlying
error is likely to be understood.  ZooKeeper is one example where it
should be wrapped or translated entirely whenever going out of the
state package, because there's a significant conceptual translation
between modifying the ZK coordination filesystem and doing high-level
actions on the state package.

Here is a copy of the IRC conversation from January, for reference:

Jan 30 12:32:38 <rog>	TheMue: "
Jan 30 12:32:38 <rog>	"can't deploy charm: can't create service state:
can't create path:
Jan 30 12:32:39 <rog>	can't write file xyz"
Jan 30 12:32:55 <rog>	TheMue: i've found that kind of error message
very useful in the past
Jan 30 12:33:26 <rog>	TheMue: it means that the error actually
reflects what went wrong at each level. it's better than a stack trace
because it's human-readable.
Jan 30 12:33:35 <TheMue>	rog: the discussion about wrapping errors
instead of just pass it up?
Jan 30 12:33:40 <rog>	TheMue: yeah
Jan 30 12:35:02 <rog>	TheMue: otherwise in this case you get a bare
zookeeper error which has no relation to what was actually being
attempted.
Jan 30 12:37:09 <rog>	TheMue: e.g. if i've called AddService, and i
get ZNODEEXISTS in return, it would be nice to see some context on
that error.
Jan 30 12:37:52 <niemeyer>	FWIW, improving the errors may be done in a
follow up branch
Jan 30 12:38:11 <niemeyer>	This is really big and flying for a while..
I'm keen on seeing it merged if we don't have any majors
Jan 30 12:38:52 <TheMue>	rog: i do so in my private stuff, here we
decided to postpone it
Jan 30 12:39:57 <rog>	ok.
Jan 30 12:40:15 <TheMue>	niemeyer: what are your concerns against
having an own error type wrapping a returned error and adding
additional infos?
Jan 30 12:40:22 <niemeyer>	TheMue: Made a few minor comments there,
and you have a LGTM
Jan 30 12:40:43 <niemeyer>	TheMue: You don't have to wait for the
error comparison to merge.
Jan 30 12:40:44 <TheMue>	niemeyer: great, thx
Jan 30 12:41:15 <niemeyer>	TheMue: But please send me the comparison
so we can see if we'll need more tests in a follow up branch.
Jan 30 12:41:43 <TheMue>	niemeyer: yep
Jan 30 12:42:13 <niemeyer>	TheMue: Regarding the error messages, it's
not a trivial decision
Jan 30 12:42:24 <niemeyer>	TheMue: There are three possible scenarios
to keep in mind:
Jan 30 12:43:24 <niemeyer>	1) Processing the error value is important
to understand its nature, and the underlying message has good-enough
context
Jan 30 12:43:39 <niemeyer>	2) Processing the error value is important
to understand its nature, and the underlying message does NOT have
enough context
Jan 30 12:44:14 <niemeyer>	3) Processing the error value is not
important to understand its nature, and the underlying message has
enough context
Jan 30 12:44:26 <niemeyer>	4) Processing the error value is not
important to understand its nature, and the underlying message does
NOT have enough context
Jan 30 12:44:29 <niemeyer>	Sorry, that's 4
Jan 30 12:45:32 <niemeyer>	TheMue: In cases 1 and 3, the underlying
message is good enough, so just passthrough
Jan 30 12:45:56 <niemeyer>	TheMue: Cases 2 and 4 are the ones we have
to do something about
Jan 30 12:46:23 <niemeyer>	TheMue: In case 4, we don't care about the
error value itself, so we just warp it with fmt.Errorf providing
additional context..
Jan 30 12:46:42 <niemeyer>	TheMue: That's the typical:
fmt.Errorf("can't do foo: %v", err)
Jan 30 12:46:48 <niemeyer>	TheMue: Which is what rog is suggesting in this case
Jan 30 12:47:21 <niemeyer>	TheMue: Case 2 is slightly more involved,
but not much
Jan 30 12:47:46 <niemeyer>	TheMue: In that case, we want to give the
developer access to the real error value, but want to add additional
information too
Jan 30 12:48:13 <niemeyer>	TheMue: That typically ends up with something like:
Jan 30 12:48:18 <rog>	niemeyer: yeah, case 2 is where you define a new type
Jan 30 12:48:35 <niemeyer>	type MyError struct { Err error; MoreInfo string }
Jan 30 12:49:22 <niemeyer>	func (e *MyError) Error() { return
fmt.Sprintf("can't do foo (blah=%s): %v", e.MoreInfo, e.Err) }
Jan 30 12:49:32 <niemeyer>	TheMue: So we basically have the cake and eat it too
Jan 30 12:49:36 <rog>	niemeyer: anyway i'd put the case i was
referring to in category 4
Jan 30 12:49:42 <niemeyer>	rog: Agreed
Jan 30 12:49:52 <rog>	niemeyer: yeah, at the cost of some code
Jan 30 12:50:03 <niemeyer>	rog: There are tons of these in State, though
Jan 30 12:50:30 <niemeyer>	TheMue: It'd be good to do an independent
pass on the code and have just these changes in their own branch
Jan 30 12:50:44 <rog>	niemeyer: sounds good
Jan 30 12:51:38 <TheMue>	niemeyer: ok

gustavo @ http://niemeyer.net