new dependency

Fri Feb 7 18:30:41 UTC 2014

On 7 February 2014 12:14, Nate Finch <nate.finch at canonical.com> wrote:
>
> On Fri, Feb 7, 2014 at 5:24 AM, roger peppe <rogpeppe at gmail.com> wrote:
>>
>>
>> http://play.golang.org/p/Ze_YTCcfzm
>>
>> Note that, even ignoring the doc comment, it is easy to tell by
>> looking at the high
>> level function, ReadPacketFromNetwork, what classifiable errors might
>> be returned from
>> it. This makes it easy to write the doc comment - the contract with the
>> caller is clear.
>
>
> This seems to be where we're differing.  I'd never return io.UnexpectedEOF.
> I'd return a custom error like mypackage.TruncatedData.

You seem to be saying that there's no purpose in general error
classifications. Would you have a special error type for each package
where our errors.NotFoundError is used?

Note that if you *do* want to pass your TruncatedData type back through
several layers, you *can't* annotate it, because annotating it changes
the type.

Both errgo and arrar allow you to annotate the error (producing a new
error value) independently of the underlying "diagnosis".

The difference is that errgo considers it better behaviour to reveal
special error types and values only when explicitly needed, and not by
default all the time.

That to me makes it more robust, because you don't by default allow
errors to cross abstraction boundaries.

> So then the calling code would look something like this:
>
> http://play.golang.org/p/kAWClCm3gv
>
> (that's using a testing function, you could instead let the caller do a type
> check, but that's uglier).
>
> UnexpectedEOF is an implementation detail. It's the package's job to
> translate that into something the caller can understand, and hide the
> implementation from the caller.  What if you change the code later so it
> calls something that doesn't return UnexpectedEOF?

What if it was errors.NotFoundError ?
Would you still consider that as something that needs to be translated
at every level?

> I guess I don't understand why you'd ever want a diagnosis that's different
> from just the error you're returning.

One random example (the first one I looked at, so probably not that
unusual a case) from our source code is tools.Fetch.

At least one place (tools.FindToolsForCloud) relies on it to return
a NotFoundError if the tools are not found.  There are probably more
places (and it's a public function, so it's quite possible some third
party code might be using it too).

It passes through the NotFoundError from an underlying call to
simplestreams.GetMetadata. So in that case, we would almost certainly want
to annotate the error, but we'd still want the diagnosis to be the same.

But let's try to trace the code and find out where that NotFoundError is
actually generated. I wrote down function names when I started to consider
them, and the indentation mirrors the call tree. I stopped when I found
the first NotFoundError return.

tools.Fetch
    simplestreams.GetMetadata
        simplestreams.getMaybeSignedMetadata
            GetIndexWithFormat
                fetchData
                    found it!
            IndexReference.getLatestMetadataWithFormat
                IndexReference.GetCloudMetadataWithFormat
                GetLatestMetadata

So the NotFoundError is passed up 4 levels before it's actually
acted on. In not one of those levels is it actually documented that
a NotFoundError may be returned, and it's entirely possible that one
of the other paths might return the same error for some other reason,
causing the higher level logic in FindToolsForCloud to malfunction
(or perhaps not - it's hard to know which paths are important here).

This is a really good example of the kind of fragility that is a natural
result of our current approach, and is not helped at all by arrar's
approach (it actually makes things somewhat worse, because a former
call to fmt.Errorf, which hides the underlying error, becomes a call to
arrar.Annotate, which does not).

Using errgo in this situation, we would be be explicit in all the places
where we need a NotFoundError to pass through. This would make it
very clear what paths might generate that error, even in the absence
of documentation.

Another good example was demonstrated by this fix:
https://codereview.appspot.com/54950046/

An error returned deep from within some worker code became conflated
with another similar error we were expecting to find, causing the entire
agent to be torn down rather than the worker task restarting gracefully,
and our tests to become flaky..

This was a classic abstraction-breakage failure - the error returned
from within the worker was not returned as anything special - it
wasn't part of the contract of that worker to return that kind of error.
But it became confused with another error which *was* part of a contract,
because our default behaviour is to expose rather than hide the underlying
error values.

IMO the more possible sources of error in our system, and the less that
we are forced to explicitly declare what kind of errors might pass through
layers of abstraction, the more this kind of problem is going to happen.

> You can always add on to the stack trace when an error cross goroutine
> boundaries.  I haven't really thought about it, so I'm not sure if you'd end
> up with something really illegible or not.

You end up with multiple independent stack traces, and it's very
hard to see the actual control flow. The error wrapping path
actually works very nicely for seeing the important information about
the error IMO.

Sorry, that turned out rather long. :-)

  cheers,
    rog.