regression: restore-backup broken by recent commit

Tim Penhey tim.penhey at
Fri Feb 24 04:27:34 UTC 2017

OK, I think I got it now...

This is all crazy, and it was triggered by the gorilla/websocket change.

So... what happens on a successful restore is that the server side
calls os.Exit(...), and pid 1 then restarts the agent. However, from
the API client's point of view, this is an abnormal closure.

In the rpc layer, I treat a number of websocket close errors as
"normal", but I missed the abnormal closure case, which is code 1006.

I'll update the change and repropose it to devel.



On 24/02/17 16:17, Tim Penhey wrote:
> Hi Curtis (also expanding to juju-dev),
> I have been looking into this issue, and the good news is that it
> doesn't appear to be a real problem with gorilla/websocket at all.
> Instead, a change in timing exposed an existing issue that hadn't
> surfaced before.
> I'll be looking into that issue: after bootstrapping, the restore
> command doesn't appear to retry if it gets an error like "denied:
> upgrade in progress".
> Secondly, when I tried to reproduce it on lxd, I found that there is an
> issue with rebootstrapping on lxd: it just doesn't work.
> Then I tried with AWS, to mirror the CI test as closely as possible. I
> didn't hit the same timing issue as before, but instead got a different
> failure with the mongo restore:
> I have no idea why juju.txns.stash failed but juju.txns and
> juju.txns.logs succeeded.
> Also, a CI run of a develop revision just before the gorilla/websocket
> reversion hit this:
>     cannot create collection "txns": unauthorized mongo access: not
>     authorized on juju to execute command { create: "txns" }
>     (unauthorized access)
> Not sure why that is happening either. It seems that the mongo restore
> is incredibly fragile.
> Again, this shows errors in the restore code, but luckily it has nothing
> to do with gorilla/websockets.
> Tim
> On 23/02/17 04:02, Curtis Hovey-Canonical wrote:
>> Hi Tim, et al.
>> All the restore-backup tests in all the substrates failed with your
>> recent gorilla socket commit. The restore-backup command often
>> fails when bootstrap or connection behaviours change. This new bug is
>> definitely a connection failure while the client is driving a
>> restore.
>> We need the develop branch fixed. As the previous commit was blessed,
>> we are certain 2.2-alpha1 was in very good shape before the gorilla
>> change.
>> Restore backup failed websocket: close 1006
>> As seen at
>> All the restore-backup tests failed when testing commit
>> We see
>> Initial model "default" added.
>> 04:54:39 INFO juju.juju api.go:72 connecting to API addresses: []
>> 04:54:39 INFO juju.api apiclient.go:569 connection established to "wss://"
>> ...
>> 04:55:20 ERROR juju.api.backups restore.go:136 could not clean up after failed restore attempt: <nil>
>> 04:55:20 ERROR cmd supercommand.go:458 cannot perform restore: <nil>: codec.ReadHeader error: error receiving message: websocket: close 1006 (abnormal closure): unexpected EOF
>> This is seen in aws, prodstack, gce
