regression: restore-backup broken by recent commit

Tim Penhey tim.penhey at canonical.com
Fri Feb 24 03:17:16 UTC 2017


Hi Curtis (also expanding to juju-dev),

I have been looking into this issue, and the good news is that it 
doesn't appear to be a real problem with gorilla/websocket at all; 
instead, a change in timing exposed an existing issue that hadn't 
surfaced before.

I'll be looking into that issue - the restore command, after 
bootstrapping, doesn't appear to retry if it gets an error like 
"denied: upgrade in progress".
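
For reference, below is roughly the retry behaviour I have in mind, 
sketched in Go purely as an illustration - runRestore and 
isUpgradeInProgress are stand-ins for the real API call and error 
check, not the actual client code:

    package main

    import (
        "errors"
        "fmt"
        "strings"
        "time"
    )

    // runRestore stands in for the real restore API call; here it always
    // reports that an upgrade is still in progress.
    func runRestore() error {
        return errors.New("denied: upgrade in progress")
    }

    // isUpgradeInProgress is a hypothetical check on the error text; a real
    // client would match a typed API error code instead.
    func isUpgradeInProgress(err error) bool {
        return err != nil && strings.Contains(err.Error(), "upgrade in progress")
    }

    // restoreWithRetry retries the restore while the only failure is the
    // transient "upgrade in progress" condition.
    func restoreWithRetry(attempts int, delay time.Duration) error {
        var err error
        for i := 0; i < attempts; i++ {
            if err = runRestore(); err == nil || !isUpgradeInProgress(err) {
                return err // success, or an error that should not be retried
            }
            fmt.Printf("attempt %d: %v; retrying in %s\n", i+1, err, delay)
            time.Sleep(delay)
        }
        return err
    }

    func main() {
        if err := restoreWithRetry(5, 2*time.Second); err != nil {
            fmt.Println("restore failed:", err)
        }
    }

The point is that "upgrade in progress" is a transient condition that 
should be waited out, not treated as fatal.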

Secondly, I tried to reproduce on lxd, only to find that there is an 
issue with rebootstrapping on lxd - it just doesn't work.

Then I tried with AWS, to mirror the CI test as closely as possible. I 
didn't hit the same timing issue as before, but instead got a different 
failure in the mongo restore:

   http://pastebin.ubuntu.com/24056766/

I have no idea why juju.txns.stash failed but juju.txns and 
juju.txns.logs succeeded.

Also, a CI run of a develop revision just before the gorilla/websocket 
reversion hit this:

http://reports.vapour.ws/releases/4922/job/functional-ha-backup-restore/attempt/5045#highlight

     cannot create collection "txns": unauthorized mongo access: not
     authorized on juju to execute command { create: "txns" }
     (unauthorized access)

Not sure why that is happening either. Seems that the restore of mongo 
is incredibly fragile.
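
For context, below is roughly the shape of call that produces that 
error with mgo when the logged-in user lacks write access on the juju 
database - purely illustrative; the dial details and user here are 
made up and are not what the restore code actually does:

    package main

    import (
        "fmt"
        "time"

        mgo "gopkg.in/mgo.v2"
    )

    func main() {
        // Assumed connection details; the real restore reuses the
        // controller's mongo session and credentials.
        info := &mgo.DialInfo{
            Addrs:    []string{"localhost:37017"},
            Database: "admin",
            Username: "restore-user", // hypothetical user
            Password: "secret",
            Timeout:  10 * time.Second,
        }
        session, err := mgo.DialWithInfo(info)
        if err != nil {
            fmt.Println("dial:", err)
            return
        }
        defer session.Close()

        // Creating a collection issues the mongo "create" command; without
        // readWrite (or dbAdmin) on "juju" this fails with an "unauthorized"
        // error like the one in the CI log.
        err = session.DB("juju").C("txns").Create(&mgo.CollectionInfo{})
        fmt.Println("create txns:", err)
    }

If the restore agent's mongo user ends up without those privileges at 
the point it recreates the collections, that would explain the error 
above.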

Again, this shows errors in the restore code, but luckily it has nothing 
to do with gorilla/websocket.

Tim

On 23/02/17 04:02, Curtis Hovey-Canonical wrote:
> Hi Tim, et al.
>
> All the restore-backup tests in all the substrates failed with your
> recent gorilla/websocket commit. The restore-backup command often
> fails when bootstrap or connection behaviours change. This new bug is
> definitely a connection failure while the client is driving a
> restore.
>
> We need the develop branch fixed. As the previous commit was blessed,
> we are certain 2.2-alpha1 was in very good shape before the gorilla
> change.
>
> Restore backup failed websocket: close 1006
> https://bugs.launchpad.net/juju/+bug/1666898
>
> As seen at
>     http://reports.vapour.ws/releases/issue/5550dda7749a561097cf3d44
>
> All the restore-backup tests failed when testing commit
> https://github.com/juju/juju/commit/f06c3e96f4e438dc24a28d8ebf7d22c76fff47e2
>
> We see
> Initial model "default" added.
> 04:54:39 INFO juju.juju api.go:72 connecting to API addresses:
> [52.201.105.25:17070 172.31.15.167:17070]
> 04:54:39 INFO juju.api apiclient.go:569 connection established to
> "wss://52.201.105.25:17070/model/89bcc17c-9af9-4113-8417-71847838f61a/api"
> ...
> 04:55:20 ERROR juju.api.backups restore.go:136 could not clean up
> after failed restore attempt: <nil>
> 04:55:20 ERROR cmd supercommand.go:458 cannot perform restore: <nil>:
> codec.ReadHeader error: error receiving message: websocket: close 1006
> (abnormal closure): unexpected EOF
>
> This is seen in AWS, prodstack, and GCE.
>


