regression: restore-backup broken by recent commit

John Meinel john at arbash-meinel.com
Fri Feb 24 09:23:15 UTC 2017


Is it possible for us to close more gracefully most of the time?

John
=:->

On Feb 24, 2017 08:27, "Tim Penhey" <tim.penhey at canonical.com> wrote:

> OK, I think I got it now...
>
> This is all crazy, and it was caused by the gorilla/websocket change.
>
> So... what happens when there is a successful restore on the server side
> is that it calls os.Exit(...) which then has the pid 1 restart the agent.
> However from the API client, this is an abnormal closure.
>
> In the rpc layer, I capture a number of websocket close errors as
> "normal", but I missed the Abnormal closure case, which is 1006.
>
> I'll update, and repropose to devel.
>
> Hazaah.
>
> Tim
>
>
>
> On 24/02/17 16:17, Tim Penhey wrote:
>
>> Hi Curtis (also expanding to juju-dev),
>>
>> I have been looking into this issue. And the good news is that it
>> doesn't appear to be a real problem with gorilla/websocket at all, but
>> instead a change in timing showed an existing issue that hadn't surfaced
>> before.
>>
>> I'll be looking into that issue, where the restore command, run right
>> after bootstrapping, doesn't appear to retry if it gets an error like
>> "denied: upgrade in progress".
>>
>> Secondly I tried to reproduce on lxd to find that there is an issue with
>> the rebootstrap and lxd - it just doesn't work.
>>
>> Then I tried with AWS, to mirror the CI test as closely as possible. I
>> didn't hit the same timing issue as before, but instead got a different
>> failure with the mongo restore:
>>
>>   http://pastebin.ubuntu.com/24056766/
>>
>> I have no idea why juju.txns.stash failed but juju.txns and
>> juju.txns.logs succeeded.
>>
>> Also, a CI run of a develop revision just before the gorilla/websocket
>> reversion hit this:
>>
>> http://reports.vapour.ws/releases/4922/job/functional-ha-backup-restore/attempt/5045#highlight
>>
>>
>>     cannot create collection "txns": unauthorized mongo access: not
>>     authorized on juju to execute command { create: "txns" }
>>     (unauthorized access)
>>
>> Not sure why that is happening either. Seems that the restore of mongo
>> is incredibly fragile.
>>
>> Again, this shows errors in the restore code, but luckily it has nothing
>> to do with gorilla/websockets.
>>
>> Tim
>>
>> On 23/02/17 04:02, Curtis Hovey-Canonical wrote:
>>
>>> Hi Tim, et al.
>>>
>>> All the restore-backup tests in all the substrates failed with your
>>> recent gorilla socket commit. The restore-backup command often
>>> fails when bootstrap or connection behaviours change. This new bug is
>>> definitely a connection failure while the client is driving a
>>> restore.
>>>
>>> We need the develop branch fixed. The previous commit was blessed, so
>>> we are certain 2.2-alpha1 was in very good shape before the gorilla
>>> change.
>>>
>>> Restore backup failed websocket: close 1006
>>> https://bugs.launchpad.net/juju/+bug/1666898
>>>
>>> As seen at
>>>     http://reports.vapour.ws/releases/issue/5550dda7749a561097cf3d44
>>>
>>> All the restore-backup tests failed when testing commit
>>> https://github.com/juju/juju/commit/f06c3e96f4e438dc24a28d8ebf7d22c76fff47e2
>>>
>>>
>>> We see
>>> Initial model "default" added.
>>> 04:54:39 INFO juju.juju api.go:72 connecting to API addresses: [52.201.105.25:17070 172.31.15.167:17070]
>>> 04:54:39 INFO juju.api apiclient.go:569 connection established to "wss://52.201.105.25:17070/model/89bcc17c-9af9-4113-8417-71847838f61a/api"
>>>
>>> ...
>>> 04:55:20 ERROR juju.api.backups restore.go:136 could not clean up after failed restore attempt: <nil>
>>> 04:55:20 ERROR cmd supercommand.go:458 cannot perform restore: <nil>: codec.ReadHeader error: error receiving message: websocket: close 1006 (abnormal closure): unexpected EOF
>>>
>>> This is seen in aws, prodstack, gce
>>>
>>>
>>>
>>>
>>