CI hates juju, maybe the feelings are mutual
Horacio Duran
horacio.duran at canonical.com
Thu Apr 10 12:43:11 UTC 2014
I think without the -- before the -o, the position is not relevant. That --
was added in a patch not long ago, IIRC, as a revert of a previous fix, but
it does not seem to be doing what is expected.
On Thu, Apr 10, 2014 at 9:23 AM, John Meinel <john at arbash-meinel.com> wrote:
> Note that copying off the all-machines log seems to be broken, the lines:
>
> juju --show-log scp -e test-release-aws -- -o 'StrictHostKeyChecking no' -o 'UserKnownHostsFile /dev/null' -i /var/lib/jenkins/cloud-city/staging-juju-rsa 0:/var/log/juju/all-machines.log /var/lib/jenkins/jobs/aws-upgrade/workspace/artifacts/all-machines-test-release-aws.log
> 2014-04-10 03:40:59 INFO juju.cmd supercommand.go:297 running juju-1.18.0-precise-amd64 [gc]
> 2014-04-10 03:40:59 INFO juju api.go:238 connecting to API addresses: [ec2-54-86-4-94.compute-1.amazonaws.com:17070]
> 2014-04-10 03:40:59 INFO juju apiclient.go:114 state/api: dialing "wss://ec2-54-86-4-94.compute-1.amazonaws.com:17070/"
> 2014-04-10 03:40:59 INFO juju apiclient.go:124 state/api: connection established
> 2014-04-10 03:40:59 ERROR juju.cmd supercommand.go:300 unexpected argument "-o"; extra arguments must be last
>
>
> It looks like you have to spell it:
>
> juju --show-log scp -e test-release-aws 0:/var/log/juju/all-machines.log /var/lib/jenkins/jobs/aws-upgrade/workspace/artifacts/all-machines-test-release-aws.log -o 'StrictHostKeyChecking no' -o 'UserKnownHostsFile /dev/null' -i /var/lib/jenkins/cloud-city/staging-juju-rsa
>
> I don't know if "--" ever worked for scp, but it appears 1.18 wants a
> different spelling.
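The reordering described above can be done mechanically. This is a hypothetical helper (not part of juju or juju-ci) that rewrites the old argument list into the spelling 1.18 accepts: it drops the "--" separator and moves the -o/-i pass-through options after the positional arguments.

```python
def reorder_scp_args(args):
    """Rewrite a pre-1.18 'juju scp' extra-argument list into the 1.18 form:
    drop the '--' separator and move ssh pass-through options (-o/-i and
    their values) after the positional arguments."""
    positional, passthrough = [], []
    it = iter(args)
    for arg in it:
        if arg in ("-o", "-i"):
            # These options take a value; keep the pair together.
            passthrough += [arg, next(it)]
        elif arg == "--":
            continue  # 1.18 rejects this separator
        else:
            positional.append(arg)
    return positional + passthrough
```

Feeding it the failing invocation's arguments yields the same ordering as the working spelling shown above.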
>
> John
> =:->
>
>
>
> On Thu, Apr 10, 2014 at 3:40 PM, John Meinel <john at arbash-meinel.com> wrote:
>
>> So I did a 1.18.0 to 1.18.1 test here (which should be the r2264 failure
>> that you're seeing; I'm using r2266).
>>
>> I did see it upgrade successfully, but it took a surprisingly long time
>> for all the unit agents to come up. The relevant logs (from my point of
>> view) are:
>> http://paste.ubuntu.com/7230339/
>>
>> So you can see at time 11:05:01 all of the machine agents notice there is
>> an upgrade to be performed. At time 11:05:01 machine-1 completes the
>> upgrade first and calls SetTools(1.18.1.1).
>>
>> 11:05:05 unit-mysql-0 is informed that it should upgrade (since
>> machine-1, where mysql is, has upgraded)
>> 11:05:05 machine-0 has restarted with updated tools, and that bounced the
>> other machines, so -1 and -2 also report that they are running 1.18.1.1
>> 11:05:07 unit-wordpress-0 is told that it should upgrade
>> 11:05:09 unit-wordpress-0 has upgraded and is told that it is now on the
>> correct version
>> 11:05:38 unit-mysql-0 finally has upgraded.
>>
>> I have no idea why all of the agents upgraded in <10s while unit-mysql-0
>> needed 30s to do the same thing. Perhaps it was running a hook when it
>> was first told to upgrade, and that prevented it from restarting at an
>> appropriate time.
>>
>> Note that the mysql charm still has a bug when run in the local provider,
>> where it wants to allocate something like 15GB of buffer space, so I had to
>> manually hack that back down to 10MB, before it would start in the first
>> place. I suppose the charm is asking something like "how much memory
>> *could* I get?", which is 16GB on my machine.
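If the charm really is sizing its buffer from total system memory, the manual hack above amounts to capping that value. A hypothetical sketch of that logic (the real charm's computation may differ):

```python
def buffer_size_bytes(mem_total_bytes, cap_bytes=10 * 1024**2):
    """Hypothetical sizing logic: take a large fraction of total memory
    (which yields roughly 13GB on a 16GB host), but cap it so the service
    can still start inside a small local-provider container."""
    want = int(mem_total_bytes * 0.8)
    return min(want, cap_bytes)
```

Without the cap, a 16GB host produces a buffer far larger than a local-provider container can satisfy, which matches the failure-to-start described above.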
>>
>> Anyway, upgrade worked, but it took an awfully long time.
>> I do see this in the unit-mysql-0 log file:
>>
>> http://paste.ubuntu.com/7230369/
>>
>> Which shows that it sees the need for an upgrade, but at roughly the
>> same time it sees "upgrade needed" it also starts running the
>> relation-changed hook, which seems to be trying to stop mysql and has
>> the lines:
>> 2014-04-10 11:05:08 DEBUG worker.uniter.jujuc server.go:104 hook context
>> id "mysql/0:config-changed:6033929919748807197"; dir
>> "/var/lib/juju/agents/unit-mysql-0/charm"
>> 2014-04-10 11:05:08 INFO juju-log Restart failed, trying again
>> 2014-04-10 11:05:08 INFO config-changed stop: Job has already been
>> stopped: mysql
>> 2014-04-10 11:05:38 INFO config-changed mysql start/running
>> 2014-04-10 11:05:38 INFO juju.worker.uniter uniter.go:483 ran
>> "config-changed" hook
>> 2014-04-10 11:05:38 INFO juju.worker.uniter uniter.go:494 committing
>> "config-changed" hook
>> 2014-04-10 11:05:38 INFO juju.worker.uniter uniter.go:509 committed
>> "config-changed" hook
>> 2014-04-10 11:05:38 DEBUG juju.worker.uniter modes.go:394
>> ModeConfigChanged exiting
>> 2014-04-10 11:05:38 INFO juju.worker.uniter uniter.go:140 unit "mysql/0"
>> shutting down: tomb: dying
>>
>> Which makes it definitely seem like when one worker upgrades itself it
>> causes the other worker to run a relation-changed hook, which may block
>> that worker from doing its upgrade until the charm thinks things are ready.
>> And restarting mysql seems to take 30s (or takes 30s to time out?).
>>
>> So is it possible that we just aren't waiting long enough? It definitely
>> looks like you're doing a lot of waiting in:
>>
>> http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/aws-upgrade/1074/console
>> (I see 682 lines of "1.18.1: 0") and a timeout of 5 minutes.
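The waiting the CI job does can be modeled as a poll loop with a deadline. A sketch (the function and its parameters are assumptions, not juju-ci's actual code) that shows why a 5-minute timeout loses whenever one agent takes longer than that to restart:

```python
import time

def wait_for_version(get_versions, want, timeout=300, interval=5):
    """Poll agent versions until every agent reports `want`, or until the
    deadline expires. Returns True on success, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if all(v == want for v in get_versions()):
            return True
        time.sleep(interval)
    return False
```

If a single unit agent (like unit-mysql-0 above) sits in a slow hook past the deadline, the whole run reports failure even though the upgrade eventually completes.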
>>
>> I guess I'm at a loss. I did see it take a long time to upgrade, but it
>> did succeed.
>> John
>> =:->
>>
>>
>>
>> On Thu, Apr 10, 2014 at 1:41 PM, John Meinel <john at arbash-meinel.com> wrote:
>>
>>> Well you used to be able to request a downgrade, but it never actually
>>> worked... :)
>>> And with the new upgrade steps, we explicitly don't implement the 'back
>>> out these changes' logic, which is why things were breaking
>>> some-of-the-time on upgrade. I'm not sure what I broke, but it is possible
>>> I changed it from "some-of-the-time" to "all-of-the-time" by some perverse
>>> logic. I'm pretty sure I tested it locally.
>>>
>>> I did try to upgrade a WP+MySQL environment from 1.16.6 to
>>> lp:juju-core/1.18 (2266). It failed, but only because of the WP charm's
>>> config-changed hook. This was the error:
>>> 2014-04-10 09:37:13 INFO config-changed E: Could not get lock
>>> /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
>>> 2014-04-10 09:37:13 INFO config-changed E: Unable to lock the
>>> administration directory (/var/lib/dpkg/), is another process using it?
>>> 2014-04-10 09:37:13 ERROR juju.worker.uniter uniter.go:475 hook failed:
>>> exit status 100
>>>
>>> After doing "juju resolved --retry wordpress/0" everything was happy.
>>>
>>> I wonder if there is something in 1.18 that is causing apt commands to
>>> run after upgrade, and that ends up racing with running the config-changed
>>> hook.
>>>
>>> Are we careful to take out the FSLock when doing Apt commands from
>>> upgrade?
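For reference, serializing apt runs behind a file lock looks roughly like this. This is a minimal sketch using flock(2); juju's actual FSLock implementation differs, and the lock path below is made up for illustration:

```python
import fcntl
import subprocess

LOCK_PATH = "/tmp/juju-apt.lock"  # hypothetical path, not juju's real lock file

def run_with_apt_lock(cmd):
    """Hold an exclusive file lock while running cmd, so that an upgrade
    step and a charm hook cannot both drive apt/dpkg at once and race on
    /var/lib/dpkg/lock. Returns the command's exit status."""
    with open(LOCK_PATH, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until any holder releases
        try:
            return subprocess.call(cmd)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

If both the upgrade step and the hook took a lock like this before touching apt, the "Could not get lock /var/lib/dpkg/lock" failure above would become a wait instead of a hook error.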
>>>
>>> John
>>> =:->
>>>
>>>
>>>
>>>
>>> On Thu, Apr 10, 2014 at 8:16 AM, Curtis Hovey-Canonical <
>>> curtis at canonical.com> wrote:
>>>
>>>> I am exhausted, so I am sending out the barest summary of the Juju-CI
>>>> problems I see.
>>>>
>>>> http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/
>>>> ^ In general, if the test doesn't end in -devel or start with walk-,
>>>> we require the test to pass.
>>>>
>>>> lp:juju-core/trunk r2593 could not upgrade because of an exception.
>>>> Note that this rev precedes the rev that makes it impossible to
>>>> downgrade. I see new revisions queued; maybe trunk will pass while I
>>>> sleep.
>>>>
>>>> lp:juju-core/1.18 r2264 and subsequent revs could not upgrade hp,
>>>> joyent, azure, or aws; some of the agents failed to update. These
>>>> results are consistent across the different CPCs. I see the change
>>>> relates to not permitting downgrades. Damn, I just submitted a pull
>>>> request documenting that you can downgrade.
>>>>
>>>> --
>>>> Curtis Hovey
>>>> Canonical Cloud Development and Operations
>>>> http://launchpad.net/~sinzui
>>>>
>>>> --
>>>> Juju-dev mailing list
>>>> Juju-dev at lists.ubuntu.com
>>>> Modify settings or unsubscribe at:
>>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>>>
>>>
>>>
>>
>