CI hates juju, maybe the feelings are mutual

John Meinel john at arbash-meinel.com
Thu Apr 10 11:40:17 UTC 2014


So I did a 1.18.0 to 1.18.1 test here (which should be the r2264 failure
that you're seeing; I'm using r2266).

I did see it upgrade successfully, but it took a surprisingly long amount
of time for all the unit agents to come up. The relevant logs (from my
point of view) are:
http://paste.ubuntu.com/7230339/

So you can see at time 11:05:01 all of the machine agents notice there is
an upgrade to be performed. At time 11:05:01 machine-1 completes the
upgrade first and calls SetTools(1.18.1.1).

11:05:05 unit-mysql-0 is informed that it should upgrade (since machine-1,
where mysql is, has upgraded)
11:05:05 machine-0 has restarted with updated tools, and that bounced the
other machines, so -1 and -2 also report that they are running 1.18.1.1
11:05:07 unit-wordpress-0 is told that it should upgrade
11:05:09 unit-wordpress-0 has upgraded and is told that it is now on the
correct version
11:05:38 unit-mysql-0 finally has upgraded.

I have no idea why all of the agents upgraded in <10s except unit-mysql-0
decided it needed 30s to do the same thing. Perhaps it was running a hook
when it was first told to upgrade and that prevented it from restarting at
an appropriate time.
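
To put numbers on that, here is a minimal Python sketch that takes the "told to upgrade" and "has upgraded" times for the two unit agents from the timeline above and computes how long each one took. The dictionary layout and the function name are just illustrative, nothing here comes from juju itself.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# (told to upgrade, finished upgrading), copied from the timeline above.
unit_agents = {
    "unit-wordpress-0": ("11:05:07", "11:05:09"),
    "unit-mysql-0": ("11:05:05", "11:05:38"),
}


def upgrade_seconds(told, done):
    """Seconds between being told to upgrade and reporting the new version."""
    delta = datetime.strptime(done, FMT) - datetime.strptime(told, FMT)
    return int(delta.total_seconds())


durations = {name: upgrade_seconds(*times) for name, times in unit_agents.items()}
for name, secs in durations.items():
    print(f"{name}: {secs}s")
```

By these timestamps wordpress needed 2s and mysql 33s from notification to completion.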

Note that the mysql charm still has a bug when run in the local provider,
where it wants to allocate something like 15GB of buffer space, so I had to
manually hack that back down to 10MB, before it would start in the first
place. I suppose the charm is doing something like "how much memory *could*
I get, which is 16GB on my machine".

Anyway, upgrade worked, but it took an awfully long time.
I do see this in the unit-mysql-0 log file:

http://paste.ubuntu.com/7230369/

Which shows that it sees the need for an upgrade, but at almost exactly
the same time it sees "upgrade needed" it also starts running the
config-changed hook, which seems to be trying to stop mysql and has the
lines:
2014-04-10 11:05:08 DEBUG worker.uniter.jujuc server.go:104 hook context id "mysql/0:config-changed:6033929919748807197"; dir "/var/lib/juju/agents/unit-mysql-0/charm"
2014-04-10 11:05:08 INFO juju-log Restart failed, trying again
2014-04-10 11:05:08 INFO config-changed stop: Job has already been stopped: mysql
2014-04-10 11:05:38 INFO config-changed mysql start/running
2014-04-10 11:05:38 INFO juju.worker.uniter uniter.go:483 ran "config-changed" hook
2014-04-10 11:05:38 INFO juju.worker.uniter uniter.go:494 committing "config-changed" hook
2014-04-10 11:05:38 INFO juju.worker.uniter uniter.go:509 committed "config-changed" hook
2014-04-10 11:05:38 DEBUG juju.worker.uniter modes.go:394 ModeConfigChanged exiting
2014-04-10 11:05:38 INFO juju.worker.uniter uniter.go:140 unit "mysql/0" shutting down: tomb: dying

Which definitely makes it seem like when one worker upgrades itself it
causes the other worker to run a hook (config-changed in the excerpt above),
which may block that worker from doing its upgrade until the charm thinks
things are ready. And restarting mysql seems to take 30s (or takes 30s to
time out?)
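
One way to spot that kind of stall without eyeballing the log is to look for the largest jump between consecutive timestamps. This is only a rough sketch: the lines are copied from the excerpt above (continuation lines joined, the first one shortened), and the helper names are made up for illustration.

```python
from datetime import datetime

# Lines from the unit-mysql-0 excerpt above, continuation lines joined.
log_lines = [
    '2014-04-10 11:05:08 DEBUG worker.uniter.jujuc server.go:104 hook context id "mysql/0:config-changed:6033929919748807197"',
    "2014-04-10 11:05:08 INFO juju-log Restart failed, trying again",
    "2014-04-10 11:05:08 INFO config-changed stop: Job has already been stopped: mysql",
    "2014-04-10 11:05:38 INFO config-changed mysql start/running",
    '2014-04-10 11:05:38 INFO juju.worker.uniter uniter.go:483 ran "config-changed" hook',
    '2014-04-10 11:05:38 INFO juju.worker.uniter uniter.go:140 unit "mysql/0" shutting down: tomb: dying',
]


def parse_ts(line):
    """The first 19 characters of each line are 'YYYY-MM-DD HH:MM:SS'."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause."""
    best = (0.0, None, None)
    for prev, cur in zip(lines, lines[1:]):
        gap = (parse_ts(cur) - parse_ts(prev)).total_seconds()
        if gap > best[0]:
            best = (gap, prev, cur)
    return best


gap, before, after = largest_gap(log_lines)
print(f"{gap:.0f}s between:\n  {before}\n  {after}")
```

On these lines the pause sits between the "stop: Job has already been stopped" message and "mysql start/running", which is the restart the hook was waiting on.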

So is it possible that we just aren't waiting long enough? It definitely
looks like you're doing a lot of waiting in:
http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/aws-upgrade/1074/console
(I see 682 lines of "1.18.1: 0", and a timeout of 5 minutes.)
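
For reference, the kind of wait loop I have in mind looks roughly like the sketch below. I haven't looked at what the CI script actually does, so this is an assumption about the general shape: `get_versions` is a hypothetical callable returning a mapping of agent name to version, the old version string is a placeholder, and the demo uses a fake clock so nothing really sleeps.

```python
import time


def wait_for_upgrade(get_versions, target, timeout=300.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until every agent reports `target`.

    Returns True once all agents match, or False if `timeout` seconds pass.
    `get_versions` is a hypothetical callable returning {agent_name: version}.
    """
    deadline = clock() + timeout
    while True:
        versions = get_versions()
        if versions and all(v == target for v in versions.values()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeClock:
    """Simulated clock for the demo: sleeping just advances `now`."""

    def __init__(self):
        self.now = 0.0

    def monotonic(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def make_status(clock):
    """Simulated status: unit-mysql-0 lags 37s (11:05:01 to 11:05:38)."""
    def get_versions():
        # "1.18.0" is a placeholder for whatever the old agent reports.
        mysql = "1.18.1.1" if clock.now >= 37 else "1.18.0"
        return {
            "machine-0": "1.18.1.1",
            "unit-wordpress-0": "1.18.1.1",
            "unit-mysql-0": mysql,
        }
    return get_versions


clock = FakeClock()
ok = wait_for_upgrade(make_status(clock), "1.18.1.1", timeout=300,
                      clock=clock.monotonic, sleep=clock.sleep)
print(ok, clock.now)
```

With a 5 minute timeout a 37s straggler is well inside the window, which is why I'm puzzled that CI never sees the agents come up.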

I guess I'm at a loss. I did see it take a long time to upgrade, but it did
succeed.
John
=:->



On Thu, Apr 10, 2014 at 1:41 PM, John Meinel <john at arbash-meinel.com> wrote:

> Well you used to be able to request a downgrade, but it never actually
> worked... :)
> And with the new upgrade steps, we explicitly don't implement the 'back
> out these changes' logic, which is why things were breaking
> some-of-the-time on upgrade. I'm not sure what I broke, but it is possible
> I changed it from "some-of-the-time" to "all-of-the-time" by some perverse
> logic. I'm pretty sure I tested it locally.
>
> I did try to upgrade a WP+MySQL environment from 1.16.6 to
> lp:juju-core/1.18 (2266). It failed, but because of the WP charm's
> config-changed hook. This was the error:
> 2014-04-10 09:37:13 INFO config-changed E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
> 2014-04-10 09:37:13 INFO config-changed E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
> 2014-04-10 09:37:13 ERROR juju.worker.uniter uniter.go:475 hook failed: exit status 100
>
> After doing "juju resolved --retry wordpress/0" everything was happy.
>
> I wonder if there is something in 1.18 that is causing apt commands to run
> after upgrade, and that ends up racing with running the config-changed hook.
>
> Are we careful to take out the FSLock when doing Apt commands from upgrade?
>
> John
> =:->
>
>
>
>
> On Thu, Apr 10, 2014 at 8:16 AM, Curtis Hovey-Canonical <
> curtis at canonical.com> wrote:
>
>> I am exhausted, so I am sending out the barest summary of the Juju-CI
>> problems I see.
>>
>> http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/
>> ^ In general, if the test doesn't end in -devel or start with walk-,
>> we require the test to pass.
>>
>> lp:juju-core/trunk r2593 could not upgrade because of an exception.
>> Note that this rev precedes the rev that makes it impossible to
>> downgrade. I see new revisions queued; maybe trunk will pass while I
>> sleep.
>>
>> lp:juju-core/1.18 r2264 and subsequent revs could not upgrade hp,
>> joyent, azure, or aws. Some of the agents failed to update. The
>> results for the different CPCs are consistent. I see the change relates
>> to not permitting downgrades. Damn, I just submitted a pull request
>> documenting that you can downgrade.
>>
>> --
>> Curtis Hovey
>> Canonical Cloud Development and Operations
>> http://launchpad.net/~sinzui
>>