Follow up - "Missing contrib.amulet in charm-helpers-hooks"
James Beedy
jamesbeedy at gmail.com
Thu Oct 29 21:29:17 UTC 2015
David,
This happened to be my hardware relocation week (moving these servers from our campus to our colo), so unfortunately I cannot get any more logs or findings from this deploy. Interestingly
enough, my shared-db-relation-changed hook errors now cease to be reproducible.
Anomalies:
I ran a deploy in a different physical location using the same servers and same charm params used in my
previous deploys, and (repeatedly now) have not experienced the shared-db-relation-changed hook errors
for charms relating to percona-cluster.
On the other hand, I have had inconsistent results reproducing the amqp-relation-changed hook errors
with the servers in a new physical environment (which shouldn’t matter), and the hooks stuck in "executing" for certain charms still exist with the full HA deploy.
In short, although it seems unlikely, I can’t help but attribute the percona hook errors to something in my physical environment; what exactly, I’m still unsure of.
I will be standing up new hardware next week to replace the set of servers that has been relocated, and will be able to allocate a test environment
in which to get to the bottom of what was going on here in an orderly and effective way.
> James,
>
> Some responses in line:
>
> On 10/26/2015 11:38 PM, James Beedy wrote:
>> As a follow up to my earlier post "Missing contrib.amulet in charm-helpers-hooks?", I want to clarify a few things, and hopefully get to the bottom of some hook errors I am experiencing. Earlier, I mentioned that
>> I thought I had found the cause of my hook errors. I realize now that what I thought was causing my shared-db-relation-changed hook error is most likely entirely unrelated; I apologize for any misconception I may have had there. What definitely is happening, though, is that I am experiencing shared-db-relation-changed hook errors, amongst a few other inconsistencies, for the charms heat, glance, and neutron-api when related to percona-cluster, as well as amqp-relation-changed hook errors for rabbitmq-server when related to cinder and heat.
>>
>> My setup is as follows:
>>
>> deployer-ha.yaml: http://paste.ubuntu.com/12977516/
>> rels.py: http://paste.ubuntu.com/12977480/
>> ha_rels.yaml: http://paste.ubuntu.com/12977481/
>> primary_rels.yaml: http://paste.ubuntu.com/12977487/
>>
>>
>> I am kicking off my deploy with the following procedure:
>>
>> 1. '$ juju-deployer -c deployer-ha.yaml --no-relations'
>> 2. '$ ./rels.py ha_rels.yaml'
>> 3. '$ ./rels.py primary_rels.yaml'
>>
>> * Once #1 finishes, and before running #2, I wait for my env to settle and look like: http://paste.ubuntu.com/12977466/
>> * After running #2, my env settles and looks like: http://paste.ubuntu.com/12977494/
>> * After running #3, my env settles and looks like: http://paste.ubuntu.com/12977593/
>>
>
> I am curious why you are separating out relations from the juju-deployer
> run. Have you had better success with that?
- I have had good juju using this tactic :-) It proves especially helpful when getting vip endpoints to populate correctly
(although my mileage here has gotten better without applying the staggered relations).
- The next step I want to take in this direction is to let my env completely settle after each relation before adding the next.
I feel this will help me more easily identify issues or inconsistencies happening with the deploy.
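As a rough sketch of what I mean by "settled": poll `juju status --format=json` until no unit reports a busy agent state. The exact JSON layout and state names here are my assumption (based on juju 1.x output, where units carry an `agent-state` field); treat this as an illustration, not a tested tool.

```python
import json
import subprocess
import time

# Agent states that mean the environment has NOT settled yet (assumed names).
BUSY_STATES = {"pending", "installed", "executing", "error"}

def is_settled(status):
    """Return True if no unit in the parsed `juju status` output is busy."""
    for service in status.get("services", {}).values():
        for unit in service.get("units", {}).values():
            if unit.get("agent-state") in BUSY_STATES:
                return False
    return True

def wait_for_settle(poll_seconds=30):
    """Poll `juju status --format=json` until every unit looks idle."""
    while True:
        out = subprocess.check_output(["juju", "status", "--format=json"])
        if is_settled(json.loads(out)):
            return
        time.sleep(poll_seconds)
```

Note that a unit stuck in "error" would keep this loop spinning forever, so in practice I would break out after a timeout and go investigate the errored unit instead.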
>
>> As you can see, after #3 I have a few incomplete relations, a few hook errors, and a few
>> hooks stuck in executing.
>>
>> To gain some insight here, I opened up a few new terminal windows and decided to debug the
>> hooks in question.
>>
>> The output of debugging the relation changed hook for glance/0 is as follows: http://paste.ubuntu.com/12977635/
>>
>> Notice that the line 'No handlers could be found for logger "oslo_config.cfg"' is the last bit of
>> information I can get here before what looks like an SQL connection failure. I assumed the
>> handlers not being available for the logger might have been blocking; this is what led
>> me on the wild goose chase of the missing import :-)
>
> The 'No handlers could be found for logger' is not the issue. That is
> related to logging and not the cause of the problems.
- Ok, great. I see this now.
> Your logs show two failures that caused all the rest:
>
>> cat /var/log/glance/api.log shows : http://paste.ubuntu.com/12977656/
>>
>> cat /var/log/juju/unit-glance-0.log shows: http://paste.ubuntu.com/12977662/
>>
>>
>> Debugging the heat shared-db-relation-changed hook I get the following output: http://paste.ubuntu.com/12977674/
>>
>> Heat logs:
>>
>> cat /var/log/heat/heat-engine.log : http://paste.ubuntu.com/12977676/
>> cat /var/log/heat/heat-api.log : http://paste.ubuntu.com/12977678/
>> cat /var/log/heat/heat-api-cfn.log : http://paste.ubuntu.com/12977682/
>> cat /var/log/juju/unit-heat-0.log : http://paste.ubuntu.com/12977694/
>>
>>
>> Debugging neutron-api shared-db-relation-changed hook I get the output : http://paste.ubuntu.com/12977718/
>>
>> cat /var/log/juju/unit-neutron-api-0.log : http://paste.ubuntu.com/12977720/
>> cat /var/log/neutron/neutron-server.log : http://paste.ubuntu.com/12977734/
>>
>
> First failure is percona-cluster. All of the above logs show problems
> connecting to mysql/percona-cluster. So the fault is with
> percona-cluster some where.
>
> Can you provide logs for the percona-cluster service? And/or do some
> debugging there?
- Ok, got it. I will get back to you on this.
>
>> Debugging rabbitmq-server:
>>
>> cat /var/log/juju/unit-rabbitmq-server-2.log: http://paste.ubuntu.com/12977743/
>>
>>
>> Hopefully this helps give a slightly better understanding of the hook errors I'm experiencing. Any insight would be greatly appreciated!
>
> Second is rabbitmq. We have been on a crusade to resolve rabbitmq
> clustering bugs. In this case it seems rabbitmq was down or stopped when
> the charm attempted to check status. I'll take a look and see if there
> is anything obvious.
- I still seem to be experiencing some issues here; they seem to only happen when stateless services are deployed in HA.
>
>> Thank you all for your time,
>>
>> James
>
> So to summarize, two core services appear to have failed:
> percona-cluster and rabbitmq-server. Since most other services rely on
> these you see the various other failures including incomplete relations
> and failed relation hook runs.
>
> If you can provide some logs for percona-cluster I'll see if there is
> anything obvious. I'll also look at rabbitmq.
>
- I will get back to you about this.
> --
> David Ames
To summarize,
a) The shared-db-relation-changed hook errors I was experiencing have ceased to be reproducible.
- I will continue to explore what was going on here next week when I have a solid test env again.
b) The amqp-relation-changed hooks seem to present themselves depending on how the stack is deployed and relations are made.
- I will continue to explore this further.
c) Issues with hooks stuck in executing, and vip endpoints problems still exist.
- This is a different issue, but stifling nonetheless. I will continue to follow up on this through a bug.
d) In the meantime, I’ll go ahead and close the bug I created concerning shared-db-relation-changed hook errors for percona-cluster until I can reproduce and further explore the issue.
Thanks again for your time,
James