Follow up - "Missing contrib.amulet in charm-helpers-hooks"

James Beedy jamesbeedy at gmail.com
Thu Oct 29 21:29:17 UTC 2015


David,

This happened to be my hardware relocation week for these servers (moving them from our campus to our colo),
so unfortunately I cannot get any more logs or findings from this deploy. Interestingly
enough, my shared-db-relation-changed hook errors are no longer reproducible.

Anomalies:

I ran a deploy in a different physical location using the same servers and the same charm params as in my
previous deploys, and (repeatedly now) have not seen the shared-db-relation-changed hook errors
for charms related to percona-cluster.

On the other hand, reproducing the amqp-relation-changed hook errors with the servers in the new physical
environment (which shouldn't matter) has given inconsistent results, and the hooks stuck in executing for certain charms still occur with the full HA deploy.

In short, unlikely as it seems, I can't help but attribute the percona hook errors to something in my physical environment; what exactly, I'm still unsure.

I will be standing up new hardware next week to replace the set of servers that has been relocated, and will then be able to set aside a test environment
to get to the bottom of what was going on here in an orderly and effective way.


> James,
> 
> Some responses in line:
> 
> On 10/26/2015 11:38 PM, James Beedy wrote:
>> As a follow up to my earlier post "Missing contrib.amulet in charm-helpers-hooks?", I want to clarify a few things, as well as hopefully get to the bottom of some hook errors I am experiencing. Earlier, I mentioned that
>> I thought I had found the cause of my hook errors. I realize now that what I thought was causing my shared-db-relation-changed hook error is most likely entirely unrelated. I apologize for any misconception I may have had there. What definitely is happening, though, is that I am experiencing shared-db-relation-changed hook errors, among a few other inconsistencies, for the charms heat, glance and neutron-api when related to percona-cluster, as well as amqp-relation-changed hook errors for rabbitmq-server when related to cinder and heat.
>> 
>> My setup is as follows:
>> 
>> deployer-ha.yaml: http://paste.ubuntu.com/12977516/
>> rels.py: http://paste.ubuntu.com/12977480/
>> ha_rels.yaml: http://paste.ubuntu.com/12977481/
>> primary_rels.yaml: http://paste.ubuntu.com/12977487/
>> 
>> 
>> I am kicking off my deploy with the following procedure:
>> 
>> 1. '$ juju-deployer -c deployer-ha.yaml --no-relations'
>> 2. '$ ./rels.py ha_rels.yaml'
>> 3. '$ ./rels.py primary_rels.yaml'
>> 
>> * Once #1 finishes, and before running #2, I wait for my env to settle and look like: http://paste.ubuntu.com/12977466/
>> * After running #2, my env settles and looks like: http://paste.ubuntu.com/12977494/
>> * After running #3, my env settles and looks like: http://paste.ubuntu.com/12977593/
>> 
> 
> I am curious why you are separating out relations from the juju-deployer
> run. Have you had better success with that?

      - I have had good juju with this tactic :-) It has proven especially helpful for getting VIP endpoints to populate correctly
        (although my mileage here has also gotten better without the staggered relations).

      - The next step I want to take in this direction is to let my env settle completely after each relation before adding the next.
         I think this will make it easier to identify issues or inconsistencies happening with the deploy.
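A minimal Python sketch of that settle-after-each-relation approach, assuming Juju 1.x: `juju add-relation` is run for one pair at a time, and `juju status --format=json` is polled until no unit agent reports "executing" for a few consecutive polls. The relation pairs, the poll/timeout values, and the JSON layout (services -> units -> agent-status / workload-status -> current, with subordinates nested under each unit) are assumptions for illustration; this is not what rels.py actually does.

```python
import json
import subprocess
import time

# Hypothetical example pairs; the real ones live in ha_rels.yaml / primary_rels.yaml.
RELATIONS = [
    ("glance:shared-db", "percona-cluster:shared-db"),
    ("heat:amqp", "rabbitmq-server:amqp"),
]


def iter_units(status):
    """Yield (unit_name, unit_dict) for every principal and subordinate unit."""
    for service in status.get("services", {}).values():
        for name, unit in (service.get("units") or {}).items():
            yield name, unit
            for sub_name, sub in (unit.get("subordinates") or {}).items():
                yield sub_name, sub


def busy_units(status):
    """Units whose agent is still running a hook (assumed 'agent-status' layout)."""
    return [name for name, unit in iter_units(status)
            if unit.get("agent-status", {}).get("current") == "executing"]


def errored_units(status):
    """Units whose workload reports an error, e.g. after a failed relation hook."""
    return [name for name, unit in iter_units(status)
            if unit.get("workload-status", {}).get("current") == "error"]


def juju_status():
    """Fetch and parse `juju status --format=json`."""
    out = subprocess.check_output(["juju", "status", "--format=json"])
    return json.loads(out.decode("utf-8"))


def wait_for_settle(poll=15, timeout=1800, stable_polls=3):
    """Block until no unit is executing for `stable_polls` polls in a row."""
    deadline = time.time() + timeout
    quiet = 0
    while time.time() < deadline:
        status = juju_status()
        failed = errored_units(status)
        if failed:
            raise RuntimeError("hook errors on: %s" % ", ".join(failed))
        # Reset the counter whenever any unit is busy again.
        quiet = quiet + 1 if not busy_units(status) else 0
        if quiet >= stable_polls:
            return
        time.sleep(poll)
    raise RuntimeError("environment did not settle within %ss" % timeout)


def add_relations_one_at_a_time(relations):
    """Add each relation, then wait for the environment to settle."""
    for a, b in relations:
        subprocess.check_call(["juju", "add-relation", a, b])
        wait_for_settle()
```

Calling `add_relations_one_at_a_time(RELATIONS)` stops at the first relation that leaves a unit in an error state, which points directly at the relation that triggered it. Requiring several quiet polls in a row guards against reading status before the new relation's hooks have even been queued.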
> 
>> As you can see, after #3 I have a few incomplete relations, a few hook errors, and a few
>> hooks stuck in executing.
>> 
>> To gain some insight here, I opened up a few new terminal windows and decided to debug the
>> hooks in question.
>> 
>> The output of debugging the relation changed hook for glance/0 is as follows: http://paste.ubuntu.com/12977635/
>> 
>> Notice the line 'No handlers could be found for logger "oslo_config.cfg"' is the last bit of
>> information I can get here before what looks like an SQL connection failure. I assumed the
>> handlers not being available for the logger might have been blocking; this is what led
>> me on the wild goose chase after the missing import :-)
> 
> The 'No handlers could be found for logger' is not the issue. That is
> related to logging and not the cause of the problems.

      - Ok, great. I see this now.

> Your logs show two failures that caused all the rest:
> 
>> cat /var/log/glance/api.log shows : http://paste.ubuntu.com/12977656/
>> 
>> cat /var/log/juju/unit-glance-0.log shows: http://paste.ubuntu.com/12977662/
>> 
>> 
>> Debugging the heat shared-db-relation-changed hook I get the following output: http://paste.ubuntu.com/12977674/
>> 
>> Heat logs:
>> 
>> cat /var/log/heat/heat-engine.log : http://paste.ubuntu.com/12977676/
>> cat /var/log/heat/heat-api.log : http://paste.ubuntu.com/12977678/
>> cat /var/log/heat/heat-api-cfn.log : http://paste.ubuntu.com/12977682/
>> cat /var/log/juju/unit-heat-0.log : http://paste.ubuntu.com/12977694/
>> 
>> 
>> Debugging neutron-api shared-db-relation-changed hook I get the output : http://paste.ubuntu.com/12977718/
>> 
>> cat /var/log/juju/unit-neutron-api-0.log : http://paste.ubuntu.com/12977720/
>> cat /var/log/neutron/neutron-server.log : http://paste.ubuntu.com/12977734/
>> 
> 
> The first failure is percona-cluster. All of the above logs show problems
> connecting to mysql/percona-cluster. So the fault lies somewhere in
> percona-cluster.
> 
> Can you provide logs for the percona-cluster service? And/or do some
> debugging there?

      - Ok, got it. I will get back to you on this.
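Before digging into percona-cluster itself, one quick check is whether the failing units can reach the database port at all. A minimal sketch, assuming MySQL's default port 3306 and that it is run from the glance, heat, or neutron-api unit; any addresses passed to it (the percona-cluster VIP and unit IPs) are placeholders you would fill in. A successful TCP connect only shows the port is reachable, not that the grants or credentials are correct.

```python
import socket


def can_connect(host, port=3306, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError
        # in Python 3.3+) and name-resolution failures (socket.gaierror).
        return False


def check_endpoints(hosts, port=3306):
    """Map each host to whether its MySQL port answered."""
    return {host: can_connect(host, port) for host in hosts}
```

Comparing the VIP against the individual percona-cluster units helps separate a VIP/hacluster problem from a database that is down on every node.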
> 
>> Debugging rabbitmq-server:
>> 
>> cat /var/log/juju/unit-rabbitmq-server-2.log: http://paste.ubuntu.com/12977743/
>> 
>> 
>> Hopefully this helps give a slightly better understanding of the hook errors I'm experiencing. Any insight would be greatly appreciated!
> 
> The second is rabbitmq. We have been on a crusade to resolve rabbitmq
> clustering bugs. In this case it seems rabbitmq was down or stopped when
> the charm attempted to check its status. I'll take a look and see if there
> is anything obvious.

     - I still seem to be experiencing some issues here; they seem to happen only when stateless services are deployed in HA.
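If the status check simply runs while the broker is briefly down, retrying with a backoff is one way to tolerate that. A hypothetical sketch, not the rabbitmq-server charm's actual code: `retry` is an illustrative helper, and `rabbitmq_running` assumes that `rabbitmqctl status` exits non-zero when the local node is not running.

```python
import subprocess
import time


def retry(check, attempts=5, delay=2.0, backoff=2.0, sleep=time.sleep):
    """Call check() until it returns True, sleeping between failed attempts.

    The delay is multiplied by `backoff` after each failure. Returns True on
    the first success, or False if every attempt fails.
    """
    for i in range(attempts):
        if check():
            return True
        if i < attempts - 1:  # no point sleeping after the last attempt
            sleep(delay)
            delay *= backoff
    return False


def rabbitmq_running():
    """True if `rabbitmqctl status` exits 0 (assumed to mean the node is up)."""
    return subprocess.call(["rabbitmqctl", "status"],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0
```

With the defaults, `retry(rabbitmq_running)` waits 2, 4, 8 and 16 seconds between attempts before giving up; passing `sleep` as a parameter keeps the helper testable without real delays.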
> 
>> Thank you all for your time,
>> 
>> James
> 
> So to summarize, two core services appear to have failed:
> percona-cluster and rabbitmq-server. Since most other services rely on
> these you see the various other failures including incomplete relations
> and failed relation hook runs.
> 
> If you can provide some logs for percona-cluster I'll see if there is
> anything obvious. I'll also look at rabbitmq.
> 
      - I will get back to you about this.

> --
> David Ames


To summarize,
a) The shared-db-relation-changed hook errors I was experiencing have stopped occurring.
    - I will continue to explore what was going on here next week, when I have a solid test env again.
b) The amqp-relation-changed hook errors seem to appear depending on how the stack is deployed and how the relations are made.
    - I will continue to explore this further.
c) The hooks stuck in executing and the VIP endpoint problems still exist.
    - This is a different issue, but stifling nonetheless; I will continue to follow up on it through a bug.
d) In the meantime, I’ll go ahead and close the bug I created concerning shared-db-relation-changed hook errors for percona-cluster until I can reproduce and further explore the issue.

Thanks again for your time,

James



More information about the Juju mailing list