Re: Multi install with existing MAAS starts all services except for “IP Pending” on Glance Simplestreams Image Sync

Jeff McLamb mclamb at gmail.com
Thu Jul 30 19:02:53 UTC 2015


Here is commands.log, which definitely has complaints about
nova-controller-setup.sh:

http://paste.ubuntu.com/11968600/

And after running nova-controller-setup.sh again via juju as you mentioned:

http://paste.ubuntu.com/11968544/


So I guess that because the compute node failed to deploy in the first
place, the installer still tried to run the nova-controller-setup.sh
script, but it failed without a compute node? Or is the compute node
not involved in that process? And then, when I re-commissioned and
deployed the compute node, the installer didn't re-run the script?

Thanks,

Jeff


On Thu, Jul 30, 2015 at 1:41 PM, Mike McCracken
<mike.mccracken at canonical.com> wrote:
> Hi Jeff, the ubuntu user, roles, etc. are created by a script that the
> installer runs after deploying nova-cloud-controller.
> The file ~/.cloud-install/commands.log will have any errors encountered
> while trying to run that script.
> You can also look at the script itself in
> ~/.cloud-install/nova-controller-setup.sh, and optionally try running it
> yourself - it should be present on the nova-cloud-controller unit in /tmp,
> so you can do e.g.
> % juju run --unit nova-cloud-controller/0 \
>     "/tmp/nova-controller-setup.sh <the password you used in the installer> Single"
> to try it again.
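> To check whether the script actually created the users and roles, one
> rough option (the admin-token approach here is an assumption about your
> keystone config, so adjust as needed) is to query keystone directly on
> its unit:
> % juju ssh keystone/0
> then, on the keystone unit:
> % sudo grep admin_token /etc/keystone/keystone.conf
> % keystone --os-token <that token> \
>       --os-endpoint http://localhost:35357/v2.0 user-list
> If the ubuntu user is missing from that list, the setup script never
> completed.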
>
> On Thu, Jul 30, 2015 at 10:14 AM, Jeff McLamb <mclamb at gmail.com> wrote:
>>
>> So it was easy enough to pick up from the single failed node. After
>> deleting it, re-enlisting, and re-commissioning, I was presented with a
>> Ready node with a new name.
>>
>> I went into openstack-status and simply added the Compute service that
>> was missing and deployed it to this new node. After a while it was up,
>> all services looked good.
>>
>> I issued a `juju machine remove 1` to remove the pending failed
>> machine from juju that was no longer in the MAAS database; it had
>> nothing running on it, obviously, so I figured it would be best to
>> remove it from juju. The new machine is machine 4.
>>
>> Now when I try to login to horizon, I get "An error occurred
>> authenticating. Please try again later.”
>>
>> The keystone logs suggest the ubuntu user and various roles and projects
>> were not created, even though openstack-installer tells me to log in to
>> horizon with user ubuntu and the password I gave it.
>>
>> Here are the keystone logs:
>>
>> http://paste.ubuntu.com/11967913/
>>
>> Here are the apache error logs on the openstack-dashboard container:
>>
>> http://paste.ubuntu.com/11967922/
>>
>> Any ideas here?
>>
>>
>> On Thu, Jul 30, 2015 at 12:34 PM, Jeff McLamb <mclamb at gmail.com> wrote:
>> > Just to give you an update where I am:
>> >
>> > I tried various approaches while still using the underlying vivid
>> > MAAS/juju deployment host: --edit-placement, which errored out,
>> > removing Glance Sync again, etc., all to no avail.
>> >
>> > Then I created a trusty VM on the MAAS host and installed the stable
>> > juju and cloud-install PPAs. The problem with the stable version of
>> > openstack-install is that it does not honor the http_proxy and
>> > https_proxy values passed on the command line. I can see that they do
>> > not get put into the environments.yaml file, so I ended up with the
>> > same issue there as I had originally, where it could not download the
>> > tools.
>> >
>> > So I updated cloud-install to the experimental PPA on the trusty juju
>> > deployment VM and used the latest version, which worked fine with
>> > http_proxy and https_proxy. I have also played around with trying to
>> > deploy both juno and kilo.
>> >
>> > My latest attempt on trusty, deploying juno, has left one physical
>> > node in a Failed Deployment state, apparently because MAAS keeps
>> > saying the BMC is busy, so it can't control power. I tried releasing
>> > the node, which failed, so I ultimately had to delete it, re-enlist,
>> > and re-commission.
>> >
>> > Now I am at a point where the machine is back to Ready and the
>> > openstack-install is still waiting on 1 last machine (the other 2
>> > deployed just fine)... When something like this happens, is it
>> > possible to re-deploy the last remaining host, or must I start over
>> > deploying all machines again?
>> >
>> > Thanks,
>> >
>> > Jeff
>> >
>> >
>> > On Thu, Jul 30, 2015 at 1:21 AM, Mike McCracken
>> > <mike.mccracken at canonical.com> wrote:
>> >>
>> >>
>> >> On Wed, Jul 29, 2015 at 5:30 PM, Jeff McLamb <mclamb at gmail.com> wrote:
>> >>>
>> >>> OK, a quick look at /var/log/neutron on neutron-api/0 just shows the
>> >>> neutron-server.log as before… but since I stepped away in the past
>> >>> hour it’s now at 800MB and counting! ;)
>> >>>
>> >>> I will play around with the relations a bit just to learn what’s going
>> >>> on, but then I will take your advice and try various alternatives with
>> >>> --edit-placement first, then finally just changing the underlying MAAS
>> >>> deployment server to trusty and see where it takes me.
>> >>
>> >>
>> >> Sounds good
>> >>
>> >>>
>> >>> Could also try
>> >>> to install without --upstream-ppa, which I imagine will install juno
>> >>> instead of kilo?
>> >>
>> >>
>> >> Oh, --upstream-ppa doesn't do anything for the MAAS install path;
>> >> it's only applicable to the containerized single install.
>> >> It's harmless, though. On the single install, it's used to specify
>> >> that the version of the "openstack" package (which contains
>> >> openstack-install) installed on the container to run the second half
>> >> of the process should come from our experimental PPA. It could use
>> >> some better docs/usage string.
>> >>
>> >> If you're interested in trying out other OpenStack release versions,
>> >> you want to look at --openstack-release.
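>> >> For example (a sketch only; the exact invocation may differ between
>> >> installer versions):
>> >> % sudo openstack-install --openstack-release juno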
>> >>
>> >> -mike
>> >>
>> >>>
>> >>> Will keep you posted and continued thanks for all the help.
>> >>>
>> >>> Jeff
>> >>>
>> >>> On Wed, Jul 29, 2015 at 7:08 PM, Mike McCracken
>> >>> <mike.mccracken at canonical.com> wrote:
>> >>> > Jeff, based on the other logs you sent me, e.g.
>> >>> > neutron-metadata-agent.log, it was pointed out to me that it's
>> >>> > trying to connect to RabbitMQ on localhost, which is wrong.
>> >>> > So something is failing to complete the juju relations.
>> >>> > My hypothesis is that the failing vivid-series charm is messing up
>> >>> > juju's relations.
>> >>> > If you want to dig further, you can start looking at the relations
>> >>> > by running 'relation-get' on the amqp:rabbitmq relation via 'juju
>> >>> > run --unit' (the relation name might just be 'amqp').
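>> >>> > A rough sketch of that (the unit names here are assumptions;
>> >>> > adjust them to match your 'juju status' output):
>> >>> > % juju run --unit quantum-gateway/0 'relation-ids amqp'
>> >>> > % juju run --unit quantum-gateway/0 \
>> >>> >     'relation-get -r amqp:<id> - rabbitmq-server/0'
>> >>> > The second command dumps the settings rabbitmq-server published on
>> >>> > that relation; if the hostname it advertises is empty or localhost,
>> >>> > the relation never completed properly.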
>> >>> >
>> >>> > Or if you'd like to try just redeploying without the sync charm
>> >>> > using
>> >>> > --edit-placement, that might get a healthy cluster going, just one
>> >>> > without
>> >>> > glance images.
>> >>> > Then you could pretty easily deploy the charm manually, or just do
>> >>> > without it and upload images you get from cloud-images.ubuntu.com
>> >>> > manually.
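>> >>> > For instance, a manual upload might look roughly like this (image
>> >>> > name and URL are illustrative, and it assumes the usual OS_*
>> >>> > credentials are exported in your shell):
>> >>> > % wget http://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img
>> >>> > % glance image-create --name trusty --disk-format qcow2 \
>> >>> >     --container-format bare --is-public True \
>> >>> >     --file trusty-server-cloudimg-amd64-disk1.img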
>> >>> >
>> >>> > Sorry this is not as simple as it should be, yet :)
>> >>> > -mike
>> >>> >
>> >>> > On Wed, Jul 29, 2015 at 4:00 PM, Mike McCracken
>> >>> > <mike.mccracken at canonical.com> wrote:
>> >>> >>
>> >>> >> ok, so I just learned that the neutron-manage log should be in the
>> >>> >> neutron-api unit, so can you 'juju ssh neutron-api/0' and look in
>> >>> >> /var/log/neutron there?
>> >>> >>
>> >>> >> On Wed, Jul 29, 2015 at 3:34 PM, Jeff McLamb <mclamb at gmail.com>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> The neutron-server.log that is 500MB+ and growing is nonstop
>> >>> >>> repeated output of the following, due to a database table that
>> >>> >>> does not exist:
>> >>> >>>
>> >>> >>> http://paste.ubuntu.com/11962679/
>> >>> >>>
>> >>> >>> On Wed, Jul 29, 2015 at 6:30 PM, Jeff McLamb <mclamb at gmail.com>
>> >>> >>> wrote:
>> >>> >>> > Hey Mike -
>> >>> >>> >
>> >>> >>> > OK, so here is the juju status output. The quantum-gateway
>> >>> >>> > service doesn’t look too strange, but I am new to this. Its
>> >>> >>> > exposed status is false, but that is true for all services, and
>> >>> >>> > I can definitely access, say, the dashboard, even though it is
>> >>> >>> > not “exposed”. One thing of note: the public-address lines
>> >>> >>> > sometimes use domain names, e.g. downright-feet.maas in this
>> >>> >>> > case, whereas some services use IP addresses. I have noticed
>> >>> >>> > that I cannot resolve the maas names from the MAAS server
>> >>> >>> > (because I use the ISP’s DNS servers) but I can resolve them
>> >>> >>> > from the deployed nodes. Here is the output:
>> >>> >>> >
>> >>> >>> > http://paste.ubuntu.com/11962631/
>> >>> >>> >
>> >>> >>> > Here is the quantum gateway replay:
>> >>> >>> >
>> >>> >>> > http://paste.ubuntu.com/11962644/
>> >>> >>> >
>> >>> >>> > Where are the neutron-manage logs? I see lots of neutron stuff
>> >>> >>> > on various containers and nodes; the neutron-server.log is what
>> >>> >>> > I pasted before, and it is 500+MB and growing across a few
>> >>> >>> > nodes, but I can’t seem to find neutron-manage.
>> >>> >>> >
>> >>> >>> > Thanks!
>> >>> >>> >
>> >>> >>> > Jeff
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > On Wed, Jul 29, 2015 at 5:26 PM, Mike McCracken
>> >>> >>> > <mike.mccracken at canonical.com> wrote:
>> >>> >>> >> Hi Jeff, I asked internally and was asked if you could share
>> >>> >>> >> the juju charm logs from quantum-gateway and the neutron-manage
>> >>> >>> >> logs in /var/log/neutron.
>> >>> >>> >>
>> >>> >>> >> The charm log can be replayed by using 'juju debug-log -i
>> >>> >>> >> quantum-gateway/0 --replay'
>> >>> >>> >>
>> >>> >>> >> On Wed, Jul 29, 2015 at 2:03 PM, Mike McCracken
>> >>> >>> >> <mike.mccracken at canonical.com> wrote:
>> >>> >>> >>>
>> >>> >>> >>> Sorry this is so frustrating.
>> >>> >>> >>> Can you check 'juju status' for this environment and see if it
>> >>> >>> >>> says anything useful about the quantum-gateway service (aka
>> >>> >>> >>> neutron; the juju service name will be updated soon)?
>> >>> >>> >>>
>> >>> >>> >>> -mike
>> >>> >>> >>>
>> >>> >>> >>> On Wed, Jul 29, 2015 at 1:15 PM, Jeff McLamb
>> >>> >>> >>> <mclamb at gmail.com>
>> >>> >>> >>> wrote:
>> >>> >>> >>>>
>> >>> >>> >>>> OK, making progress now. Per your recommendation I removed
>> >>> >>> >>>> and
>> >>> >>> >>>> added
>> >>> >>> >>>> back in the trusty sync charm manually.
>> >>> >>> >>>>
>> >>> >>> >>>> Now, I can log in to the horizon dashboard!
>> >>> >>> >>>>
>> >>> >>> >>>> However, several tabs result in a generic OpenStack error
>> >>> >>> >>>> page (not Ubuntu-customized like the general dashboard
>> >>> >>> >>>> pages): "Something went wrong! An unexpected error has
>> >>> >>> >>>> occurred. Try refreshing the page..."
>> >>> >>> >>>>
>> >>> >>> >>>> The tabs that give those results include Compute -> Access &
>> >>> >>> >>>> Security and Network -> Network Topology.
>> >>> >>> >>>>
>> >>> >>> >>>> When I go to pages like Network -> Routers, it does render,
>> >>> >>> >>>> but there are error popup boxes in the page itself with:
>> >>> >>> >>>>
>> >>> >>> >>>> Error: Unable to retrieve router list.
>> >>> >>> >>>>
>> >>> >>> >>>> and
>> >>> >>> >>>>
>> >>> >>> >>>> Error: Unable to retrieve a list of external networks
>> >>> >>> >>>> "Connection
>> >>> >>> >>>> to
>> >>> >>> >>>> neutron failed: HTTPConnectionPool(host='192.168.1.45',
>> >>> >>> >>>> port=9696):
>> >>> >>> >>>> Max retries exceeded with url:
>> >>> >>> >>>> /v2.0/networks.json?router%3Aexternal=True (Caused by <class
>> >>> >>> >>>> 'httplib.BadStatusLine'>: '')”.
>> >>> >>> >>>>
>> >>> >>> >>>> If I do a `juju ssh openstack-dashboard/0` and tail -f
>> >>> >>> >>>> /var/log/apache2/error.log, I get the following when
>> >>> >>> >>>> accessing one of the failed pages:
>> >>> >>> >>>>
>> >>> >>> >>>> http://paste.ubuntu.com/11961863/
>> >>> >>> >>>>
>> >>> >>> >>>> Furthermore, looking at the neutron server logs, I see
>> >>> >>> >>>> non-stop traces about the neutron.ml2_gre_allocations table
>> >>> >>> >>>> not existing:
>> >>> >>> >>>>
>> >>> >>> >>>> http://paste.ubuntu.com/11961891/
>> >>> >>> >>>>
>> >>> >>> >>>> Getting closer, bit by bit.
>> >>> >>> >>>>
>> >>> >>> >>>> Thanks for all the help,
>> >>> >>> >>>>
>> >>> >>> >>>> Jeff
>> >>> >>> >>>
>> >>> >>> >>>
>> >>> >>> >>>
>> >>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >
>> >>
>> >>
>
>


