Re: Multi install with existing MAAS starts all services except for “IP Pending” on Glance Simplestreams Image Sync

Mike McCracken mike.mccracken at canonical.com
Thu Jul 30 21:01:04 UTC 2015


It definitely looks like the initial failed compute node deployment caused
some problems.

It looks like the script was being run repeatedly and failing on the
following command:

keystone user-role-add --user ubuntu --role Member --tenant ubuntu

>> No role with a name or ID of 'Member' exists.

which is the same thing that happened when you tried it again just now.

Then you apparently killed the install and tried again, at which point
the log is flooded with errors about not finding the machine ID that it
recorded in the placement. It's pretty clear that the installer doesn't
deal well with a machine leaving MAAS after a service has been placed
on it.

The setup script doesn't run again because after restarting, the
nova-cloud-controller service is marked as having been deployed, even
though the script never actually completed successfully.

Off the top of my head I don't know what might be going on with keystone; I
thought the Member role was created by default.
Maybe the keystone unit's debug log has a clue, but at this point I'd be
tempted to just try again and avoid the broken machine.
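
If you do want to poke at keystone first, something along these lines
should show whether the role really is missing and let you recreate it
(a rough sketch, assuming the old keystone CLI that Juno/Kilo ship and
that you have the admin OS_* credentials sourced wherever you run it):

keystone role-list                  # is 'Member' in the list at all?
keystone role-create --name Member  # recreate it if it's missing
keystone user-role-add --user ubuntu --role Member --tenant ubuntu

That last command is just the setup script's failing step again, so if
the role shows up the script should get past that point on a re-run.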

I'm sorry this has been such an ordeal, thanks for testing things out!
-mike

On Thu, Jul 30, 2015 at 12:02 PM, Jeff McLamb <mclamb at gmail.com> wrote:

> Here is commands.log, which definitely has complaints about
> nova-controller-setup.sh:
>
> http://paste.ubuntu.com/11968600/
>
> And after running nova-controller-setup.sh again via juju as you mentioned:
>
> http://paste.ubuntu.com/11968544/
>
>
> So I guess because the compute node failed to deploy in the first
> place, the installer still tried to issue the nova-controller-setup.sh
> script but it failed without a compute node? Or is that not involved
> in that process? And then, when I re-commissioned and deployed the
> compute node, it failed to re-run the script?
>
> Thanks,
>
> Jeff
>
>
> On Thu, Jul 30, 2015 at 1:41 PM, Mike McCracken
> <mike.mccracken at canonical.com> wrote:
> > Hi Jeff, the ubuntu user, roles, etc. are created by a script that the
> > installer runs after deploying nova-cloud-controller.
> > The file ~/.cloud-install/commands.log will have any errors encountered
> > while trying to run that script.
> > You can also look at the script that would run in
> > ~/.cloud-install/nova-controller-setup.sh, and optionally try running it
> > yourself - it should be present on the nova-cloud-controller unit in /tmp
> > so you can do e.g.
> > % juju run --unit nova-cloud-controller/0 "/tmp/nova-controller-setup.sh
> > <the password you used in the installer> Single"
> > to try it again.
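> >
> > While that runs you can keep an eye on the installer side as well, e.g.:
> >
> > % tail -f ~/.cloud-install/commands.log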
> >
> > On Thu, Jul 30, 2015 at 10:14 AM, Jeff McLamb <mclamb at gmail.com> wrote:
> >>
> >> So it was easy enough to pick up from the single failed node. After
> >> Deleting it, re-enlisting, commissioning, etc. I was presented with a
> >> Ready node with a new name, etc.
> >>
> >> I went into openstack-status and simply added the Compute service that
> >> was missing and deployed it to this new node. After a while it was up,
> >> all services looked good.
> >>
> >> I issued a `juju machine remove 1` to remove the pending failed
> >> machine from juju that was no longer in the MAAS database — it had
> >> nothing running on it, obviously, so I figured it would be best to
> >> remove it from juju. The new machine is machine 4.
> >>
> >> Now when I try to login to horizon, I get "An error occurred
> >> authenticating. Please try again later.”
> >>
> >> The keystone logs suggest user ubuntu and various roles and projects
> >> were not created, even though openstack-installer tells me to login to
> >> horizon with user ubuntu and the password I gave it.
> >>
> >> Here are the keystone logs:
> >>
> >> http://paste.ubuntu.com/11967913/
> >>
> >> Here are the apache error logs on the openstack-dashboard container:
> >>
> >> http://paste.ubuntu.com/11967922/
> >>
> >> Any ideas here?
> >>
> >>
> >> On Thu, Jul 30, 2015 at 12:34 PM, Jeff McLamb <mclamb at gmail.com> wrote:
> >> > Just to give you an update where I am:
> >> >
> >> > I tried various approaches still using the underlying vivid MAAS/juju
> >> > deployment host, tried --edit-placement, which errored out, tried
> >> > removing Glance Sync again, etc., all to no avail.
> >> >
> >> > Then I created a trusty VM on the MAAS host and installed the stable
> >> > juju and cloud-install ppa's. The problem with the stable version of
> >> > openstack-install is that it does not honor the http_proxy and
> >> > https_proxy variables passed on the command line. I can see that they do
> >> > not get put into the environments.yaml file, so I ended up with the
> >> > same issue there as I had originally, where it could not download the
> >> > tools.
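> >> >
> >> > (I could presumably have worked around that by hand-adding the proxy
> >> > keys to the maas environment stanza in ~/.juju/environments.yaml,
> >> > something like the following, with <proxy-host> standing in for the
> >> > local proxy, though I didn't end up trying it:
> >> >
> >> >     http-proxy: http://<proxy-host>:3128
> >> >     https-proxy: http://<proxy-host>:3128
> >> >     apt-http-proxy: http://<proxy-host>:3128
> >> >     apt-https-proxy: http://<proxy-host>:3128
> >> > )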
> >> >
> >> > So I updated the cloud-install to the experimental on the trusty juju
> >> > deployment VM and used the latest version, which worked fine with
> >> > http_proxy and https_proxy. I have played around with trying to deploy
> >> > both juno and kilo as well.
> >> >
> >> > My latest attempt on trusty, deploying juno, has left one physical
> >> > node in a Failed Deployment state, which seems to have happened
> >> > because MAAS keeps saying the BMC is busy, so it can't control power. I
> >> > tried releasing it, which failed, so I ultimately had to Delete it and
> >> > re-enlist, re-commission.
> >> >
> >> > Now I am at a point where the machine is back to Ready and the
> >> > openstack-install is still waiting on 1 last machine (the other 2
> >> > deployed just fine)... When something like this happens, is it
> >> > possible to re-deploy the last remaining host, or must I start over
> >> > deploying all machines again?
> >> >
> >> > Thanks,
> >> >
> >> > Jeff
> >> >
> >> >
> >> > On Thu, Jul 30, 2015 at 1:21 AM, Mike McCracken
> >> > <mike.mccracken at canonical.com> wrote:
> >> >>
> >> >>
> >> >> On Wed, Jul 29, 2015 at 5:30 PM, Jeff McLamb <mclamb at gmail.com> wrote:
> >> >>>
> >> >>> OK a quick look at the neutron-api/0 /var/log/neutron just shows the
> >> >>> neutron-server.log as before… but since I stepped away in the past
> >> >>> hour it’s now at 800MB and counting! ;)
> >> >>>
> >> >>> I will play around with the relations a bit just to learn what’s going
> >> >>> on, but then I will take your advice and try various alternatives with
> >> >>> --edit-placement first, then finally just changing the underlying MAAS
> >> >>> deployment server to trusty and see where it takes me.
> >> >>
> >> >>
> >> >> Sounds good
> >> >>
> >> >>>
> >> >>> Could also try
> >> >>> to install without --upstream-ppa which I imagine will install juno
> >> >>> instead of kilo?
> >> >>
> >> >>
> >> >> oh, --upstream-ppa doesn't do anything for the MAAS install path, it's
> >> >> only applicable to the containerized single install.
> >> >> It's harmless, though. On the single install, it's used to specify that
> >> >> the version of the "openstack" package (which contains openstack-install)
> >> >> that will be installed on the container to run the second half of the
> >> >> process should come from our experimental PPA. It could use some better
> >> >> docs/usage string.
> >> >>
> >> >> If you're interested in trying out other openstack release versions, you
> >> >> want to look at --openstack-release.
> >> >>
> >> >> -mike
> >> >>
> >> >>>
> >> >>> Will keep you posted and continued thanks for all the help.
> >> >>>
> >> >>> Jeff
> >> >>>
> >> >>> On Wed, Jul 29, 2015 at 7:08 PM, Mike McCracken
> >> >>> <mike.mccracken at canonical.com> wrote:
> >> >>> > Jeff, based on the other logs you sent me, e.g.
> >> >>> > neutron-metadata-agent.log, it was pointed out to me that it's trying
> >> >>> > to connect to rabbitMQ on localhost, which is wrong.
> >> >>> > So something is failing to complete the juju relations.
> >> >>> > My hypothesis is that the failing vivid-series charm is messing up
> >> >>> > juju's relations.
> >> >>> > If you want to dig further, you can start looking at the relations
> >> >>> > using e.g. 'juju run --unit 'relation-get amqp:rabbitmq' ' (might just
> >> >>> > be 'amqp')
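> >> >>> >
> >> >>> > Roughly, that would go something like this (the unit names and the
> >> >>> > relation id here are guesses on my part, adjust to whatever
> >> >>> > relation-ids actually prints):
> >> >>> >
> >> >>> > # find the relation id for the amqp interface on the neutron side
> >> >>> > juju run --unit quantum-gateway/0 'relation-ids amqp'
> >> >>> > # then dump what the rabbitmq unit published on it, e.g. amqp:12
> >> >>> > juju run --unit quantum-gateway/0 'relation-get -r amqp:12 - rabbitmq-server/0'
> >> >>> >
> >> >>> > If the hostname or password keys in that output are missing, the
> >> >>> > relation never completed.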
> >> >>> >
> >> >>> > Or if you'd like to try just redeploying without the sync charm using
> >> >>> > --edit-placement, that might get a healthy cluster going, just one
> >> >>> > without glance images.
> >> >>> > Then you could pretty easily deploy the charm manually, or just do
> >> >>> > without it and upload images you get from cloud-images.ubuntu.com
> >> >>> > manually.
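> >> >>> >
> >> >>> > The manual upload would be roughly this (a sketch, assuming the old
> >> >>> > glance v1 CLI and that you have admin credentials sourced):
> >> >>> >
> >> >>> > # grab a current trusty cloud image
> >> >>> > wget http://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img
> >> >>> > # register it with glance so nova can boot from it
> >> >>> > glance image-create --name trusty --disk-format qcow2 \
> >> >>> >   --container-format bare --is-public True \
> >> >>> >   --file trusty-server-cloudimg-amd64-disk1.img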
> >> >>> >
> >> >>> > Sorry this is not as simple as it should be, yet :)
> >> >>> > -mike
> >> >>> >
> >> >>> > On Wed, Jul 29, 2015 at 4:00 PM, Mike McCracken
> >> >>> > <mike.mccracken at canonical.com> wrote:
> >> >>> >>
> >> >>> >> ok, so I just learned that the neutron-manage log should be in the
> >> >>> >> neutron-api unit, so can you 'juju ssh neutron-api/0' and look in
> >> >>> >> /var/log/neutron there?
> >> >>> >>
> >> >>> >> On Wed, Jul 29, 2015 at 3:34 PM, Jeff McLamb <mclamb at gmail.com>
> >> >>> >> wrote:
> >> >>> >>>
> >> >>> >>> The neutron-server.log that is 500MB+ and growing is nonstop
> >> >>> >>> repeated
> >> >>> >>> output of the following, due to a database table that does not
> >> >>> >>> exist:
> >> >>> >>>
> >> >>> >>> http://paste.ubuntu.com/11962679/
> >> >>> >>>
> >> >>> >>> On Wed, Jul 29, 2015 at 6:30 PM, Jeff McLamb <mclamb at gmail.com>
> >> >>> >>> wrote:
> >> >>> >>> > Hey Mike -
> >> >>> >>> >
> >> >>> >>> > OK so here is the juju status output. The quantum-gateway doesn’t
> >> >>> >>> > look too strange, but I am new. The exposed status is false, but so
> >> >>> >>> > it is for all services, and I can definitely access, say, the
> >> >>> >>> > dashboard, even though it is not “exposed”. One thing of note is the
> >> >>> >>> > public-address lines that sometimes use the domain names, e.g.
> >> >>> >>> > downright-feet.maas in this case, whereas some services use IP
> >> >>> >>> > addresses. I have noticed that I cannot resolve the maas names from
> >> >>> >>> > the MAAS server (because I use the ISP’s DNS servers) but I can
> >> >>> >>> > resolve them from the deployed nodes. Here is the output:
> >> >>> >>> >
> >> >>> >>> > http://paste.ubuntu.com/11962631/
> >> >>> >>> >
> >> >>> >>> > Here is the quantum gateway replay:
> >> >>> >>> >
> >> >>> >>> > http://paste.ubuntu.com/11962644/
> >> >>> >>> >
> >> >>> >>> > Where are the neutron-manage logs? I see lots of neutron stuff on
> >> >>> >>> > various containers and nodes — the neutron-server.log is what I
> >> >>> >>> > pasted before and it is 500+MB and growing across a few nodes, but
> >> >>> >>> > I can’t seem to find neutron-manage.
> >> >>> >>> >
> >> >>> >>> > Thanks!
> >> >>> >>> >
> >> >>> >>> > Jeff
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> > On Wed, Jul 29, 2015 at 5:26 PM, Mike McCracken
> >> >>> >>> > <mike.mccracken at canonical.com> wrote:
> >> >>> >>> >> Hi Jeff, I asked internally and was asked if you could share the
> >> >>> >>> >> juju charm logs from quantum-gateway and the neutron-manage logs
> >> >>> >>> >> in /var/log/neutron.
> >> >>> >>> >>
> >> >>> >>> >> the charm log can be replayed by using 'juju debug-log -i
> >> >>> >>> >> quantum-gateway/0 --replay'
> >> >>> >>> >>
> >> >>> >>> >> On Wed, Jul 29, 2015 at 2:03 PM, Mike McCracken
> >> >>> >>> >> <mike.mccracken at canonical.com> wrote:
> >> >>> >>> >>>
> >> >>> >>> >>> Sorry this is so frustrating.
> >> >>> >>> >>> Can you check 'juju status' for this environment and see if it
> >> >>> >>> >>> says anything useful about the quantum-gateway service (aka
> >> >>> >>> >>> neutron, the juju service name will be updated soon).
> >> >>> >>> >>>
> >> >>> >>> >>> -mike
> >> >>> >>> >>>
> >> >>> >>> >>> On Wed, Jul 29, 2015 at 1:15 PM, Jeff McLamb
> >> >>> >>> >>> <mclamb at gmail.com>
> >> >>> >>> >>> wrote:
> >> >>> >>> >>>>
> >> >>> >>> >>>> OK, making progress now. Per your recommendation I removed and
> >> >>> >>> >>>> added back in the trusty sync charm manually.
> >> >>> >>> >>>>
> >> >>> >>> >>>> Now, I can log in to the horizon dashboard!
> >> >>> >>> >>>>
> >> >>> >>> >>>> However, several tabs result in a generic OpenStack (not
> >> >>> >>> >>>> Ubuntu-customized like the general dashboard pages) "Something
> >> >>> >>> >>>> went wrong! An unexpected error has occurred. Try refreshing
> >> >>> >>> >>>> the page..."
> >> >>> >>> >>>>
> >> >>> >>> >>>> The tabs in question that give those results are Compute ->
> >> >>> >>> >>>> Access &
> >> >>> >>> >>>> Security, Network -> Network Topology,
> >> >>> >>> >>>>
> >> >>> >>> >>>> When I go to pages like Network -> Routers, it does render, but
> >> >>> >>> >>>> there are error popup boxes in the page itself with:
> >> >>> >>> >>>>
> >> >>> >>> >>>> Error: Unable to retrieve router list.
> >> >>> >>> >>>>
> >> >>> >>> >>>> and
> >> >>> >>> >>>>
> >> >>> >>> >>>> Error: Unable to retrieve a list of external networks
> >> >>> >>> >>>> "Connection to neutron failed:
> >> >>> >>> >>>> HTTPConnectionPool(host='192.168.1.45', port=9696): Max retries
> >> >>> >>> >>>> exceeded with url: /v2.0/networks.json?router%3Aexternal=True
> >> >>> >>> >>>> (Caused by <class 'httplib.BadStatusLine'>: '')”.
> >> >>> >>> >>>>
> >> >>> >>> >>>> If I do a `juju ssh openstack-dashboard/0` and tail -f
> >> >>> >>> >>>> /var/log/apache2/error.log I get the following when accessing
> >> >>> >>> >>>> one of the failed pages:
> >> >>> >>> >>>>
> >> >>> >>> >>>> http://paste.ubuntu.com/11961863/
> >> >>> >>> >>>>
> >> >>> >>> >>>> Furthermore, looking at the neutron server logs, I see non-stop
> >> >>> >>> >>>> traces about the neutron.ml2_gre_allocations table not existing:
> >> >>> >>> >>>>
> >> >>> >>> >>>> http://paste.ubuntu.com/11961891/
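> >> >>> >>> >>>>
> >> >>> >>> >>>> My understanding is that table should be created by the neutron
> >> >>> >>> >>>> DB migrations, which the charm normally runs itself. If it came
> >> >>> >>> >>>> to poking at it by hand, something like the following on the
> >> >>> >>> >>>> neutron-api unit should run them, though I have not tried it:
> >> >>> >>> >>>>
> >> >>> >>> >>>> # run the neutron/ml2 schema migrations against the configured DB
> >> >>> >>> >>>> neutron-db-manage --config-file /etc/neutron/neutron.conf \
> >> >>> >>> >>>>   --config-file /etc/neutron/plugins/ml2/ml2_conf.ini upgrade head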
> >> >>> >>> >>>>
> >> >>> >>> >>>> Getting closer, bit by bit.
> >> >>> >>> >>>>
> >> >>> >>> >>>> Thanks for all the help,
> >> >>> >>> >>>>
> >> >>> >>> >>>> Jeff
> >> >>> >>> >>>
> >> >>> >>> >>>
> >> >>> >>> >>>
> >> >>> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >
> >> >>
> >> >>
> >
> >
>