Re: Multi install with existing MAAS starts all services except for “IP Pending” on Glance Simplestreams Image Sync

Jeff McLamb mclamb at gmail.com
Fri Jul 31 19:15:07 UTC 2015


Success!

Sorry it took a while to get back, but just wanted to follow up and
say I finally have a start-to-finish working Multi install! The TL;DR
of it is that I need a trusty-based juju deployment host running the
latest openstack-installer from the experimental PPA. Trusty is
required because the installer tries to install the glance sync charm
from vivid when the host itself is on vivid, and the experimental PPA
is required because the stable branch does not seem to honor the
http-proxy and https-proxy command-line arguments. The juju deployment
host also needs to be on the UTC timezone in order to match the
machines deployed by juju/MAAS.
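
For the record, the invocation that finally worked looked roughly like
this (the proxy URL is a placeholder and the flag spellings are from
memory, so double-check them against openstack-install --help):

```shell
# Put the deployment host on UTC first, to match the juju/MAAS nodes:
sudo timedatectl set-timezone UTC

# From the trusty host with the experimental PPA's openstack-installer:
openstack-install \
    --http-proxy http://10.0.1.1:3128 \
    --https-proxy http://10.0.1.1:3128 \
    --openstack-release juno
```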

The last few failed iterations were due to a physical machine (say,
the compute node) failing to deploy. That one was on my end: sometimes
my physical servers do not boot without manual interaction, due to a
bizarre "PSU voltage too low" warning. If the compute node does not
come up within a reasonable amount of time, it seems some of the setup
scripts get run improperly, hence the issues with keystone users,
roles, etc. not being available.

This last time I manually made sure all servers came up and intervened
if they tried to block.

The end result is that I now have a seemingly working copy of Juno (I
used --openstack-release juno), and all interactions on the horizon
dashboard seem good! I will keep messing around with it and try to deploy
some VMs when I get a chance. I will also likely try a re-deploy of
Kilo and see how that works.
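
When I get to the VMs, my plan for a first smoke test is something
like the following (the image and flavor names are placeholders, and
the rc-file path is a guess; use whatever credentials file your
install produced):

```shell
# Source admin credentials (path is a guess; adjust to your install):
source ~/.cloud-install/openstack-admin-rc
# Boot a throwaway instance and check that it reaches ACTIVE:
nova boot --image trusty-server-cloudimg-amd64 --flavor m1.small smoke-test
nova list
```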

Thanks so much for the help, Mike and Adam, I really appreciate it.
Hopefully we can feed some of this stuff back into the process to make
it easier. I’ll follow up with more… I think the main page on
ubuntu.com for deploying the Canonical Distribution needs some
updating. For example, it fails to say you must install juju before
openstack ;)

Thank you, thank you!

Jeff


On Thu, Jul 30, 2015 at 5:01 PM, Mike McCracken
<mike.mccracken at canonical.com> wrote:
> It definitely looks like the initial failed compute node deployment caused
> some problems.
>
> It looks like the script was being run repeatedly and failing on the
> following command:
>
> keystone user-role-add --user ubuntu --role Member --tenant ubuntu
>
>>> No role with a name or ID of 'Member' exists.
>
> which is the same thing that happened when you tried it again just now.
>
> Then you apparently killed the install and tried again, at which point the
> log is flooded with errors relating to it not finding the machine ID that it
> recorded in the placement. It's pretty clear that it doesn't deal well with
> machines where you placed a service leaving MAAS afterward.
>
> The setup script doesn't run again because after restarting, the
> nova-cloud-controller service is marked as having been deployed, even though
> the script never actually completed successfully.
>
> Off the top of my head I don't know what might be going on with keystone, I
> thought the Member role was created by default.
> Maybe the keystone unit's debug log has a clue, but at this point I'd be
> tempted to just try again and avoid the broken machine.
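>
> (If you'd rather try to patch up the current deployment first, creating
> the role by hand might be enough -- untested, and it assumes admin
> credentials are sourced where you run it:)
>
> ```shell
> # Recreate the role the setup script expects, then retry its command:
> keystone role-create --name Member
> keystone user-role-add --user ubuntu --role Member --tenant ubuntu
> ```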
>
> I'm sorry this has been such an ordeal, thanks for testing things out!
> -mike
>
> On Thu, Jul 30, 2015 at 12:02 PM, Jeff McLamb <mclamb at gmail.com> wrote:
>>
>> Here is commands.log, which definitely has complaints about
>> nova-controller-setup.sh:
>>
>> http://paste.ubuntu.com/11968600/
>>
>> And after running nova-controller-setup.sh again via juju as you
>> mentioned:
>>
>> http://paste.ubuntu.com/11968544/
>>
>>
>> So I guess because the compute node failed to deploy in the first
>> place, the installer still tried to issue the nova-controller-setup.sh
>> script but it failed without a compute node? Or is that not involved
>> in that process? And then, when I re-commissioned and deployed the
>> compute node, it failed to re-run the script?
>>
>> Thanks,
>>
>> Jeff
>>
>>
>> On Thu, Jul 30, 2015 at 1:41 PM, Mike McCracken
>> <mike.mccracken at canonical.com> wrote:
>> > Hi Jeff, the ubuntu user and roles etc are created by a script that the
>> > installer runs after deploying nova-cloud-controller.
>> > The file ~/.cloud-install/commands.log will have any errors encountered
>> > while trying to run that script.
>> > You can also look at the script that would run in
>> > ~/.cloud-install/nova-controller-setup.sh, and optionally try running it
>> > yourself - it should be present on the nova-cloud-controller unit in
>> > /tmp so
>> > you can do e.g.
>> > % juju run --unit nova-cloud-controller/0 "/tmp/nova-controller-setup.sh
>> > <the password you used in the installer> Single"
>> > to try it again.
>> >
>> > On Thu, Jul 30, 2015 at 10:14 AM, Jeff McLamb <mclamb at gmail.com> wrote:
>> >>
>> >> So it was easy enough to pick up from the single failed node. After
>> >> Deleting it, re-enlisting, commissioning, etc. I was presented with a
>> >> Ready node with a new name, etc.
>> >>
>> >> I went into openstack-status and simply added the Compute service that
>> >> was missing and deployed it to this new node. After a while it was up,
>> >> all services looked good.
>> >>
>> >> I issued a `juju machine remove 1` to remove the pending failed
>> >> machine from juju that was no longer in the MAAS database — it had
>> >> nothing running on it  obviously, so I figured it would be best to
>> >> remove it from juju. The new machine is machine 4.
>> >>
>> >> Now when I try to login to horizon, I get "An error occurred
>> >> authenticating. Please try again later.”
>> >>
>> >> The keystone logs suggest user ubuntu and various roles and projects
>> >> were not created, even though openstack-installer tells me to login to
>> >> horizon with user ubuntu and the password I gave it.
>> >>
>> >> Here are the keystone logs:
>> >>
>> >> http://paste.ubuntu.com/11967913/
>> >>
>> >> Here are the apache error logs on the openstack-dashboard container:
>> >>
>> >> http://paste.ubuntu.com/11967922/
>> >>
>> >> Any ideas here?
>> >>
>> >>
>> >> On Thu, Jul 30, 2015 at 12:34 PM, Jeff McLamb <mclamb at gmail.com> wrote:
>> >> > Just to give you an update where I am:
>> >> >
>> >> > I tried various forms still using the underlying vivid MAAS/juju
>> >> > deployment host, tried --edit-placement, which errored out, tried
>> >> > removing Glance Sync again, etc. all to no avail.
>> >> >
>> >> > Then I created a trusty VM on the MAAS host and installed the stable
>> >> > juju and cloud-install ppa's. The problem with the stable version of
>> >> > openstack-install is that it does not honor the http_proxy and
>> >> > https_proxy lines passed on the command-line. I can see that they do
>> >> > not get put into the environments.yaml file, so I ended up with the
>> >> > same issue there as I had originally, where it could not download the
>> >> > tools.
>> >> >
>> >> > So I updated cloud-install to the experimental PPA on the trusty
>> >> > juju deployment VM and used the latest version, which worked fine
>> >> > with http_proxy and https_proxy. I have played around with trying
>> >> > to deploy both juno and kilo as well.
>> >> >
>> >> > My latest attempt on trusty, deploying juno, has left one physical
>> >> > node in a Failed Deployment state, which seems to have been caused
>> >> > because it keeps saying the BMC is busy, so it can't control power. I
>> >> > tried releasing it, which failed, so I ultimately had to Delete it
>> >> > and
>> >> > re-enlist, re-commission.
>> >> >
>> >> > Now I am at a point where the machine is back to Ready and the
>> >> > openstack-install is still waiting on 1 last machine (the other 2
>> >> > deployed just fine)... When something like this happens, is it
>> >> > possible to re-deploy the last remaining host, or must I start over
>> >> > deploying all machines again?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jeff
>> >> >
>> >> >
>> >> > On Thu, Jul 30, 2015 at 1:21 AM, Mike McCracken
>> >> > <mike.mccracken at canonical.com> wrote:
>> >> >>
>> >> >>
>> >> >> On Wed, Jul 29, 2015 at 5:30 PM, Jeff McLamb <mclamb at gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> OK a quick look at the neutron-api/0 /var/log/neutron just shows
>> >> >>> the
>> >> >>> neutron-server.log as before… but since I stepped away in the past
>> >> >>> hour it’s now at 800MB and counting! ;)
>> >> >>>
>> >> >>> I will play around with the relations a bit just to learn what’s
>> >> >>> going
>> >> >>> on, but then I will take your advice and try various alternatives
>> >> >>> with
>> >> >>> --edit-placement first, then finally just changing the underlying
>> >> >>> MAAS
>> >> >>> deployment server to trusty and see where it takes me.
>> >> >>
>> >> >>
>> >> >> Sounds good
>> >> >>
>> >> >>>
>> >> >>> Could also try
>> >> >>> to install without --upstream-ppa which I imagine will install juno
>> >> >>> instead of kilo?
>> >> >>
>> >> >>
>> >> >> oh, --upstream-ppa doesn't do anything for the MAAS install path,
>> >> >> it's
>> >> >> only
>> >> >> applicable to the containerized single install.
>> >> >> It's harmless, though. On the single install, it's used to specify
>> >> >> that the version of the "openstack" package (which contains
>> >> >> openstack-install) installed on the container to run the second
>> >> >> half of the process should come from our experimental PPA. It
>> >> >> could use some better docs/usage string.
>> >> >>
>> >> >> If you're interested in trying out other openstack release versions,
>> >> >> you
>> >> >> want to look at --openstack-release.
>> >> >>
>> >> >> -mike
>> >> >>
>> >> >>>
>> >> >>> Will keep you posted and continued thanks for all the help.
>> >> >>>
>> >> >>> Jeff
>> >> >>>
>> >> >>> On Wed, Jul 29, 2015 at 7:08 PM, Mike McCracken
>> >> >>> <mike.mccracken at canonical.com> wrote:
>> >> >>> > Jeff, based on the other logs you sent me, e.g.
>> >> >>> > neutron-metadata-agent.log,
>> >> >>> > it was pointed out to me that it's trying to connect to rabbitMQ
>> >> >>> > on
>> >> >>> > localhost, which is wrong.
>> >> >>> > So something is failing to complete the juju relations.
>> >> >>> > My hypothesis is that the failing vivid-series charm is messing
>> >> >>> > up
>> >> >>> > juju's
>> >> >>> > relations.
>> >> >>> > If you want to dig further, you can start looking at the
>> >> >>> > relations
>> >> >>> > using
>> >> >>> > e.g. 'juju run --unit 'relation-get amqp:rabbitmq' ' (might just
>> >> >>> > be
>> >> >>> > 'amqp')
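>> >>> >
>> >>> > Spelled out a bit more, that might look like this (the unit name
>> >>> > and relation id here are guesses; 'relation-ids amqp' prints the
>> >>> > real ids):
>> >>> >
>> >>> > ```shell
>> >>> > # List the amqp relation ids visible from the neutron-api unit:
>> >>> > juju run --unit neutron-api/0 'relation-ids amqp'
>> >>> > # Dump the settings rabbitmq published on one of them
>> >>> > # (replace amqp:27 and rabbitmq-server/0 with real values):
>> >>> > juju run --unit neutron-api/0 'relation-get -r amqp:27 - rabbitmq-server/0'
>> >>> > ```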
>> >> >>> >
>> >> >>> > Or if you'd like to try just redeploying without the sync charm
>> >> >>> > using
>> >> >>> > --edit-placement, that might get a healthy cluster going, just
>> >> >>> > one
>> >> >>> > without
>> >> >>> > glance images.
>> >> >>> > Then you could pretty easily deploy the charm manually, or just
>> >> >>> > do without it and upload images you get from
>> >> >>> > cloud-images.ubuntu.com manually.
>> >> >>> >
>> >> >>> > Sorry this is not as simple as it should be, yet :)
>> >> >>> > -mike
>> >> >>> >
>> >> >>> > On Wed, Jul 29, 2015 at 4:00 PM, Mike McCracken
>> >> >>> > <mike.mccracken at canonical.com> wrote:
>> >> >>> >>
>> >> >>> >> ok, so I just learned that the neutron-manage log should be in
>> >> >>> >> the
>> >> >>> >> neutron-api unit, so can you 'juju ssh neutron-api/0' and look
>> >> >>> >> in
>> >> >>> >> /var/log/neutron there?
>> >> >>> >>
>> >> >>> >> On Wed, Jul 29, 2015 at 3:34 PM, Jeff McLamb <mclamb at gmail.com>
>> >> >>> >> wrote:
>> >> >>> >>>
>> >> >>> >>> The neutron-server.log that is 500MB+ and growing is nonstop
>> >> >>> >>> repeated
>> >> >>> >>> output of the following, due to a database table that does not
>> >> >>> >>> exist:
>> >> >>> >>>
>> >> >>> >>> http://paste.ubuntu.com/11962679/
>> >> >>> >>>
>> >> >>> >>> On Wed, Jul 29, 2015 at 6:30 PM, Jeff McLamb <mclamb at gmail.com>
>> >> >>> >>> wrote:
>> >> >>> >>> > Hey Mike -
>> >> >>> >>> >
>> >> >>> >>> > OK so here is the juju status output. The quantum-gateway
>> >> >>> >>> > doesn’t
>> >> >>> >>> > look
>> >> >>> >>> > too strange, but I am new. The exposed status is false, but
>> >> >>> >>> > so
>> >> >>> >>> > it is
>> >> >>> >>> > for all services, and I can definitely access, say, the
>> >> >>> >>> > dashboard,
>> >> >>> >>> > even though it is not “exposed”. One thing of note is the
>> >> >>> >>> > public-address lines that sometimes use the domain names,
>> >> >>> >>> > e.g.
>> >> >>> >>> > downright-feet.maas in this case, whereas some services use
>> >> >>> >>> > IP
>> >> >>> >>> > addresses. I have noticed that I cannot resolve the maas
>> >> >>> >>> > names
>> >> >>> >>> > from
>> >> >>> >>> > the MAAS server (because I use the ISP’s DNS servers) but I
>> >> >>> >>> > can
>> >> >>> >>> > resolve them from the deployed nodes.  Here is the output:
>> >> >>> >>> >
>> >> >>> >>> > http://paste.ubuntu.com/11962631/
>> >> >>> >>> >
>> >> >>> >>> > Here is the quantum gateway replay:
>> >> >>> >>> >
>> >> >>> >>> > http://paste.ubuntu.com/11962644/
>> >> >>> >>> >
>> >> >>> >>> > Where are the neutron-manage logs? I see lots of neutron
>> >> >>> >>> > stuff
>> >> >>> >>> > on
>> >> >>> >>> > various containers and nodes — the neutron-server.log is what
>> >> >>> >>> > I
>> >> >>> >>> > pasted
>> >> >>> >>> > before and it is 500+MB and growing across a few nodes, but I
>> >> >>> >>> > can’t
>> >> >>> >>> > seem to find neutron-manage.
>> >> >>> >>> >
>> >> >>> >>> > Thanks!
>> >> >>> >>> >
>> >> >>> >>> > Jeff
>> >> >>> >>> >
>> >> >>> >>> >
>> >> >>> >>> > On Wed, Jul 29, 2015 at 5:26 PM, Mike McCracken
>> >> >>> >>> > <mike.mccracken at canonical.com> wrote:
>> >> >>> >>> >> Hi Jeff, I asked internally and was asked if you could share
>> >> >>> >>> >> the
>> >> >>> >>> >> juju
>> >> >>> >>> >> charm
>> >> >>> >>> >> logs from quantum-gateway and the neutron-manage logs in
>> >> >>> >>> >> /var/log/neutron.
>> >> >>> >>> >>
>> >> >>> >>> >> the charm log can be replayed by using 'juju debug-log -i
>> >> >>> >>> >> quantum-gateway/0
>> >> >>> >>> >> --replay'
>> >> >>> >>> >>
>> >> >>> >>> >> On Wed, Jul 29, 2015 at 2:03 PM, Mike McCracken
>> >> >>> >>> >> <mike.mccracken at canonical.com> wrote:
>> >> >>> >>> >>>
>> >> >>> >>> >>> Sorry this is so frustrating.
>> >> >>> >>> >>> Can you check 'juju status' for this environment and see if
>> >> >>> >>> >>> it
>> >> >>> >>> >>> says
>> >> >>> >>> >>> anything useful about the quantum-gateway service (aka
>> >> >>> >>> >>> neutron,
>> >> >>> >>> >>> the
>> >> >>> >>> >>> juju
>> >> >>> >>> >>> service name will be updated soon).
>> >> >>> >>> >>>
>> >> >>> >>> >>> -mike
>> >> >>> >>> >>>
>> >> >>> >>> >>> On Wed, Jul 29, 2015 at 1:15 PM, Jeff McLamb
>> >> >>> >>> >>> <mclamb at gmail.com>
>> >> >>> >>> >>> wrote:
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> OK, making progress now. Per your recommendation I removed
>> >> >>> >>> >>>> and
>> >> >>> >>> >>>> added
>> >> >>> >>> >>>> back in the trusty sync charm manually.
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> Now, I can log in to the horizon dashboard!
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> However, several tabs result in a generic OpenStack (not
>> >> >>> >>> >>>> Ubuntu-customized like the general dashboard pages)
>> >> >>> >>> >>>> "Something
>> >> >>> >>> >>>> went
>> >> >>> >>> >>>> wrong! An unexpected error has occurred. Try refreshing
>> >> >>> >>> >>>> the
>> >> >>> >>> >>>> page..."
>> >> >>> >>> >>>>
>> >> >>> >>>> The tabs in question that give those results are Compute ->
>> >> >>> >>>> Access & Security and Network -> Network Topology.
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> When I go to pages like Network -> Routers, it does
>> >> >>> >>> >>>> render,
>> >> >>> >>> >>>> but
>> >> >>> >>> >>>> there
>> >> >>> >>> >>>> are error popup boxes in the page itself with:
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> Error: Unable to retrieve router list.
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> and
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> Error: Unable to retrieve a list of external networks
>> >> >>> >>> >>>> "Connection
>> >> >>> >>> >>>> to
>> >> >>> >>> >>>> neutron failed: HTTPConnectionPool(host='192.168.1.45',
>> >> >>> >>> >>>> port=9696):
>> >> >>> >>> >>>> Max retries exceeded with url:
>> >> >>> >>> >>>> /v2.0/networks.json?router%3Aexternal=True (Caused by
>> >> >>> >>> >>>> <class
>> >> >>> >>> >>>> 'httplib.BadStatusLine'>: '')”.
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> If I do a `juju ssh openstack-dashboard/0` and tail -f
>> >> >>> >>> >>>> /var/log/apache2/error.log I get the following when
>> >> >>> >>> >>>> accessing
>> >> >>> >>> >>>> one
>> >> >>> >>> >>>> of
>> >> >>> >>> >>>> the failed pages:
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> http://paste.ubuntu.com/11961863/
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> Furthermore, looking at the neutron server logs, I see
>> >> >>> >>> >>>> non-stop
>> >> >>> >>> >>>> traces
>> >> >>> >>> >>>> about the neutron.ml2_gre_allocations table not existing:
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> http://paste.ubuntu.com/11961891/
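>> >> >>> >>>>
>> >> >>> >>>> (That missing table makes me think the neutron DB migrations
>> >> >>> >>>> never ran. If it comes up again I may try running them by
>> >> >>> >>>> hand on the neutron-api unit -- a sketch, untested here,
>> >> >>> >>>> assuming the standard config paths:)
>> >> >>> >>>>
>> >> >>> >>>> ```shell
>> >> >>> >>>> juju ssh neutron-api/0
>> >> >>> >>>> # On the unit: bring the neutron schema up to the latest revision.
>> >> >>> >>>> sudo neutron-db-manage --config-file /etc/neutron/neutron.conf \
>> >> >>> >>>>     --config-file /etc/neutron/plugins/ml2/ml2_conf.ini upgrade head
>> >> >>> >>>> ```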
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> Getting closer, bit by bit.
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> Thanks for all the help,
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> Jeff
>> >> >>> >>> >>>
>> >> >>> >>> >>>
>> >> >>> >>> >>>
>> >> >>> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >
>> >> >>
>> >> >>
>> >
>> >
>
>



More information about the ubuntu-openstack-installer mailing list