[Bug 1762368] [NEW] reinstalling a compute node and then upgrading from pike to queens fails
Junien Fridrick
1762368 at bugs.launchpad.net
Mon Apr 9 10:19:10 UTC 2018
Public bug reported:
Hi,
I recently had a working xenial/pike cloud, using neutron-ovs, with
several compute nodes, including a ppc64 compute node named bagon. I
needed to reinstall it, so I did the following:
1. nova service-delete <id of the compute service on bagon>
2. neutron agent-delete <uuid of the openvswitch agent on bagon>
3. Re-commission the node and deploy the nova-compute application on it
Some time after that, I upgraded the cloud to queens. This apparently
caused the node to stop working. It was logging the following error
(nova-compute.log on bagon):
2018-04-09 06:25:26.099 128068 ERROR nova.scheduler.client.report [req-f1eebe14-fcfb-4878-b557-50105790d3b5 6bd667e324ea463abaacbc1f9c3bbed3 95cafd7ede504ef6b7b67ead691d3883 - default default] [req-29de76b9-50c2-4bff-85a9-363d665c250f] Failed to create resource provider record in placement API for UUID 2d236848-df06-47f1-92a4-a1afefe62931. Got 409: {"errors": [{"status": 409, "request_id": "req-29de76b9-50c2-4bff-85a9-363d665c250f", "detail": "There was a conflict when trying to complete your request.\n\n Conflicting resource provider name: bagon.fqdn already exists. ", "title": "Conflict"}]}.
Full stack trace: https://pastebin.canonical.com/p/ynhpgsB8bp/ (sorry,
Canonical-only link)
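For anyone triaging similar log lines, here is a minimal sketch of pulling the conflicting provider name out of the 409 body. The function name and the regular expression are my own illustration, not part of nova; the sample string is just the tail of the log line above.

```python
import json
import re

# Tail of the nova-compute log line quoted above (the part after the UUID).
LOG_TAIL = (
    'Got 409: {"errors": [{"status": 409, "request_id": '
    '"req-29de76b9-50c2-4bff-85a9-363d665c250f", "detail": "There was a '
    'conflict when trying to complete your request.\\n\\n Conflicting '
    'resource provider name: bagon.fqdn already exists. ", '
    '"title": "Conflict"}]}.'
)

# Hypothetical pattern, written to match the "detail" text seen in this report.
NAME_PATTERN = re.compile(
    r"Conflicting resource provider name: (\S+) already exists"
)


def conflicting_provider_name(log_tail):
    """Return the provider name from a 409 name-conflict body, or None."""
    start = log_tail.find("{")
    end = log_tail.rfind("}")
    if start == -1 or end == -1:
        return None
    # The JSON error document sits between the first "{" and the last "}".
    body = json.loads(log_tail[start:end + 1])
    for error in body.get("errors", []):
        match = NAME_PATTERN.search(error.get("detail", ""))
        if match:
            return match.group(1)
    return None


name = conflicting_provider_name(LOG_TAIL)
```

The extracted name is what the resource_providers lookup further down searches for.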
I tracked down the problem, and found it was due to the following
mismatch:
mysql> select uuid,host,deleted from compute_nodes where host='bagon';
+--------------------------------------+-------+---------+
| uuid                                 | host  | deleted |
+--------------------------------------+-------+---------+
| 2d236848-df06-47f1-92a4-a1afefe62931 | bagon |       0 |
| 92232041-9767-466b-a82f-20ecef0af6fa | bagon |       9 |
+--------------------------------------+-------+---------+
2 rows in set (0.00 sec)
mysql> use nova_api;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select uuid,name from resource_providers where name like 'bagon%';
+--------------------------------------+--------------------------+
| uuid                                 | name                     |
+--------------------------------------+--------------------------+
| 92232041-9767-466b-a82f-20ecef0af6fa | bagon.fqdn               |
+--------------------------------------+--------------------------+
1 row in set (0.00 sec)
The nova.compute_nodes table has 2 records for bagon, as expected: one
is the old, deleted record and the other is the current, live record.
The problem, as you can see above, is that the
nova_api.resource_providers table still had the old UUID for bagon, so
creating a provider with the new UUID under the same name was rejected
with the 409 above. I'm not exactly sure at what point nova-compute on
bagon started failing. I'm fairly confident it was OK after the
reinstall, so I suspect something happened during the migration from
pike to queens.
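The comparison I did by eye between the two query results can be sketched roughly as follows. This is only an illustration: the function name is made up, and matching a provider name to a compute_nodes host by stripping the domain part is an assumption that happens to fit this deployment (host 'bagon', provider name 'bagon.fqdn').

```python
def find_stale_providers(compute_nodes, resource_providers):
    """Return (provider_name, provider_uuid, live_uuid) for each mismatch.

    compute_nodes: iterable of (uuid, host, deleted) rows
    resource_providers: iterable of (uuid, name) rows
    """
    # Only rows with deleted == 0 are live; others are soft-deleted.
    live = {host: uuid for uuid, host, deleted in compute_nodes if deleted == 0}
    mismatches = []
    for rp_uuid, rp_name in resource_providers:
        # Assumption: the provider name is the host name, possibly as an FQDN.
        short_name = rp_name.split(".", 1)[0]
        live_uuid = live.get(rp_name) or live.get(short_name)
        if live_uuid is not None and live_uuid != rp_uuid:
            mismatches.append((rp_name, rp_uuid, live_uuid))
    return mismatches


# Rows copied from the two queries above.
compute_nodes = [
    ("2d236848-df06-47f1-92a4-a1afefe62931", "bagon", 0),
    ("92232041-9767-466b-a82f-20ecef0af6fa", "bagon", 9),
]
resource_providers = [
    ("92232041-9767-466b-a82f-20ecef0af6fa", "bagon.fqdn"),
]

stale = find_stale_providers(compute_nodes, resource_providers)
```

With the rows from this report, the single provider for bagon is flagged because its UUID belongs to the deleted compute node record rather than the live one.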
I manually updated the UUID in the resource_providers table, and bagon
started working fine.
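The exact statement used is not recorded here, so the following is only a sketch of that kind of guarded update. It runs against an in-memory SQLite table standing in for nova_api.resource_providers (the real database is MySQL and the real table has more columns); the function name and the two-column schema are assumptions for illustration.

```python
import sqlite3

OLD_UUID = "92232041-9767-466b-a82f-20ecef0af6fa"  # stale provider UUID
NEW_UUID = "2d236848-df06-47f1-92a4-a1afefe62931"  # live compute node UUID


def repoint_provider(conn, name, old_uuid, new_uuid):
    """Swap a provider's UUID only if the row still holds the old one."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE resource_providers SET uuid = ? "
            "WHERE name = ? AND uuid = ?",
            (new_uuid, name, old_uuid),
        )
    return cur.rowcount


# Stand-in table with only the two columns shown in the query output.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_providers (uuid TEXT UNIQUE, name TEXT UNIQUE)"
)
conn.execute(
    "INSERT INTO resource_providers VALUES (?, ?)", (OLD_UUID, "bagon.fqdn")
)

changed = repoint_provider(conn, "bagon.fqdn", OLD_UUID, NEW_UUID)
```

Matching on both the name and the old UUID means a second run changes nothing, which makes it harder to clobber a row that has already been corrected.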
I can't reproduce this because I can't downgrade the cluster to try the
pike=>queens upgrade a second time, but hopefully you can.
Thanks!
** Affects: cloud-archive
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1762368