[Bug 1719770] Re: hypervisor stats issue after charm removal if nova-compute service not disabled first

Hua Zhang joshua.zhang at canonical.com
Sat Sep 30 03:34:33 UTC 2017


I cannot reproduce the problem. The generated SQL already uses
'compute_nodes.deleted = 0' to filter out deleted services, as the
following debug output shows (mitaka).

(Pdb) <oslo_db.sqlalchemy.orm.Query object at 0x7f97672b9810>
(Pdb) 'SELECT count(compute_nodes.id) AS count_1, sum(compute_nodes.vcpus) AS sum_1, sum(compute_nodes.memory_mb) AS sum_2, sum(compute_nodes.local_gb) AS sum_3, sum(compute_nodes.vcpus_used) AS sum_4, sum(compute_nodes.memory_mb_used) AS sum_5, sum(compute_nodes.local_gb_used) AS sum_6, sum(compute_nodes.free_ram_mb) AS sum_7, sum(compute_nodes.free_disk_gb) AS sum_8, sum(compute_nodes.current_workload) AS sum_9, sum(compute_nodes.running_vms) AS sum_10, sum(compute_nodes.disk_available_least) AS sum_11 \nFROM compute_nodes, services \nWHERE compute_nodes.deleted = :deleted_1 AND services.disabled = 0 AND services."binary" = :binary_1 AND (services.host = compute_nodes.host OR services.id = compute_nodes.service_id)'

This is the result of running the above SQL directly in mysql; everything
looks correct.

mysql> SELECT count(compute_nodes.id) AS count_1, sum(compute_nodes.vcpus) AS sum_1, sum(compute_nodes.memory_mb) AS sum_2, sum(compute_nodes.local_gb) AS sum_3, sum(compute_nodes.vcpus_used) AS sum_4, sum(compute_nodes.memory_mb_used) AS sum_5, sum(compute_nodes.local_gb_used) AS sum_6, sum(compute_nodes.free_ram_mb) AS sum_7, sum(compute_nodes.free_disk_gb) AS sum_8, sum(compute_nodes.current_workload) AS sum_9, sum(compute_nodes.running_vms) AS sum_10, sum(compute_nodes.disk_available_least) AS sum_11 FROM compute_nodes, services WHERE compute_nodes.deleted = 0 AND services.disabled = 0 AND services.binary = 'nova-compute' AND (services.host = compute_nodes.host OR services.id = compute_nodes.service_id);
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+
| count_1 | sum_1 | sum_2 | sum_3 | sum_4 | sum_5 | sum_6 | sum_7 | sum_8 | sum_9 | sum_10 | sum_11 |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+
|       2 |     4 |  7902 |    76 |     0 |  1024 |     0 |  6878 |    76 |     0 |      0 |     72 |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+--------+
1 row in set (0.00 sec)

mysql> SELECT sum(compute_nodes.vcpus) FROM compute_nodes, services WHERE compute_nodes.deleted = 0 AND services.disabled = 0 AND services.binary = 'nova-compute' AND (services.host = compute_nodes.host OR services.id = compute_nodes.service_id);
+--------------------------+
| sum(compute_nodes.vcpus) |
+--------------------------+
|                        4 |
+--------------------------+
1 row in set (0.00 sec)
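
For anyone who wants to try the same query shape without a nova database,
here is a minimal sketch using an in-memory SQLite database. The schema is
cut down to the columns the query touches, and the host names, ids and
vcpu counts are hypothetical. It also assumes that the compute_nodes row
belonging to the removed service has a non-zero deleted value; it is not
the nova code itself.

```python
import sqlite3

# Simplified stand-ins for the nova tables (only the columns the query uses).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE services (
    id INTEGER PRIMARY KEY, host TEXT, "binary" TEXT,
    disabled INTEGER DEFAULT 0, deleted INTEGER DEFAULT 0);
CREATE TABLE compute_nodes (
    id INTEGER PRIMARY KEY, service_id INTEGER, host TEXT,
    vcpus INTEGER, deleted INTEGER DEFAULT 0);
""")

# Hypothetical data: three compute services, one of them removed.
conn.executemany(
    'INSERT INTO services (id, host, "binary", disabled, deleted) '
    "VALUES (?, ?, 'nova-compute', ?, ?)",
    [(1, "node-1", 0, 0), (2, "node-2", 0, 0), (3, "node-3", 0, 3)],
)
# Assumption: the compute node of the removed service is marked deleted too.
conn.executemany(
    "INSERT INTO compute_nodes (id, service_id, host, vcpus, deleted) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "node-1", 2, 0), (2, 2, "node-2", 2, 0), (3, 3, "node-3", 2, 3)],
)

# Same WHERE clause as the query in the debug output, reduced to two columns.
STATS_SQL = """
SELECT count(compute_nodes.id), sum(compute_nodes.vcpus)
FROM compute_nodes, services
WHERE compute_nodes.deleted = :deleted
  AND services.disabled = 0
  AND services."binary" = :binary
  AND (services.host = compute_nodes.host
       OR services.id = compute_nodes.service_id)
"""


def hypervisor_stats(conn):
    """Return (node count, total vcpus) for non-deleted compute nodes."""
    params = {"deleted": 0, "binary": "nova-compute"}
    return conn.execute(STATS_SQL, params).fetchone()


print(hypervisor_stats(conn))  # (2, 4)
```

With this data the compute node marked as deleted is left out, so only the
two remaining nodes are counted.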

Below are the steps I used to create the test environment:

1. There are 3 nova-compute nodes initially.

2. Use 'openstack compute service delete 10' to delete one compute
service.

3. 'select * from services where id=10' then shows that the deleted field
of this record is no longer 0.
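
The check in step 3 can be sketched against an in-memory SQLite table as
follows. This is only an illustration: the table is reduced to a few
columns, the host names are hypothetical, and the soft delete is modelled
by setting deleted to the row id, which is an assumption of this sketch.
The second helper lists services that are marked deleted but still
enabled, which is the state the bug description mentions.

```python
import sqlite3

# Simplified stand-in for the nova services table.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE services (
    id INTEGER PRIMARY KEY, host TEXT, "binary" TEXT,
    disabled INTEGER DEFAULT 0, deleted INTEGER DEFAULT 0)
""")
# Hypothetical hosts; the ids are chosen so that service 10 exists.
conn.executemany(
    'INSERT INTO services (id, host, "binary") VALUES (?, ?, ?)',
    [(8, "node-1", "nova-compute"),
     (9, "node-2", "nova-compute"),
     (10, "node-3", "nova-compute")],
)


def soft_delete_service(conn, service_id):
    """Mark a service row as deleted without removing it.

    Assumption: a soft delete stores a non-zero value (here the row id)
    in the deleted column and leaves the disabled flag untouched.
    """
    conn.execute("UPDATE services SET deleted = id WHERE id = ?", (service_id,))


def deleted_but_enabled(conn):
    """List (id, host) of services that are deleted but not disabled."""
    return conn.execute(
        "SELECT id, host FROM services "
        "WHERE deleted != 0 AND disabled = 0 ORDER BY id"
    ).fetchall()


soft_delete_service(conn, 10)
row = conn.execute(
    "SELECT deleted, disabled FROM services WHERE id = 10"
).fetchone()
print(row)                        # (10, 0)
print(deleted_but_enabled(conn))  # [(10, 'node-3')]
```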

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1719770

Title:
  hypervisor stats issue after charm removal if nova-compute service not
  disabled first

Status in OpenStack nova-compute charm:
  Invalid
Status in nova package in Ubuntu:
  New

Bug description:
  In an environment with 592 physical threads (lscpu |grep '^CPU.s' and
  openstack hypervisor show -f value -c vcpus both show correct counts)
  I am seeing 712 vcpus. (likely also seeing inflated memory_mb and
  other stats due to the issue.)

  Querying the nova services DB table, I see:
  http://pastebin.ubuntu.com/25624553/

  It appears that of the 6 machines showing deleted in the services
  table, only one is showing as disabled.

  Digging through the nova/db/sqlalchemy/api.py code, it appears that
  there are filters on the hypervisor stats for Service.disabled ==
  false() and Service.binary == 'nova-compute', but I don't see it
  filtering for deleted == 0.

  I'm not exactly certain of the timeline of my uninstall and reinstall
  of the nova-compute units on the 6 x 24vcpu servers (see
  *-ST-{1,2} nova-compute services) that caused the services not to be
  disabled, but the nova API for hypervisor stats might
  be well served to filter out deleted services as well as disabled
  services, or, if a deleted service should always be disabled, nova
  service-delete should also set the disabled flag for the service.

  These services and compute_nodes do not show up in openstack
  hypervisor list.

  Site is running up-to-date Xenial/Mitaka on openstack-charmers 17.02.



