[Bug 1950186] Re: Nova doesn't account for hugepages when scheduling VMs

James Page 1950186 at bugs.launchpad.net
Fri Feb 18 15:45:32 UTC 2022


Discussed with the Nova team and this is a known issue at the moment -
mixing instance types with and without NUMA configuration features such
as hugepages on the same host will create this type of issue.

The placement API (which is used for scheduling) does not track
different page sizes, so it cannot model this scenario today.

Feedback indicated that using flavors explicitly configured to use
small pages might do the trick by triggering the NUMA cell accounting
codepath in Nova.
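
For example, a minimal sketch of that suggestion (the flavor name is a
placeholder; hw:mem_page_size is the standard flavor extra spec for page
size selection):

$ openstack flavor set m1.nohuge --property hw:mem_page_size=small

Setting hw:mem_page_size - even to "small" - gives the instance a NUMA
topology, so its RAM should be accounted against per-NUMA-cell
small-page memory rather than only the aggregate MEMORY_MB inventory.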

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1950186

Title:
  Nova doesn't account for hugepages when scheduling VMs

Status in nova package in Ubuntu:
  New

Bug description:
  Description
  ===========

  When hugepages are enabled on the host it's possible to schedule VMs
  using more RAM than available.

  On the node with the memory usage shown below it was possible to
  schedule 6 instances using a total of 140G of memory with a non-
  hugepage-backed flavor. The same machine has 188G of memory in total,
  of which 64G were reserved for hugepages. An additional ~4G was used
  for housekeeping, the OpenStack control plane, etc. This resulted in
  an overcommitment of roughly 20G.
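
  The arithmetic behind that estimate, for reference (approximate values
  in GiB, taken from the figures in this paragraph):

  $ echo $(( 188 - 64 - 4 ))          # 120G usable for regular pages
  $ echo $(( 140 - (188 - 64 - 4) ))  # ~20G overcommitted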

  After running memory intensive operations on the VMs, some of them got
  OOM killed.

  $ cat /proc/meminfo  | egrep "^(Mem|Huge)" # on the compute node
  MemTotal:       197784792 kB
  MemFree:        115005288 kB
  MemAvailable:   116745612 kB
  HugePages_Total:      64
  HugePages_Free:       64
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:    1048576 kB
  Hugetlb:        67108864 kB

  $ os hypervisor show compute1 -c memory_mb -c memory_mb_used -c free_ram_mb
  +----------------+--------+
  | Field          | Value  |
  +----------------+--------+
  | free_ram_mb    | 29309  |
  | memory_mb      | 193149 |
  | memory_mb_used | 163840 |
  +----------------+--------+

  $ os host show compute1
  +----------+----------------------------------+-----+-----------+---------+
  | Host     | Project                          | CPU | Memory MB | Disk GB |
  +----------+----------------------------------+-----+-----------+---------+
  | compute1 | (total)                          |   0 |    193149 |     893 |
  | compute1 | (used_now)                       |  72 |    163840 |     460 |
  | compute1 | (used_max)                       |  72 |    147456 |     460 |
  | compute1 | some_project_id_was_here         |   2 |      4096 |      40 |
  | compute1 | another_anonymized_id_here       |  70 |    143360 |     420 |
  +----------+----------------------------------+-----+-----------+---------+

  $ os resource provider inventory list uuid_of_compute1_node
  +----------------+------------------+----------+----------+----------+-----------+--------+
  | resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size |  total |
  +----------------+------------------+----------+----------+----------+-----------+--------+
  | MEMORY_MB      |              1.0 |        1 |   193149 |    16384 |         1 | 193149 |
  | DISK_GB        |              1.0 |        1 |      893 |        0 |         1 |    893 |
  | PCPU           |              1.0 |        1 |       72 |        0 |         1 |     72 |
  +----------------+------------------+----------+----------+----------+-----------+--------+
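
  A rough reading of that inventory (assuming the 16384 MB reserved
  figure comes from reserved_host_memory_mb): the scheduler can consume
  everything above the reserved slice, even though 64G of that memory is
  locked away in 1 GiB hugepages:

  $ echo $(( 193149 - 16384 ))          # 176765 MB consumable per placement
  $ echo $(( 193149 - 16384 - 65536 ))  # 111229 MB actually backed by small pages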

  Steps to reproduce
  ==================

  1. Reserve a large part of memory for hugepages on the hypervisor.
  2. Create VMs from a flavor that requests a large amount of memory not backed by hugepages (see the sketch after this list).
  3. Start memory intensive operations on the VMs, e.g.:
  stress-ng --vm-bytes $(awk '/MemAvailable/{printf "%d", $2 * 0.98;}' < /proc/meminfo)k --vm-keep -m 1
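
  A minimal sketch of steps 1-2 under stated assumptions (flavor, image
  and network names are placeholders; the hugepage reservation matches
  the 64 x 1G pages seen on this node):

  # 1. Reserve 1 GiB hugepages on the hypervisor via the kernel command
  #    line and reboot, e.g.:
  #      GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G hugepages=64"
  # 2. Create a flavor without hw:mem_page_size, so its RAM is not tracked
  #    against hugepages, then boot several instances from it:
  $ openstack flavor create --ram 24576 --vcpus 8 --disk 40 m1.nohuge
  $ openstack server create --flavor m1.nohuge --image bionic \
        --network private vm-1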

  Expected result
  ===============

  Nova should not allow overcommitment and should be able to
  differentiate between hugepages and "normal" memory.

  Actual result
  =============
  Overcommitment resulting in OOM kills.

  Environment
  ===========
  nova-api-metadata 2:21.2.1-0ubuntu1~cloud0
  nova-common 2:21.2.1-0ubuntu1~cloud0
  nova-compute 2:21.2.1-0ubuntu1~cloud0
  nova-compute-kvm 2:21.2.1-0ubuntu1~cloud0
  nova-compute-libvirt 2:21.2.1-0ubuntu1~cloud0
  python3-nova 2:21.2.1-0ubuntu1~cloud0
  python3-novaclient 2:17.0.0-0ubuntu1~cloud0

  OS: Ubuntu 18.04.5 LTS
  Hypervisor: libvirt + KVM

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1950186/+subscriptions
