[Bug 1818239] Re: scheduler: build failure high negative weighting

OpenStack Infra 1818239 at bugs.launchpad.net
Tue Mar 5 07:45:28 UTC 2019


Reviewed:  https://review.openstack.org/640698
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=c5029e9831ab5063485877213987d6827c4d86f1
Submitter: Zuul
Branch:    master

commit c5029e9831ab5063485877213987d6827c4d86f1
Author: James Page <james.page at ubuntu.com>
Date:   Mon Mar 4 09:25:46 2019 +0000

    Disable BuildFailureWeigher
    
    Disable the BuildFailureWeigher used when weighting hosts during
    instance scheduling. A single build failure will result in a
    -1000000 weighting which effectively excludes the hypervisor
    from the scheduling decision.
    
    A bad image can cause build failures on some hypervisors. Those
    hypervisors are then effectively ignored, which puts a heavy load
    on the hypervisors that have not had a build failure. The build
    failure count is reset on a successful build, but because of the
    high weighting this won't happen until all resources on the known
    good hypervisors have been completely consumed.
    
    Change-Id: I4d4367ef20e2a20aee1e26d4a0ec69cad2ac69d6
    Closes-Bug: 1818239
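
For illustration only, a minimal sketch of how the weigher could be neutralised directly in nova.conf. It assumes the `build_failure_weight_multiplier` option in the `[filter_scheduler]` group; whether the charm renders exactly this setting is an assumption and is not stated in the commit message above.

```ini
[filter_scheduler]
# Assumed option name; a multiplier of 0.0 means recent build failures
# no longer influence how hosts are weighed.
build_failure_weight_multiplier = 0.0
```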


** Changed in: charm-nova-cloud-controller
       Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1818239

Title:
  scheduler: build failure high negative weighting

Status in OpenStack nova-cloud-controller charm:
  Fix Committed
Status in nova package in Ubuntu:
  Won't Fix

Bug description:
  Whilst debugging a Queens cloud which seems to be landing all new
  instances on 3 out of 9 hypervisors (which resulted in three very
  heavily overloaded servers) I noticed that the weighting of the build
  failure weigher is -1000000.0 * number of failures:

  https://github.com/openstack/nova/blob/master/nova/conf/scheduler.py#L495

  This means that a server which has any sort of build failure instantly
  drops to the bottom of the weighed list of hypervisors for scheduling
  of instances.
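
The effect can be sketched with a simplified model. This is not nova's actual scheduler code: the host names, the free-RAM figures, the single RAM weigher and the `rank_hosts` helper are all hypothetical, and only the -1000000.0 per-failure penalty comes from the description above.

```python
# Simplified model of weighted host ranking; NOT nova's actual code.
BUILD_FAILURE_MULTIPLIER = 1000000.0  # penalty per failed build (from the report)
RAM_MULTIPLIER = 1.0                  # assumed multiplier for the free-RAM weight


def rank_hosts(hosts):
    """Return host names ordered best-first by total weight.

    hosts maps a host name to a (free_ram_mb, failed_builds) tuple.
    """
    max_ram = max(ram for ram, _ in hosts.values()) or 1

    def weight(name):
        free_ram, failed_builds = hosts[name]
        ram_weight = RAM_MULTIPLIER * (free_ram / max_ram)  # scaled to [0, 1]
        penalty = -BUILD_FAILURE_MULTIPLIER * failed_builds
        return ram_weight + penalty

    return sorted(hosts, key=weight, reverse=True)


hosts = {
    "hv1": (2048, 0),    # nearly full, no build failures
    "hv2": (4096, 0),    # nearly full, no build failures
    "hv3": (131072, 1),  # almost empty, one build failure
}
print(rank_hosts(hosts))  # ['hv2', 'hv1', 'hv3']
```

In this model the almost-empty hypervisor hv3 ranks last after a single failure, because every other weight is at most 1.0 while the penalty is a million times larger.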

  Why might an instance fail to build? It could be a timeout due to
  load, or it could be a bad image (one that won't actually boot under
  qemu). This second cause could be triggered by an end user of the
  cloud, inadvertently causing all instances to be pushed to a small
  subset of hypervisors (which is what I think happened in our case).

  This feels like quite a dangerous default to have, given the potential
  to DoS hypervisors, intentionally or otherwise.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: nova-scheduler 2:17.0.7-0ubuntu1
  ProcVersionSignature: Ubuntu 4.15.0-43.46-generic 4.15.18
  Uname: Linux 4.15.0-43-generic x86_64
  ApportVersion: 2.20.9-0ubuntu7.5
  Architecture: amd64
  Date: Fri Mar  1 13:57:39 2019
  NovaConf: Error: [Errno 13] Permission denied: '/etc/nova/nova.conf'
  PackageArchitecture: all
  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: nova
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1818239/+subscriptions


