[Bug 1818239] Re: scheduler: build failure high negative weighting
OpenStack Infra
1818239 at bugs.launchpad.net
Tue Mar 5 07:45:28 UTC 2019
Reviewed: https://review.openstack.org/640698
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=c5029e9831ab5063485877213987d6827c4d86f1
Submitter: Zuul
Branch: master
commit c5029e9831ab5063485877213987d6827c4d86f1
Author: James Page <james.page at ubuntu.com>
Date: Mon Mar 4 09:25:46 2019 +0000
Disable BuildFailureWeigher
Disable the BuildFailureWeigher used when weighting hosts during
instance scheduling. A single build failure will result in a
-1000000 weighting which effectively excludes the hypervisor
from the scheduling decision.
A bad image can cause build failures on some hypervisors, which are
then effectively ignored, putting a heavy load on the hypervisors
which have not had a build failure. The build failure count is reset
on a successful build, but due to the high weighting this won't
happen until all resources on known good hypervisors have been
completely consumed.
Change-Id: I4d4367ef20e2a20aee1e26d4a0ec69cad2ac69d6
Closes-Bug: 1818239
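The behaviour described in the commit message can be illustrated with a
small toy model. This is only a sketch under stated assumptions, not
nova's scheduler code: the host names, capacities and the `schedule`
helper are hypothetical, every build is assumed to succeed, and the very
large multiplier is approximated by letting the failure count dominate
every other criterion.

```python
def schedule(capacity, failed_builds, requests):
    """Place `requests` instances and return the per-host placement count.

    Toy model: the recorded failure count dominates the ordering, which
    approximates a very large build-failure weight multiplier.
    """
    placed = {host: 0 for host in capacity}
    failures = dict(failed_builds)
    for _ in range(requests):
        # Only hosts with free slots are candidates.
        candidates = [h for h in capacity if placed[h] < capacity[h]]
        if not candidates:
            break
        # Fewest recorded failures first; ties go to the host with most free slots.
        best = min(candidates, key=lambda h: (failures[h], placed[h] - capacity[h]))
        placed[best] += 1
        # Assumption: the build succeeds, which resets the failure counter.
        failures[best] = 0
    return placed


# Hypothetical cloud: nine hypervisors with ten slots each,
# six of which have one recorded build failure.
capacity = {f"hv{i}": 10 for i in range(1, 10)}
failed_builds = {f"hv{i}": (0 if i <= 3 else 1) for i in range(1, 10)}
```

In this model, `schedule(capacity, failed_builds, 30)` fills hv1 to hv3
completely and places nothing on the other six hosts; a host with a
recorded failure is only tried, and its counter only reset, once the
known good hosts are full.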
** Changed in: charm-nova-cloud-controller
Status: In Progress => Fix Committed
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1818239
Title:
scheduler: build failure high negative weighting
Status in OpenStack nova-cloud-controller charm:
Fix Committed
Status in nova package in Ubuntu:
Won't Fix
Bug description:
Whilst debugging a Queens cloud which seems to be landing all new
instances on 3 out of 9 hypervisors (which resulted in three very
heavily overloaded servers) I noticed that the weighting of the build
failure weigher is -1000000.0 * number of failures:
https://github.com/openstack/nova/blob/master/nova/conf/scheduler.py#L495
This means that a server which has any sort of build failure instantly
drops to the bottom of the weighed list of hypervisors for scheduling
of instances.
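A rough sketch of why a single failure dominates the ranking follows. It
is a simplified model, not nova's implementation: the `Host` class, the
host names, the RAM figures, the RAM multiplier of 1.0 and the min-max
normalisation used to combine the two weighers are all assumptions made
for illustration. Only the 1000000.0 multiplier comes from the
description above.

```python
from dataclasses import dataclass

BUILD_FAILURE_MULTIPLIER = 1_000_000.0  # magnitude cited in this bug
RAM_MULTIPLIER = 1.0                    # assumed for illustration


@dataclass
class Host:
    name: str
    free_ram_mb: int
    failed_builds: int


def normalize(values):
    """Scale values to the range [0, 1]; all zeros if they are identical."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_hosts(hosts, build_failure_multiplier=BUILD_FAILURE_MULTIPLIER):
    """Return host names ordered from most to least preferred."""
    ram = normalize([h.free_ram_mb for h in hosts])
    # More failures means a lower raw weight, hence the negation.
    failures = normalize([-h.failed_builds for h in hosts])
    scored = [
        (RAM_MULTIPLIER * r + build_failure_multiplier * f, h.name)
        for h, r, f in zip(hosts, ram, failures)
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical hosts: hv2 has by far the most free RAM but one failed build.
hosts = [
    Host("hv1", 2048, 0),
    Host("hv2", 65536, 1),
    Host("hv3", 4096, 0),
]
```

With the large multiplier, `rank_hosts(hosts)` puts hv2 last despite its
free RAM; with `rank_hosts(hosts, 0.0)` the failure count is ignored and
hv2 is ranked first.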
Why might an instance fail to build? It could be a timeout due to
load, or it might be due to a bad image (one that won't actually boot
under qemu). This second cause could be triggered by an end user of
the cloud, inadvertently causing all instances to be pushed to a small
subset of hypervisors (which is what I think happened in our case).
This feels like quite a dangerous default to have given the potential
to DoS hypervisors, intentionally or otherwise.
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: nova-scheduler 2:17.0.7-0ubuntu1
ProcVersionSignature: Ubuntu 4.15.0-43.46-generic 4.15.18
Uname: Linux 4.15.0-43-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
Date: Fri Mar 1 13:57:39 2019
NovaConf: Error: [Errno 13] Permission denied: '/etc/nova/nova.conf'
PackageArchitecture: all
ProcEnviron:
TERM=screen-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
SourcePackage: nova
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1818239/+subscriptions