[Bug 938699] [NEW] nova-compute runaway memory allocation
Andrew Glen-Young
938699 at bugs.launchpad.net
Wed Feb 22 14:04:28 UTC 2012
Public bug reported:
This morning I upgraded my Openstack hosts to the latest packaged
version of the nova packages. After the upgrade some of the compute node
machines were rather unresponsive.
After investigation it became apparent that nova-compute was nearly
exhausting the node's memory. I captured as much data as I dared before
having to terminate the process.
The symptoms seem to have gone away after restarting nova-compute again.
This occured on three of five compute nodes, but I was unable to
reproduce (yet) after restarting nova-compute.
$ ps axfuwww | grep compute
nova 17606 0.0 0.0 48040 2220 ? Ss 13:19 0:00 su -s /bin/sh -c exec nova-compute --flagfile=/etc/nova/nova.conf --flagfile=/etc/nova/nova-compute.conf nova
nova 17607 4.3 48.6 14240940 12000856 ? Dl 13:19 0:13 \_ /usr/bin/python /usr/bin/nova-compute --flagfile=/etc/nova/nova.conf --flagfile=/etc/nova/nova-compute.conf
$ sudo strace -p 17607
Process 17607 attached - interrupt to quit
brk(0x22db06000) = 0x22db06000
brk(0x22db27000) = 0x22db27000
brk(0x22db48000) = 0x22db48000
[...]
brk(0x22e7c9000) = 0x22e7c9000
brk(0x22e7ea000) = 0x22e7ea000
brk(0x22e80b000) = 0x22e80b000
^CProcess 17607 detached
/var/log/nova/nova-compute.log:
[...]
2012-02-22 13:21:12,543 DEBUG nova.compute.manager [-] image_id=e34cdc7d-bfce-4064-a80f-3916cfd65ea8, image_size_bytes=1476395008, allowed_size_bytes=10737418240 from (pid=17607) _check_image_size /usr/lib/python2.7/dist-packages/nova/compute/manager.py:488
2012-02-22 13:21:12,544 AUDIT nova.compute.manager [req-ba92fb06-884f-48f1-89eb-e03b2bb68df3 evand evand_project] instance 761fdfeb-40bb-497e-9fc8-687f4291644b: starting...
2012-02-22 13:21:12,628 DEBUG nova.rpc.common [-] Making asynchronous call on network ... from (pid=17607) multicall /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:318
2012-02-22 13:21:12,629 DEBUG nova.rpc.common [-] MSG_ID is 8f75c505d29a4128a04244f6549a9e75 from (pid=17607) multicall /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:321
2012-02-22 13:21:12,629 DEBUG nova.rpc.common [-] Pool creating new connection from (pid=17607) create /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:57
2012-02-22 13:21:12,633 DEBUG amqplib [-] Start from server, version: 8.0, properties: {u'information': u'Licensed under the MPL. See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright (C) 2007-2011 VMware, Inc.', u'capabilities': {}, u'platform': u'Erlang/OTP', u'version': u'2.6.1'}, mechanisms: [u'PLAIN', u'AMQPLAIN'], locales: [u'en_US'] from (pid=17607) _start /usr/lib/python2.7/dist-packages/amqplib/client_0_8/connection.py:661
2012-02-22 13:21:12,634 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:90
2012-02-22 13:21:12,635 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2012-02-22 13:21:12,636 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2012-02-22 13:21:12,637 DEBUG nova.compute.manager [-] deallocating network for instance: 94fbd530-96d3-4d74-a202-1091820edcc4 from (pid=17607) _deallocate_network /usr/lib/python2.7/dist-packages/nova/compute/manager.py:579
2012-02-22 13:21:12,638 DEBUG nova.rpc.common [-] Making asynchronous cast on network... from (pid=17607) cast /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:343
2012-02-22 13:21:12,640 DEBUG amqplib [-] Open OK! known_hosts [] from (pid=17607) _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/connection.py:507
2012-02-22 13:21:12,640 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2012-02-22 13:21:12,641 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2012-02-22 13:21:12,642 INFO nova.rpc.common [-] Connected to AMQP server on 10.55.58.1:5672
2012-02-22 13:21:12,643 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:90
2012-02-22 13:21:12,644 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2012-02-22 13:21:12,645 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2012-02-22 13:21:12,646 DEBUG nova.compute.manager [-] Checking state of 94fbd530-96d3-4d74-a202-1091820edcc4 from (pid=17607) _get_power_state /usr/lib/python2.7/dist-packages/nova/compute/manager.py:251
2012-02-22 13:21:14,302 INFO nova.virt.libvirt.firewall [-] [instance: 94fbd530-96d3-4d74-a202-1091820edcc4] Attempted to unfilter instance which is not filtered
2012-02-22 13:21:14,302 INFO nova.virt.libvirt.connection [-] [instance: 94fbd530-96d3-4d74-a202-1091820edcc4] Deleting instance files /srv/nova/instances/instance-0000152f
2012-02-22 13:21:15,283 INFO nova.virt.libvirt.connection [-] [instance: 94fbd530-96d3-4d74-a202-1091820edcc4] Instance destroyed successfully.
2012-02-22 13:21:15,286 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:90
2012-02-22 13:21:15,287 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2012-02-22 13:21:15,288 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2012-02-22 13:21:15,288 DEBUG nova.compute.manager [-] deallocating network for instance: 761fdfeb-40bb-497e-9fc8-687f4291644b from (pid=17607) _deallocate_network /usr/lib/python2.7/dist-packages/nova/compute/manager.py:579
2012-02-22 13:21:15,289 DEBUG nova.rpc.common [-] Making asynchronous cast on network... from (pid=17607) cast /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:343
2012-02-22 13:21:15,290 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:90
2012-02-22 13:21:15,291 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2012-02-22 13:21:15,292 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2012-02-22 13:21:15,292 DEBUG nova.compute.manager [-] Checking state of 761fdfeb-40bb-497e-9fc8-687f4291644b from (pid=17607) _get_power_state /usr/lib/python2.7/dist-packages/nova/compute/manager.py:251
2012-02-22 13:21:16,987 INFO nova.virt.libvirt.firewall [-] [instance: 761fdfeb-40bb-497e-9fc8-687f4291644b] Attempted to unfilter instance which is not filtered
2012-02-22 13:21:16,987 INFO nova.virt.libvirt.connection [-] [instance: 761fdfeb-40bb-497e-9fc8-687f4291644b] Deleting instance files /srv/nova/instances/instance-0000154c
2012-02-22 13:21:17,918 INFO nova.virt.libvirt.connection [-] [instance: 761fdfeb-40bb-497e-9fc8-687f4291644b] Instance destroyed successfully.
2012-02-22 13:21:17,918 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:90
2012-02-22 13:21:17,919 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2012-02-22 13:21:17,920 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2012-02-22 13:21:17,921 DEBUG nova.compute.manager [-] deallocating network for instance: 94fbd530-96d3-4d74-a202-1091820edcc4 from (pid=17607) _deallocate_network /usr/lib/python2.7/dist-packages/nova/compute/manager.py:579
2012-02-22 13:21:17,922 DEBUG nova.rpc.common [-] Making asynchronous cast on network... from (pid=17607) cast /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:343
2012-02-22 13:21:17,924 DEBUG amqplib [-] Closed channel #1 from (pid=17607) _do_close /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:90
2012-02-22 13:21:17,925 DEBUG amqplib [-] using channel_id: 1 from (pid=17607) __init__ /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:70
2012-02-22 13:21:17,926 DEBUG amqplib [-] Channel open from (pid=17607) _open_ok /usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py:484
2012-02-22 13:21:17,926 DEBUG nova.compute.manager [-] Checking state of 94fbd530-96d3-4d74-a202-1091820edcc4 from (pid=17607) _get_power_state /usr/lib/python2.7/dist-packages/nova/compute/manager.py:251
2012-02-22 13:21:19,680 INFO nova.virt.libvirt.firewall [-] [instance: 94fbd530-96d3-4d74-a202-1091820edcc4] Attempted to unfilter instance which is not filtered
2012-02-22 13:21:19,681 INFO nova.virt.libvirt.connection [-] [instance: 94fbd530-96d3-4d74-a202-1091820edcc4] Deleting instance files /srv/nova/instances/instance-0000152f
^^^ The log was not being updated with additional information while
nova-compute was running away
$ dpkg-query --show nova-compute
nova-compute 2012.1~e4~20120217.12709-0ubuntu1
$ cat /etc/lsb-release·
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu precise (development branch)"
Please let me know if I can help any further?
** Affects: nova (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/938699
Title:
nova-compute runaway memory allocation
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/938699/+subscriptions
More information about the Ubuntu-server-bugs
mailing list