[Bug 1397130] [NEW] libvirt-bin crashes / refuses to restart if cgmanager is restarted
Don Bowman
don.waterloo+ubuntu at gmail.com
Thu Nov 27 22:42:27 UTC 2014
Public bug reported:
Reference: bug 1367702. As requested there, I am opening a new ticket
with instructions to reproduce.
This is on Ubuntu 14.10 server with libvirt-bin 1.2.8-0ubuntu11.1.
As noted in bug 1367702, this setup does not use LXC (which you used in
your attempt to reproduce). The host runs on bare metal: no container
and no hypervisor underneath it. Each VM below was started by OpenStack
nova-compute (this node is a compute-only node).
don@nubo-5:~$ sudo service cgmanager restart
cgmanager stop/waiting
cgmanager start/running, process 22588
don@nubo-5:~$ virsh list
 Id    Name                           State
----------------------------------------------------
 2     instance-000015de              running
 3     instance-000015df              running
 4     instance-000015e0              running
 5     instance-000015e1              running
 6     instance-000015e2              running
 7     instance-000015e3              running
 8     instance-000015e4              running
 9     instance-000015e5              running
 10    instance-000015e6              running
 11    instance-000015e7              running
 12    instance-000015e8              running
 13    instance-000015e9              running
 14    instance-000015ea              running
 15    instance-000015eb              running
 16    instance-000015ec              running
 17    instance-000015ed              running
 18    instance-000015ee              running
 19    instance-000015ef              running
 20    instance-000015f0              running
 21    instance-000015f1              running
 22    instance-000015f2              running
 23    instance-000015f3              running
 24    instance-000015f4              running
 25    instance-000015f5              running
 26    instance-000015f6              running
 27    instance-000015f7              running
 28    instance-000015f8              running
 29    instance-000015f9              running
 30    instance-000015fa              running
 31    instance-000015fb              running
 32    instance-000015fc              running
 33    instance-000015fd              running
 34    instance-000015fe              running
 35    instance-000015ff              running
 36    instance-00001600              running
don@nubo-5:~$ sudo service libvirt-bin restart
libvirt-bin stop/waiting
libvirt-bin start/running, process 22751
don@nubo-5:~$ virsh list
error: failed to connect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer
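The steps above can be scripted. What follows is a minimal Python sketch under stated assumptions, not a tested tool: the names STEPS, run_command and reproduce are made up for illustration, the script is assumed to run as root (the transcript uses sudo), and a failed virsh list is assumed either to exit non-zero or to print the same "failed to connect to the hypervisor" text shown above.

```python
import subprocess
from typing import Callable, List, Tuple

# The four commands from the transcript, in the same order.
STEPS: List[List[str]] = [
    ["service", "cgmanager", "restart"],
    ["virsh", "list"],                    # still works at this point
    ["service", "libvirt-bin", "restart"],
    ["virsh", "list"],                    # fails when the bug is hit
]

# A runner takes a command and returns (exit status, combined output).
Runner = Callable[[List[str]], Tuple[int, str]]


def run_command(cmd: List[str]) -> Tuple[int, str]:
    """Run one command and return its exit status and stdout+stderr."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def reproduce(runner: Runner = run_command) -> bool:
    """Run the steps in order; True means the final virsh list failed."""
    status, output = 0, ""
    for cmd in STEPS:
        status, output = runner(cmd)
    # Assumption: the error wording matches the transcript in this report.
    return status != 0 or "failed to connect to the hypervisor" in output
```

Calling reproduce() on an affected node should return True if the behaviour matches this report. The runner argument exists only so the sequence can be exercised without touching real services.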
If I then run libvirtd manually:
root@nubo-5:~# libvirtd -v
2014-11-27 22:38:18.066+0000: 26422: info : libvirt version: 1.2.8, package: 1.2.8-0ubuntu11.1
2014-11-27 22:38:18.066+0000: 26422: info : virNetlinkEventServiceStart:521 : starting netlink event service with protocol 0
2014-11-27 22:38:18.066+0000: 26422: info : virNetlinkEventServiceStart:521 : starting netlink event service with protocol 15
2014-11-27 22:38:18.073+0000: 26433: info : dnsmasqCapsSetFromBuffer:685 : dnsmasq version is 2.71, --bind-dynamic is present, SO_BINDTODEVICE is in use
2014-11-27 22:38:18.074+0000: 26433: info : networkReloadFirewallRules:1778 : Reloading iptables rules
2014-11-27 22:38:18.074+0000: 26433: info : networkRefreshDaemons:1750 : Refreshing network daemons
2014-11-27 22:38:18.198+0000: 26433: info : virFirewallApplyGroup:844 : Starting transaction for 0x7f15e40e7110 flags=0
2014-11-27 22:38:18.198+0000: 26433: info : virFirewallApplyRule:785 : Applying rule '/sbin/iptables --version'
2014-11-27 22:38:18.207+0000: 26433: info : libxlDriverShouldLoad:241 : Disabling driver as /proc/xen/capabilities does not exist
2014-11-27 22:38:18.250+0000: 26433: info : virDomainObjListLoadAllConfigs:18944 : Scanning for configs in /var/run/libvirt/qemu
2014-11-27 22:38:18.256+0000: 26433: info : virDomainObjListLoadAllConfigs:18968 : Loading config file 'instance-000015fd.xml'
...
2014-11-27 22:38:18.385+0000: 26441: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
process 26422: The last reference on a connection was dropped without closing the connection. This is a bug in an application. See dbus_connection_unref() documentation for details.
Most likely, the application was supposed to call dbus_connection_close(), since this is a private connection.
2014-11-27 22:38:18.387+0000: 26439: warning : cg_detect_placement:561 : Failed to get cgroup path for cpu
2014-11-27 22:38:18.392+0000: 26445: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
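Most of the output above is info-level noise, so it can help to filter it down to the warning and error lines. The following is a small Python sketch for that; it assumes the line layout seen in this excerpt (timestamp, a numeric id, level, an optional function:line field, then the message). The names Entry, LOG_RE, parse_line and problems are made up for illustration, and lines that do not fit the layout (such as the D-Bus "process 26422: ..." message) are simply skipped.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Entry(NamedTuple):
    timestamp: str
    thread: int           # numeric id printed after the timestamp
    level: str            # e.g. "info", "warning", "error"
    func: Optional[str]   # source function, when the line carries one
    line: Optional[int]   # source line number, when the line carries one
    message: str


# Layout assumed from the excerpt in this report.
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}): "
    r"(?P<thread>\d+): (?P<level>\w+) : "
    r"(?:(?P<func>\w+):(?P<line>\d+) : )?(?P<msg>.*)$"
)


def parse_line(text: str) -> Optional[Entry]:
    """Parse one log line; return None if it does not fit the layout."""
    m = LOG_RE.match(text.rstrip("\n"))
    if m is None:
        return None
    line = int(m.group("line")) if m.group("line") else None
    return Entry(m.group("ts"), int(m.group("thread")), m.group("level"),
                 m.group("func"), line, m.group("msg"))


def problems(lines: Iterable[str]) -> List[Entry]:
    """Keep only the warning and error entries, in their original order."""
    entries = (parse_line(text) for text in lines)
    return [e for e in entries
            if e is not None and e.level in ("warning", "error")]
```

Feeding the saved output of libvirtd -v through problems() leaves the cgm_dbus_connect errors and the cg_detect_placement warning from the excerpt above.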
cgm ping returns true, so cgmanager itself is presumably OK.
Sometimes, when running libvirtd -v manually, it segfaults instead of hitting an assertion.
Sometimes the assertion is different:
(null):cgmanager-client.c:1015: Assertion failed in cgmanager_get_pid_cgroup_sync: proxy != NULL
Segmentation fault (core dumped)
or
(null):alloc.c:315: Assertion failed in nih_free: ptr != NULL
(null):alloc.c:315: Assertion failed in nih_free: ptr != NULL
Segmentation fault (core dumped)
---
ApportVersion: 2.14.7-0ubuntu8
Architecture: amd64
DistroRelease: Ubuntu 14.10
Package: libvirt (not installed)
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.16.0-25-generic root=UUID=a58668fa-f6db-4941-84eb-c89e102971e1 ro splash quiet vt.handoff=7
ProcEnviron:
 LANGUAGE=en_CA:en
 TERM=screen
 PATH=(custom, no user)
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 3.16.0-25.33-generic 3.16.7
Tags: utopic utopic
Uname: Linux 3.16.0-25-generic x86_64
UnreportableReason: The report belongs to a package that is not installed.
UpgradeStatus: Upgraded to utopic on 2014-10-19 (39 days ago)
UserGroups:
_MarkForUpload: True
modified.conffile..etc.apparmor.d.abstractions.libvirt.qemu: [modified]
modified.conffile..etc.apparmor.d.usr.sbin.libvirtd: [modified]
mtime.conffile..etc.apparmor.d.abstractions.libvirt.qemu: 2014-10-23T03:29:38.231519
mtime.conffile..etc.apparmor.d.usr.sbin.libvirtd: 2014-10-23T03:18:18.057906
** Affects: libvirt (Ubuntu)
     Importance: Undecided
         Status: New
** Tags: apport-collected utopic
** Tags added: apport-collected utopic
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to libvirt in Ubuntu.
https://bugs.launchpad.net/bugs/1397130
Title:
libvirt-bin crashes / refuses to restart if cgmanager is restarted
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1397130/+subscriptions