[Bug 1563089] Re: Memory Leak when new cluster configuration is formed.
Jorge Niedbalski
1563089@bugs.launchpad.net
Wed Mar 30 20:03:00 UTC 2016
** Description changed:
[Environment]
Trusty 14.04.3
Packages:
ii corosync 2.3.3-1ubuntu1 amd64 Standards-based cluster framework (daemon and modules)
ii libcorosync-common4 2.3.3-1ubuntu1 amd64 Standards-based cluster framework, common library
[Reproducer]
1) I deployed an HA environment using this bundle (http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
with a 3-node installation of cinder related to an HACluster subordinate unit.
$ juju-deployer -c next-ha.yaml -w 600 trusty-kilo
2) I changed the default corosync transport mode to unicast.
$ juju set cinder-hacluster corosync_transport=udpu
3) I verified that the 3 units were quorate:
cinder/0# corosync-quorumtool
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
1002 1 10.5.1.57 (local)
1001 1 10.5.1.58
1000 1 10.5.1.59
The primary unit was holding the VIP resource 10.5.105.1/16
root@juju-niedbalski-sec-machine-4:/home/ubuntu# ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP group default qlen 1000
link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
valid_lft forever preferred_lft forever
4) I manually added a TC queue for the eth0 interface on the node
holding the VIP resource, introducing a 350 ms delay.
$ sudo tc qdisc add dev eth0 root netem delay 350ms
5) Right after adding the 350 ms delay on the cinder/0 unit, corosync reports that one of the processors failed and is forming a new
cluster configuration:
Mar 28 21:57:41 juju-niedbalski-sec-machine-5 corosync[4584]: [TOTEM ] A processor failed, forming new configuration.
Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]: [TOTEM ] A new membership (10.5.1.57:11628) was formed. Members
Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]: [QUORUM] Members[3]: 1002 1001 1000
Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]: [MAIN ] Completed service synchronization, ready to provide service.
This happens on all of the units.
6) After receiving this message, I remove the queue from eth0:
$ sudo tc qdisc del dev eth0 root netem
Then the following is logged on the master node:
Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]: [TOTEM ] A new membership (10.5.1.57:11628) was formed. Members
Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]: [QUORUM] Members[3]: 1002 1001 1000
Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]: [MAIN ] Completed service synchronization, ready to provide service.
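Steps 5 and 6 can be automated with a small shell loop. This is only a hypothetical sketch (the helper name, interface, and hold times are my assumptions, not part of the original report); with DRY_RUN=1 it prints the commands instead of executing them, otherwise it must run as root:

```shell
#!/bin/sh
# toggle_netem.sh -- hypothetical helper to repeat steps 5 and 6:
# add a 350 ms netem delay, hold it, then remove it again.
# Set DRY_RUN=1 to only print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

toggle_netem() {
    iface="$1"   # interface holding the VIP, e.g. eth0 (assumption)
    cycles="$2"  # number of add/remove cycles to run
    i=0
    while [ "$i" -lt "$cycles" ]; do
        run tc qdisc add dev "$iface" root netem delay 350ms
        run sleep 190   # hold the delay long enough for TOTEM to declare a failed processor
        run tc qdisc del dev "$iface" root netem
        run sleep 30    # let the membership re-form before the next cycle
        i=$((i + 1))
    done
}
```

Example: `sudo sh -c '. ./toggle_netem.sh; toggle_netem eth0 5'` on the node holding the VIP.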
7) While executing steps 5 and 6 repeatedly, I ran the following command to track the VSZ and RSS memory usage of the
corosync process:
root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc add dev eth0 root netem delay 350ms
root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc del dev eth0 root netem
$ while true; do ps -o vsz,rss -p $(pgrep corosync) 2>&1 | grep -E '[0-9]+' | tee -a memory-usage.log; sleep 1; done
The results show that both VSZ and RSS increase over time at a high
rate.
25476 4036
... (after 5 minutes).
135644 10352
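Whether the samples in memory-usage.log really keep growing can be checked with a short awk pass. A minimal sketch (the helper name is mine, and it assumes each log line holds a VSZ column followed by an RSS column, as the ps loop above produces):

```shell
#!/bin/sh
# leak_check.sh -- compare the first and last samples in a memory log and
# report how much VSZ (column 1) and RSS (column 2) grew over the run.
leak_check() {
    awk '
        NR == 1 { first_vsz = $1; first_rss = $2 }
        { last_vsz = $1; last_rss = $2 }
        END {
            printf "vsz: %d -> %d (%+d)\n", first_vsz, last_vsz, last_vsz - first_vsz
            printf "rss: %d -> %d (%+d)\n", first_rss, last_rss, last_rss - first_rss
        }
    ' "$1"
}
```

Example: `leak_check memory-usage.log`; on the samples above it would report VSZ growing from 25476 to 135644 within about five minutes.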
[Fix]
Based preliminarily on this reproducer, I think that this commit (https://github.com/corosync/corosync/commit/600fb4084adcbfe7678b44a83fa8f3d3550f48b9)
is a good candidate for backporting to Ubuntu Trusty.
[Test Case]
* See reproducer
[Backport Impact]
* Not identified
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1563089
Title:
Memory Leak when new cluster configuration is formed.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1563089/+subscriptions