[Bug 619712] [NEW] keepalived vrrp race condition and fix (versions 1.1.17 and 1.2.0 but perhaps all?)

Arjan 619712 at bugs.launchpad.net
Wed Aug 18 10:05:41 BST 2010


*** This bug is a security vulnerability ***

Public security bug reported:

Binary package hint: keepalived

Hello,

About a keepalived vrrp race condition and fix (versions 1.1.17 and 1.2.0)
and probably all other versions.

Impact: possible complete/partial datacenter disruptions due to gratious
arp/multicast floods


The situation to reproduce the race condition

The race condition got triggered in our situation with 1 cluster of 4
machines, sharing  3 (vlan) interfaces.

Reason for doing doing so, not really importand, using vrrp and redundant
cluster (2x2) for changing the default gateway, in case of a lease line
failure.


The most basic setup, and easyest way to trigger is a subset of above
-3 machines, with keepalived
-2 interfaces/machine, defined in keepalived as vrrp_instance
-1 vrrp_sync_group, with those interfaces included

some key settings:

machine1:
	initial state: state BACKUP
	int 1, prio 250
	int 2, prio 250
machine2:
	initial state: state BACKUP
	int 1, prio 200
	int 2, prio 200
machine3:
	initial state: state MASTER
	int 1, prio 150
	int 2, prio 150


Machine3 is set to master, just to trigger the race condition, but in
initial state BACKUP, a connection failure (without a link down) will do the
same, which is the situation we encountered.


WARNING: below may/will cause network disruption, so be prepared.

The way to trigger, is having started keepalived on machine 1 & 2, where
machine 1 gently becomes a master as meant/configured.

Now, start machine3, and here it happens, as soon machine sends a single
vrrp message (as it want's to be master), machine 1 & 2 get's crazy(haywire), and
come in a race condition which will never stop, until you stop one of
machine 1 or 2. Stopping machine 3 will definitly NOT stop the race.


A sample of the relevant logging of the race:

machine 1:
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election


machine 2:
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state


After some review of the keepalived code, and analyzed traces, and
researching the situation, i came to a fix of the issue.  Be aware i've
never looked at the keepalived code before, so perhaps, there may be a more
ultimate fix.

This patch prevents sending another vrrp message in the
vrrp_sync.c: vrrp_sync_master_election() when already in the isync->wantstate is in a
VRRP_STATE_GOTO_MASTER  state.

old code in vrrp_sync.c:
void
vrrp_sync_master_election(vrrp_rt * vrrp)
{
        vrrp_rt *isync;
        vrrp_sgroup *vgroup = vrrp->sync;
        list l = vgroup->index_list;
        element e;

        if (vrrp->wantstate != VRRP_STATE_GOTO_MASTER)
                return;
        if (GROUP_STATE(vgroup) == VRRP_STATE_FAULT)
                return;

        log_message(LOG_INFO, "VRRP_Group(%s) Transition to MASTER state",
               GROUP_NAME(vgroup));

        /* Perform sync index */
        for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
                isync = ELEMENT_DATA(e);
                if (isync != vrrp) {
                        /* Force a new protocol master election */
                        isync->wantstate = VRRP_STATE_GOTO_MASTER;
                        log_message(LOG_INFO,
                               "VRRP_Instance(%s) forcing a new MASTER election",
                               isync->iname);
                        vrrp_send_adv(isync, isync->effective_priority);
                }
        }
}


patched code in vrrp_sync.c:
void
vrrp_sync_master_election(vrrp_rt * vrrp)
{
        vrrp_rt *isync;
        vrrp_sgroup *vgroup = vrrp->sync;
        list l = vgroup->index_list;
        element e;

        if (vrrp->wantstate != VRRP_STATE_GOTO_MASTER)
                return;
        if (GROUP_STATE(vgroup) == VRRP_STATE_FAULT)
                return;

        log_message(LOG_INFO, "VRRP_Group(%s) Transition to MASTER state",
               GROUP_NAME(vgroup));

        /* Perform sync index */
        for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
                isync = ELEMENT_DATA(e);
                if (isync != vrrp) {
                        if ( isync->wantstate != VRRP_STATE_GOTO_MASTER ) {     // Fix ArFi 2010/08/16 to prevent VRRP loops with 3 or more members and 2 or more shared interfaces in 1 group 
                        /* Force a new protocol master election */
                        isync->wantstate = VRRP_STATE_GOTO_MASTER;
                        log_message(LOG_INFO,
                               "VRRP_Instance(%s) forcing a new MASTER election",
                               isync->iname);
                        vrrp_send_adv(isync, isync->effective_priority);
                        }
                }
        }
}


diff keepalived-1.1.17:

# diff  -u vrrp_sync.c.orig_ vrrp_sync.c
--- vrrp_sync.c.orig_	2010-08-17 14:12:30.944634215 +0200
+++ vrrp_sync.c	2010-08-17 14:15:05.964634624 +0200
@@ -157,12 +157,14 @@
 	for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
 		isync = ELEMENT_DATA(e);
 		if (isync != vrrp) {
+			if ( isync->wantstate != VRRP_STATE_GOTO_MASTER ) {	// Fix ArFi 2010/08/16 to prevent VRRP loops with 3 or more members and 2 or more shared interfaces in 1 group 
 			/* Force a new protocol master election */
 			isync->wantstate = VRRP_STATE_GOTO_MASTER;
 			log_message(LOG_INFO,
 			       "VRRP_Instance(%s) forcing a new MASTER election",
 			       isync->iname);
 			vrrp_send_adv(isync, isync->effective_priority);
+			}
 		}
 	}
 }


diff keepalived-1.2.0:

# diff -u vrrp_sync.c.orig_ vrrp_sync.c
--- vrrp_sync.c.orig_	2010-08-18 08:17:52.766910325 +0200
+++ vrrp_sync.c	2010-08-18 08:20:23.216968929 +0200
@@ -155,12 +155,14 @@
 	for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
 		isync = ELEMENT_DATA(e);
 		if (isync != vrrp) {
+		        if ( isync->wantstate != VRRP_STATE_GOTO_MASTER ) {     // Fix ArFi 2010/08/16 to prevent VRRP loops with 3 or more members and 2 or more shared interfaces in 1 group
 			/* Force a new protocol master election */
 			isync->wantstate = VRRP_STATE_GOTO_MASTER;
 			log_message(LOG_INFO,
 			       "VRRP_Instance(%s) forcing a new MASTER election",
 			       isync->iname);
 			vrrp_send_adv(isync, isync->effective_priority);
+			}
 		}
 	}
 }



This solves our issue, and we did some basic testing with good results, but it may be wise an
experienced keepalived developer should have a look at it too.


PS: marked this as a security vulnerability as it _may_ bring a system/datacenter
down, by just a single (vrrp) packet.
Posted this bugreport as well on the keepalived-devel at lists.sourceforge.net ML.


Regards,

Arjan Filius
iafilius at xs4all.nl

** Affects: keepalived (Ubuntu)
     Importance: Undecided
         Status: New

** Visibility changed to: Public

-- 
keepalived vrrp race condition and fix (versions 1.1.17 and 1.2.0 but perhaps all?)
https://bugs.launchpad.net/bugs/619712
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to keepalived in ubuntu.



More information about the Ubuntu-server-bugs mailing list