Propagating state change to multiple peers

Adam Gandelman adamg at
Wed Jul 13 20:10:32 UTC 2011


I've begun trying to deploy OpenStack swift via ensemble and I've run 
into an issue I do not know how to tackle.  If functionality already 
exists in ensemble, I'd appreciate any pointers.  If not, maybe this 
would be a good time to brainstorm.

To configure swift's replication, you configure "rings", which set up 
storage nodes in various zones and set policy as to how replication 
should work.  When the configuration changes (a node or zone is added or 
removed), the rings need to be updated ("balanced") and propagated to 
all nodes in the cluster.  The re-balancing of the rings takes place 
centrally, and the updated configuration is copied to the corresponding 
nodes.

In terms of deploying via ensemble, I would like to have a central 
swift-proxy node that manages ring configuration.  When a new storage 
node joins, it relates to swift-proxy. swift-proxy updates the ring and 
the new storage node receives the updated configuration [1], presumably 
via a relation-changed hook.  This works fine when it's a 1-to-1 
relation.  But when I begin adding more swift storage nodes, the rings 
need to be balanced for each new member and the new configuration 
propagated to *all* nodes relating to swift-proxy, not only the new 
node.  It seems 
ensemble needs to have some notion of global state/relation changes that 
fire corresponding hooks on the central server and all its peers.
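
To make the gap concrete, here is a toy model in Python.  It is only a 
sketch of the behaviour described above: the class and method names 
(Proxy, StorageNode, join_notify_new_only, join_notify_all) are made up 
for illustration and are not part of ensemble or swift.

```python
# Toy model only.  Every name here is hypothetical; nothing below is an
# ensemble or swift API.

class StorageNode:
    def __init__(self, name, zone):
        self.name = name
        self.zone = zone
        self.ring_version = 0   # version of the ring config this node holds
        self.members = []

    def receive_ring(self, version, members):
        # Stand-in for a relation-changed hook putting new ring files in place.
        self.ring_version = version
        self.members = list(members)


class Proxy:
    def __init__(self):
        self.nodes = []
        self.ring_version = 0

    def rebalance(self):
        # Stand-in for re-balancing the rings centrally.
        self.ring_version += 1
        return sorted(n.name for n in self.nodes)

    def join_notify_new_only(self, node):
        # The 1-to-1 case: only the joining node gets the new rings.
        self.nodes.append(node)
        members = self.rebalance()
        node.receive_ring(self.ring_version, members)

    def join_notify_all(self, node):
        # The desired case: every related node gets the new rings.
        self.nodes.append(node)
        members = self.rebalance()
        for peer in self.nodes:
            peer.receive_ring(self.ring_version, members)


def stale_nodes(proxy):
    # Names of nodes still holding an older ring than the proxy's.
    return [n.name for n in proxy.nodes if n.ring_version != proxy.ring_version]
```

With join_notify_new_only, the first node is left holding an old ring 
as soon as a second node joins; with join_notify_all, stale_nodes() 
stays empty after every join.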

Perhaps global-relation-changed and global-relation-joined hooks that 
fire in addition to the current relation-changed/joined hooks?  The 
global hooks can be skipped if they do not exist, or perhaps they do not 
need to be fired by default and are instead triggered from within 
another hook?
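
A rough sketch of the first variant (global hooks fire alongside the 
normal ones and are skipped when absent) might look like the following.  
The global-relation-* hook names are only the proposal above, and 
plan_hooks and its arguments are hypothetical; this is not how ensemble 
dispatches hooks today.

```python
# Sketch of the proposed dispatch rule.  plan_hooks is a hypothetical
# helper, and the global-relation-* hooks do not exist in ensemble.

def plan_hooks(event, joining_unit, all_units, available_hooks):
    """Return the (unit, hook) pairs to run for one relation event.

    event           -- "joined" or "changed"
    joining_unit    -- the unit whose relation state just changed
    all_units       -- the central unit plus every unit related to it
    available_hooks -- names of the hooks the formula actually ships
    """
    plan = []

    # The existing per-relation hook runs only for the unit involved.
    local_hook = "relation-" + event
    if local_hook in available_hooks:
        plan.append((joining_unit, local_hook))

    # The proposed global hook runs on every unit, and is skipped
    # entirely when the formula does not provide it.
    global_hook = "global-relation-" + event
    if global_hook in available_hooks:
        plan.extend((unit, global_hook) for unit in all_units)

    return plan
```

For example, with units proxy/0, storage/1 and storage/2 and a formula 
that ships only relation-joined, a join by storage/2 plans a single hook 
run; once global-relation-joined is also present, all three units are 
added to the plan.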

Swift is the first use case I've run into that requires state to be 
synchronized between many nodes.  I think others will run into similar 
issues when using ensemble to deploy other multi-node clustered services.



[1] What gets copied to other nodes are several gzip archives that are 
generated when the rings get re-balanced.  We'll need a way to pass 
files between nodes similar to how KEY=VALUEs are sent via 
relation-set/relation-get, but that's a topic for another thread.
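
As a starting point for that thread, one conceivable stopgap is to turn 
each archive into a plain text value before handing it over as a 
KEY=VALUE pair.  The sketch below only shows the encoding round trip; 
file_to_value and value_to_file are hypothetical helpers, and whether 
relation settings can carry values of this size is an open assumption.

```python
import base64
import gzip

# Hypothetical helpers: they only convert between raw file bytes and an
# ASCII string.  They do not call relation-set or relation-get.

def file_to_value(data):
    # Encode raw bytes (e.g. a gzip archive) as an ASCII-safe string.
    return base64.b64encode(data).decode("ascii")

def value_to_file(value):
    # Reverse of file_to_value: recover the original bytes.
    return base64.b64decode(value.encode("ascii"))

# Round trip with a stand-in for a ring archive.
archive = gzip.compress(b"example ring contents")
value = file_to_value(archive)
restored = value_to_file(value)
```

Here restored is byte-for-byte identical to archive, so the receiving 
node could write it to disk and unpack it as usual.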
