Self Assembling Nodes for Elastic Compute Resource Resolution

So let's set this up as a simple problem statement:

You have an elastic compute grid, which allows you to add and remove resources at will. As you add capacity (MySQL instances, memcache, redis, and so on), you want your various tiers to be notified of these additions and subtractions, so that your applications can sort of just “hang out” and wait for the resources they need to become available. You don't want to have to manually configure IPs, ports, DNS, etc. You simply “hot-swap” these resources in and feel assured that N or more clients can find the available services out there in the wild.

SOA had the concept of a bus and discovery, which usually goes something like this: either (A) use a single registry to perform lookup/routing, or (B) use UDP broadcasts to let your applications “self-discover” their environment.

I am now a huge fan of puppet and nodejs along with zeromq, and I believe the solution is made much simpler thanks to zeromq's epgm/pgm multicast transports.

Let's take a look at some nodejs code…

Infrastructure subscriber

With those lines of code you can rest assured that when infrastructure is added with the specified tag (which can be any string), you will be able to pick it up and do something with it (e.g. configure it).
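To make the subscriber side a bit more concrete, here is a minimal sketch of the bookkeeping it needs, assuming each announcement arrives as a tag followed by a JSON payload. That wire format, and the names `parseAnnouncement` and `NodeRegistry`, are my own illustration and not part of any library. The transport is deliberately left out: whatever zeromq subscriber socket you use would simply hand each received message to `handleMessage`.

```javascript
const { EventEmitter } = require('events');

// Hypothetical wire format (an assumption for this sketch): "<tag> <json>",
// e.g. 'solr {"host":"10.0.0.5","port":8983}'.
function parseAnnouncement(raw) {
  const text = raw.toString();
  const space = text.indexOf(' ');
  if (space === -1) return null;
  try {
    const body = JSON.parse(text.slice(space + 1));
    if (typeof body.host !== 'string' || typeof body.port !== 'number') return null;
    return { tag: text.slice(0, space), host: body.host, port: body.port };
  } catch (err) {
    return null; // ignore malformed payloads rather than crash the listener
  }
}

// Keeps track of every node announced under one tag and emits 'added'
// the first time a given host:port shows up.
class NodeRegistry extends EventEmitter {
  constructor(tag) {
    super();
    this.tag = tag;
    this.nodes = new Map(); // "host:port" -> node
  }

  handleMessage(raw) {
    const node = parseAnnouncement(raw);
    if (!node || node.tag !== this.tag) return;
    const key = `${node.host}:${node.port}`;
    if (this.nodes.has(key)) return; // already known, nothing to do
    this.nodes.set(key, node);
    this.emit('added', node);
  }
}

// Usage: feed it messages as they arrive from the network.
const registry = new NodeRegistry('solr');
const seen = [];
registry.on('added', (node) => seen.push(`${node.host}:${node.port}`));
registry.handleMessage(Buffer.from('solr {"host":"10.0.0.5","port":8983}'));
registry.handleMessage(Buffer.from('solr {"host":"10.0.0.5","port":8983}')); // duplicate
registry.handleMessage(Buffer.from('redis {"host":"10.0.0.9","port":6379}')); // other tag
```

Keeping the parsing and the registry separate from the socket means the same logic works whether the messages come over epgm, plain TCP, or a test harness.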

Infrastructure publisher

Given this simple and effective setup, as you add nodes on Amazon EC2, Rackspace, or whatever, then provided you are within BROADCAST range you should be able to pick up instances dynamically, perform an operation, and then wait for those instances to disappear or go offline and react accordingly.
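One way to notice that an instance has disappeared is to have every publisher repeat its announcement as a heartbeat and to treat a node as offline once it has been silent for too long. The sketch below assumes that heartbeat scheme, which is my own assumption rather than something the setup above prescribes; `formatAnnouncement`, `LivenessTracker` and the 3000 ms timeout are hypothetical names and values chosen for illustration.

```javascript
// Publisher side: build the payload a node would send on every heartbeat.
// The "<tag> <json>" layout is an assumption made for this sketch.
function formatAnnouncement(tag, host, port) {
  return `${tag} ${JSON.stringify({ host, port })}`;
}

// Subscriber side: remember when each node was last heard from and
// report the ones that have been silent for longer than timeoutMs.
class LivenessTracker {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map(); // "host:port" -> timestamp in ms
  }

  // Record a heartbeat; returns true if the node was not known before.
  touch(key, now = Date.now()) {
    const isNew = !this.lastSeen.has(key);
    this.lastSeen.set(key, now);
    return isNew;
  }

  // Drop and return every node whose last heartbeat is too old.
  prune(now = Date.now()) {
    const gone = [];
    for (const [key, seenAt] of this.lastSeen) {
      if (now - seenAt > this.timeoutMs) {
        this.lastSeen.delete(key);
        gone.push(key);
      }
    }
    return gone;
  }
}

// Usage with explicit timestamps so the behaviour is easy to follow.
const message = formatAnnouncement('solr', '10.0.0.5', 8983);
const tracker = new LivenessTracker(3000);
tracker.touch('10.0.0.5:8983', 0);    // last heard at t=0
tracker.touch('10.0.0.6:8983', 2000); // last heard at t=2000
const offline = tracker.prune(4000);  // at t=4000 only the first is stale
```

In a running process you would call `prune()` from a timer and trigger the re-configuration for whatever it returns.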

Use Case

As a user I have the option of hosting many SOLR instances for a BIG data solution. One of the problems with managing a large number of SOLR instances is the complexity of the infrastructure. In order to create shards and run distributed queries, I need to update my solr.config files with all the solr instances that are available, to ensure my queries cover the entire spectrum. However, as I add or remove nodes in my cloud, I want this configuration to stay in sync. Normally this requires automation, but I don't want a human typing in IP addresses, nor do I want a complex management structure to handle it. What I want is the ability to drop in solr instances and pluck them out, and have the configuration elastically adjust and compensate.
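For a sense of what has to stay in sync, Solr's distributed search takes a `shards` parameter listing every instance as a comma-separated list of `host:port/path` entries. A minimal sketch of deriving that value from the currently known nodes follows; the function names, the `/solr` path and the example addresses are assumptions for illustration.

```javascript
// Build the value of Solr's `shards` parameter from the live node list.
// Sorting keeps the output stable, so an unchanged farm never looks like
// a configuration change.
function buildShardsParam(nodes, path = '/solr') {
  return nodes
    .map((n) => `${n.host}:${n.port}${path}`)
    .sort()
    .join(',');
}

// Build a distributed query URL against one entry node (hypothetical helper).
function buildQueryUrl(entryNode, nodes, query) {
  const params = new URLSearchParams({ q: query, shards: buildShardsParam(nodes) });
  return `http://${entryNode.host}:${entryNode.port}/solr/select?${params}`;
}

// Usage
const solrNodes = [
  { host: '10.0.0.6', port: 8983 },
  { host: '10.0.0.5', port: 8983 },
];
const shards = buildShardsParam(solrNodes);
const url = buildQueryUrl(solrNodes[0], solrNodes, 'title:elastic');
```

Every time a node joins or leaves, this string changes, and that is exactly the piece of configuration nobody should be editing by hand.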

How the solution maps

With the proposed solution, as SOLR instances get added we now have a means to invoke a puppet process that updates our SOLR configuration (using FACTER). The moment a change is detected, we update FACTER with the latest information and then execute PUPPET, which performs the required re-configuration, uninstallation, and similar tasks.
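One way to hand the node list to puppet is through environment variables, since facter exposes any variable prefixed with `FACTER_` as a fact. The sketch below assumes that route; the fact names `solr_nodes` and `solr_node_count`, the helper names, and the manifest path passed to `runPuppet` are all hypothetical, and your manifests would need to read those facts.

```javascript
const { spawn } = require('child_process');

// Turn the live node list into FACTER_* environment variables.
// The fact names are made up for this sketch.
function buildFacterEnv(nodes, baseEnv = process.env) {
  const list = nodes.map((n) => `${n.host}:${n.port}`).sort().join(',');
  return Object.assign({}, baseEnv, {
    FACTER_solr_nodes: list,
    FACTER_solr_node_count: String(nodes.length),
  });
}

// Run puppet with the facts above in its environment. Returns the child
// process so the caller can watch for its 'exit' event.
function runPuppet(nodes, manifest) {
  return spawn('puppet', ['apply', manifest], {
    env: buildFacterEnv(nodes),
    stdio: 'inherit',
  });
}

// Usage: build the environment for two nodes (runPuppet is not invoked
// here, because it needs puppet installed on the machine).
const facterEnv = buildFacterEnv(
  [{ host: '10.0.0.6', port: 8983 }, { host: '10.0.0.5', port: 8983 }],
  {}
);
```

Calling `runPuppet` from the handlers that fire when a node is added or goes offline would close the loop between discovery and re-configuration.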

But what about zookeeper?

The latest SOLR trunk (4.0) includes zookeeper for this kind of thing, but this introduces complexity: we need leader election and fail-over for the zookeeper farm, and we end up with a central point of contention. While this may be good for some installations, in others, where the farm itself could be upwards of 100 servers (hundreds of billions of SOLR docs), we want to handle outages gracefully.

Using this approach we can also have our instances geographically dispersed by creating a “forwarder” that can listen to a separate network and receive the same pub/sub updates.

Looking forward to open-sourcing this to github as time goes on.

