Sunday, June 12, 2011

Startup Cloud Cluster on a Budget


Recently I have been working on a project that needed a highly available service. For a while now I have been pitching “the three pillars of a service: robust, fast and accessible”. There is a lot of marketing hype around cloud computing, and the choices range from something like Google AppEngine, which provides a very restricted environment, to something like Amazon AWS, which just provides virtual hardware. My problem is that I am creating a service from scratch, but I can’t live with the restrictions of something like AppEngine or Heroku, nor do I want to spend a ton of time setting up something like AWS and Cassandra. This is a two part blog on setting up a robust, fast and accessible service in a few hours and for a very affordable price.

This post will cover the concepts, an overview of the solution and the technologies used for the implementation. The second post will cover the configuration and deployment.

3-legged Table

I have come to believe that for any service to compete it has to be supported by a three-legged table where the legs are Robust, Fast and Accessible. Without any one of these legs the table is only good for producing heat in the fireplace.

Robust

Web systems should not have “scheduled downtime”. I think the convergence of two things has forced us web developers to architect for zero scheduled downtime. The first is the need to provide a 24/7 business platform for our users. The second is that after launch the software is alive with continuous updates until it gets turned off. Traditional business systems ran during business hours and could then be “maintained” from 5pm Pacific time to 8am Eastern time. There is a general move toward continuous updates of software, where the more frequent the updates, the more stable the system is and the faster the technology can adjust to the business need (I am a fan).

Fast

Usage correlates directly with the performance of a service. In a competitive market slow services will not be able to compete. Premature optimization has been the road to ruin of many a project, but the ability to quickly optimize the service after launch will decide the success of the product.

Accessible

The point of a service is for it to be accessed by users. One of the reasons I think JSON became so popular is its simplicity. Again, in competitive markets the barrier to entry approaches zero. This means well-documented services that support 80% of the platforms (mobile, desktop, etc.) are a requirement to compete in the marketplace.

Tools

Linux

This is a given for the types of services that I build. I could see a specific service for M$ that required libraries that only work with Windows, however I don’t imagine easy would be a part of that setup. Specifically I use Ubuntu, however the configuration files and deploy script could very quickly be changed to use any Linux or BSD distribution.

Fabric

Fabric is a Python tool for doing system administration tasks. There are a lot of tools that do this and most are pretty specific; I find Fabric provides the best balance between ease of use and flexibility. The problem I have with shell scripts is that they tend to be great to start with but quickly hit their limits when you need to do some string manipulation.
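To make the string manipulation point concrete, here is a minimal sketch of the kind of thing I mean. It is not the code from my deploy script: the host name, the package names and the `runner` parameter are all made up for illustration. In a Fabric 1.x fabfile the runner would typically be `fabric.api.sudo`; here it defaults to `print` so the sketch runs without Fabric installed.

```python
def build_commands(hostname, packages):
    """Return the shell commands to provision one server.

    hostname and packages are hypothetical inputs; the point is that the
    string handling (split, join, sort) is trivial in Python.
    """
    short_name = hostname.split(".")[0]  # "web1.example.com" -> "web1"
    return [
        "apt-get update",
        "apt-get install -y " + " ".join(sorted(packages)),
        "hostname " + short_name,
    ]


def provision(hostname, packages, runner=print):
    """Pass each command to runner (assumed to be fabric.api.sudo in a fabfile)."""
    for command in build_commands(hostname, packages):
        runner(command)
```

Calling `provision("web1.example.com", ["nginx", "mongodb"])` just prints the three commands, which is also a handy dry run before pointing it at a real server.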

Nginx

This is my web server of choice at the moment. I don’t think the choice matters so much, but it provides very simple and quick configuration, which is key to the easy part of setting up a cloud. Ultimately most web services are going to run proxied by something like this that will do all the gzip compression, static file serving, etc.
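As a rough illustration of how little configuration that takes, the sketch below renders a small Nginx server block from Python. The server name, static directory and application port are assumptions for the example, not values from my setup; the directives themselves (`gzip`, `alias`, `proxy_pass`, `proxy_set_header`) are standard Nginx ones.

```python
from string import Template

# "$$host" is an escaped dollar sign, so the output contains Nginx's own
# $host variable rather than a Python substitution.
NGINX_TEMPLATE = Template("""\
server {
    listen 80;
    server_name $server_name;

    gzip on;

    location /static/ {
        alias $static_root/;
    }

    location / {
        proxy_pass http://127.0.0.1:$app_port;
        proxy_set_header Host $$host;
    }
}
""")


def render_nginx_conf(server_name, static_root, app_port):
    """Fill in the template; all three arguments are hypothetical values."""
    return NGINX_TEMPLATE.substitute(
        server_name=server_name,
        static_root=static_root.rstrip("/"),
        app_port=app_port,
    )
```

Keeping the template in the deploy code means every server gets the same proxy, gzip and static file settings.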

MongoDB

Mongo is one of those newfangled databases under the NoSQL (not only SQL) banner. I think it is the most general purpose and the easiest to set up. The console makes it very easy to manage data. However, for this exercise the critical part was that the difference between master and slave is a two line configuration change. This made failover very simple. I think that Redis would have been just as easy and the configuration probably as simple, but I liked having Mongo’s mixture of traditional and modern features like indexes and map/reduce support.
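Here is a small sketch of what that difference might look like when generated from a deploy script. The option names (`master`, `slave`, `source`) are the legacy master/slave settings as I understand them, so treat them as an assumption and check them against the MongoDB version you run; the host name is made up.

```python
def mongo_replication_lines(role, master_host=None):
    """Return the config lines that differ between a master and a slave.

    Option names are assumed from MongoDB's legacy master/slave replication;
    master_host is a hypothetical address for the master server.
    """
    if role == "master":
        return ["master = true"]
    if role == "slave":
        if not master_host:
            raise ValueError("a slave needs the address of its master")
        return ["slave = true", "source = " + master_host]
    raise ValueError("role must be 'master' or 'slave'")
```

Everything else in the config file stays identical on both servers, which is what keeps the setup easy to reason about.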

Architecture

In this case I am using a load balancer (http://www.rackspace.com/cloud/cloud_hosting_products/loadbalancers/technology/). I am not exactly sure of the specifics, but they suggest that you only need to configure one load balancer and it is automagically highly available. We also assume that the client talking to Mongo has the proper settings so that it will automatically roll over to the slave if the master goes down. There is no panacea: we could still have a data center failure (as happened recently with the Amazon cloud), so we might want to locate a slave replica in another data center, etc. However, I wanted to maintain a balance between highly available, easy to maintain, fast to set up and low cost.
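The roll over idea can be sketched in a few lines. This is not how a MongoDB driver implements it (a real client library handles this through its own connection settings); it is only an illustration of "try the master first, then the slave". The function names and the server list are hypothetical.

```python
import socket


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_server(servers, probe=is_reachable):
    """Return the first reachable (host, port) pair, in priority order.

    servers lists the master first and the slave second; probe can be
    swapped out, which makes the logic easy to test without a network.
    """
    for host, port in servers:
        if probe(host, port):
            return host, port
    raise RuntimeError("no database server is reachable")
```

With a list like `[("db-master", 27017), ("db-slave", 27017)]` the client only falls back to the slave when the master does not answer.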

Summary

My goal was to create a starter cluster or cloud infrastructure that focused on being highly available, set up in an hour, easy to maintain and, most critically, low cost. I have published some basic template code (https://bitbucket.org/lateefj/easy_cloud), and I will highlight its configuration in my next blog. The example setup that I have uses two virtual servers, plus you have to manually configure the load balancer (took me 2 minutes), but the base price is around $35 a month, which is mind blowingly low cost in time and money for what you get (YMMV, this is the minimum it can cost with basically 0 bandwidth usage). I will be hosting all my projects on this type of setup soon so I can sleep better at night.