Friday, March 9, 2012

Added Secondary Index support to goriak

I have been pretty heads down writing business software and was getting a little burnt out. So to revive myself I hacked a little on the driver I have been playing with for using Riak from Go (golang). There are basically 3 motivations, other than needing a little fun hacking to recharge the batteries:
  1. Go weekly is getting close to version 1, so they added packaging using the "go command", which I covered in a previous blog.
  2. Riak added secondary indexes, which are so much more developer friendly than what was there before.
  3. The types of problems I really enjoy solving are ones that Go and Riak would be good tools for.
Quick recap: I had basic support for the key/value store. Given a hint about the type of struct to marshal, the driver would convert a struct to JSON and back. Get, Put and Delete were basically all that was implemented at the time.
Secondary indexes provide binary (string) and integer types for indexing. I implemented exact match and range queries for both, so the method signatures are:

SearchBin(bucketName, index, value string) ([]string, error) 

SearchBinRange(bucketName, index string, start, end string) ([]string, error) 

SearchInt(bucketName, index string, value int) ([]string, error) 

SearchIntRange(bucketName, index string, start, end int) ([]string, error) 
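For readers who have not used secondary indexes, here is a rough sketch of what such queries look like at the level of Riak's HTTP interface. It is written in Python purely for illustration and is not how goriak is implemented. The helper names `index_query_path` and `parse_keys` are made up for this sketch, and it assumes the index name is passed without its `_bin` / `_int` suffix.

```python
import json
from urllib.parse import quote


def index_query_path(bucket, index, start, end=None):
    """Build the HTTP path for an exact-match or range 2i query."""
    # Integer indexes carry the _int suffix, binary (string) indexes _bin.
    suffix = "_int" if isinstance(start, int) else "_bin"
    parts = ["buckets", bucket, "index", index + suffix, str(start)]
    if end is not None:
        parts.append(str(end))  # a second value turns it into a range query
    # Percent-encode every segment so odd characters cannot break the path.
    return "/" + "/".join(quote(p, safe="") for p in parts)


def parse_keys(body):
    """Extract the matching keys from a 2i JSON response body."""
    return json.loads(body)["keys"]
```

The query returns only the keys of the matching objects, which lines up with the `[]string` results in the signatures above; fetching the objects themselves is a separate Get per key.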

There is now a complete example included with the source code.

Time to build that killer web service! Source code is on bitbucket: https://bitbucket.org/lateefj/goriak/

Friday, March 2, 2012

Presentation on Error Handling with event sourcing

Slides: https://docs.google.com/presentation/d/1Xs-Q1ZkiDW9wolk3vYqqyq9aBpdImv_xzYxlJYfKxzQ/edit#slide=id.p
In my ever growing desire to automate capturing all errors and to be able to validate fixing them, I put together a demo and some simple code: https://github.com/lateefj/cltpy_chat. Basically it captures all events and marks sessions that have a stack trace error in them so that they can later be found and rerun on the system. The idea is to some day be able to capture all events and then rerun them against the code that fixes the error, to prove that the fix works. I am hoping to come out with some tools for Javascript and Python to easily automate this.
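To make the idea concrete, here is a minimal sketch of capturing events per session, marking the sessions that blew up, and replaying them against a fixed handler. It is not the code from the demo repository; `EventLog`, `buggy_handler` and `fixed_handler` are hypothetical names, and everything is kept in memory to keep the example short.

```python
import traceback
from collections import defaultdict


class EventLog:
    def __init__(self):
        self.events = defaultdict(list)   # session id -> events in arrival order
        self.errored = {}                 # session id -> captured stack trace

    def handle(self, session_id, event, handler):
        """Record the event, then run the handler; mark the session on failure."""
        self.events[session_id].append(event)
        try:
            return handler(event)
        except Exception:
            self.errored[session_id] = traceback.format_exc()
            return None

    def replay(self, session_id, handler):
        """Rerun a recorded session; True means every event was handled cleanly."""
        try:
            for event in self.events[session_id]:
                handler(event)
        except Exception:
            return False
        return True


def buggy_handler(event):
    return 100 / event["count"]          # raises ZeroDivisionError when count is 0


def fixed_handler(event):
    return 100 / event["count"] if event["count"] else 0
```

A session that failed under `buggy_handler` shows up in `errored`, and `replay(session_id, fixed_handler)` returning True is the evidence that the fix handles the exact sequence of events that caused the error.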

Tuesday, February 21, 2012

golang go command & dependency management

Finally decent dependency management!
The elephant in the room has always been that the code depends on specific versions of software to build and/or run. Once you accept this, putting the dependencies in the source code is the obvious choice. A quick comparison of the languages I use most:

Javascript:
Bottom line: Javascript doesn't really do any dependency management. It is left for the most part up to the developer to make sure that all the required packages are included on the page, or to some framework. There seems to be some magic possible with CoffeeScript, GWT or other web frameworks that provide dependency management, however they are not interoperable. This gets nasty as web apps get really large. I am looking to Dart to solve this in the future.

Java:
The only real game in town is Maven. The XML file configuration is horrendous IMO, especially compared to Ant. The enormous number of jars and amount of memory needed to compile a simple web service is surprising. Finally, support for Maven repositories is far from universal.

Python:
Probably the best. With a combination of virtualenvwrapper and pip, this was by far the fastest and easiest to use until the go get command came along. There is still manual tracking of dependencies, even if it is as simple as "pip freeze > deps.txt".

Go:
This is covered pretty completely here (caution for the sensitive: the blog is pretty graphic). Bottom line: if I have added a dependency I run "go get" and the dependencies are automatically downloaded based on the import statements in my source code. I can run "go build" without any Makefile or build script. Once I have pushed my library up to the interwebs, another library has been added to the Go world.

This forces the master/default/trunk of the repo to be production. New code must be developed in a branch, but isn't that how it should be anyway? The dependency configuration (pom.xml, pip freeze, etc.) is committed into the source tree anyway (usually with some additional instructions). Let's hope other languages start supporting this!

Thursday, February 9, 2012

Event Sourcing

Event Sourcing: Bingo! I have started to experiment with it by building applications. Now that NoSQL and cloud storage solutions allow for basically unlimited storage capacity, this is a very accessible option. Event Sourcing could also be added to an application that is already using an RDBMS, with the current state stored in the RDBMS or any indexed data store (in my case MongoDB with some tuned indexes). One of my main goals is to track all the UI (Javascript) events so that for any error I can replay everything that happened in order to debug it. I have done this before in a game but never hooked in the error replay, so I can't wait to see how it works.
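As a rough illustration of the split between an event history and a current state, here is a small Python sketch. The `Store` class, the `apply_event` function and the event shapes are all invented for the example, and plain Python structures stand in for the event storage and the indexed data store.

```python
def apply_event(state, event):
    """Return the new state after one event; the input state is not modified."""
    new_state = dict(state)
    if event["type"] == "item_added":
        new_state[event["item"]] = new_state.get(event["item"], 0) + event["qty"]
    elif event["type"] == "item_removed":
        new_state.pop(event["item"], None)
    return new_state


class Store:
    def __init__(self):
        self.events = []   # append-only history, never updated in place
        self.state = {}    # current state, the part you would index and query

    def record(self, event):
        self.events.append(event)
        self.state = apply_event(self.state, event)

    def rebuild(self, upto=None):
        """Recompute state from scratch using the first `upto` events."""
        state = {}
        for event in self.events[:upto]:
            state = apply_event(state, event)
        return state
```

Because the current state is only ever derived from the events, `rebuild(upto=n)` can reproduce what the application looked like at any earlier point, which is what makes replaying a session for debugging possible.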

Tuesday, December 6, 2011

InfoQ: Events Are Not Just for Notifications

InfoQ: Events Are Not Just for Notifications:

It is nice to see a fantastic explanation of what I think of as Event Programming. I have blogged about a way to implement a simple Javascript dispatcher and about implementing a cross platform event system. I have been champing at the bit to implement a fully audited application that stores all the events in one of these new document databases (MongoDB, Riak). To be able to mark any user session with an error (or even allow users to report an error) and then to "replay" that user's session would be pretty hot. The concept is difficult, but the implementation is so much easier, especially testing. Now to just evangelize until it is considered a "Best Practice".
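For anyone who has not seen one, a dispatcher can be very small. The sketch below is in Python rather than Javascript and is not the dispatcher from my earlier post; the `Dispatcher` class and its `audit` list are assumptions made for illustration, with the list standing in for a document database.

```python
from collections import defaultdict


class Dispatcher:
    def __init__(self):
        self.listeners = defaultdict(list)  # event name -> callbacks
        self.audit = []                     # every dispatched event, in order

    def on(self, name, callback):
        """Register a callback for a named event."""
        self.listeners[name].append(callback)

    def dispatch(self, name, payload=None):
        # Record first so the audit trail is complete even if a listener fails.
        self.audit.append((name, payload))
        for callback in list(self.listeners[name]):
            callback(payload)
```

Since every event passes through `dispatch`, the audit trail comes for free, including events that nothing was listening for.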

Thursday, October 20, 2011

Presentation on Cloud Deployment & Mongodb @ Charlotte Python Group

A few weeks back I gave a presentation on using Fabric to deploy a Python WSGI application in the cloud. The slides are: https://docs.google.com/present/view?id=dg3v9j47_210cxt77g7h
The code from the demo is here: https://github.com/lateefj/RTC-example
In the presentation I demonstrated how taking down one node in the cluster would not interrupt the other two nodes. Using fabric this was not only feasible but relatively easy once you understand the architecture. I also reviewed the tools that I used: supervisor to restart WSGI (Flask), fabric for deployment and server setup, and finally mongodb for data storage.
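The one-node-at-a-time idea can be sketched in plain Python. This is not the fabfile from the demo and it does not use Fabric's API; `rolling_deploy` is a hypothetical helper, `deploy` stands in for whatever pushes code and restarts the app on one node, and `in_rotation` stands in for the set of nodes receiving traffic.

```python
def rolling_deploy(nodes, deploy, in_rotation):
    """Update nodes one at a time so the rest keep serving traffic.

    Returns (succeeded, fewest nodes that were serving at any moment).
    """
    min_serving = len(in_rotation)
    for node in nodes:
        in_rotation.discard(node)        # stop sending traffic to this node
        min_serving = min(min_serving, len(in_rotation))
        try:
            deploy(node)                 # hypothetical: push code, restart the app
        except Exception:
            # Leave the failed node out of rotation and stop the rollout.
            return False, min_serving
        in_rotation.add(node)            # node is healthy again, put it back
    return True, min_serving
```

With three nodes, at least two are serving at every moment, and a failed deploy stops the rollout before it can touch the remaining nodes.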

I also quickly introduced mongodb: https://docs.google.com/present/view?id=dg3v9j47_212g2kvq2tv
The main point I was trying to make was that in early development an RDBMS's strict schema just gets in the way of fast development. MongoDB is a great tool for prototyping. I use it at the beginning of any project because I don't know the final data model yet. Also, my past experience is that every mature application backed by an RDBMS has a schema that is totally crazy to work with. So why not start with a very flexible schema and give it some room to evolve as the application matures?

Sunday, June 12, 2011

Startup Cloud Cluster on a Budget


Recently I have been working on a project that needed a highly available service, and I have been pitching “the three pillars of a service: robust, fast and accessible”. There is a lot of marketing hype around cloud computing, with choices ranging from something like Google AppEngine, which provides a very restricted environment, to something like Amazon AWS, which just provides virtual hardware. The problem is that I am creating a service from scratch, but I can’t live with the restrictions of something like AppEngine or Heroku, nor do I want to spend a ton of time setting up something like AWS and Cassandra. This is a two part blog on setting up a robust, fast and accessible service in a few hours and for a very affordable price.

This blog will cover the concepts and overview of the solution and the technologies used for the implementation. The second blog will cover the configuration and deployment.

3-legged Table

I have come to believe that for any service to compete it has to be supported by a three legged table where the legs are Robust, Fast and Accessible. Without any one of these the table is only good for producing heat in the fireplace.

Robust

Web systems should not have “scheduled downtime”. I think that the convergence of two things has forced us web developers to architect for 0 scheduled downtime. The first is the need to provide a 24/7 business platform for our users. The second is that after launch the software is alive, with continuous updates until it gets turned off. Traditional business systems ran during business hours and could then be “maintained” from 5pm Pacific time to 8am Eastern time. There is a general move to continuous updates of software, where the more frequent the updates, the more stable the system is and the faster the technology can adjust to the business need (I am a fan).

Fast

Usage can be directly correlated to the performance of a service. In a competitive market slow services will not be able to compete. Premature optimization has been the road to ruin of many a project but the ability to quickly optimize the service after launch will decide the success of the product.

Accessible

The point of a service is for it to be accessed by users. One of the reasons I think JSON became so popular is its simplicity. Again, in competitive markets the barrier to entry approaches zero. This means well documented services that support 80% of the platforms (mobile, desktop, etc.) are a requirement to compete in the marketplace.

Tools

Linux

This is a given for the types of service that I build. I could see a specific service for M$ that required libraries that only work with Windows, however I don’t imagine easy would be a part of that setup. Specifically I use Ubuntu, however the configuration files / deploy script could very quickly be changed to use any Linux / BSD distribution.

Fabric

Fabric is a Python tool for doing system administration tasks. There are a lot of tools that do this, and most are pretty specific. I find this tool provides the best balance between ease of use and flexibility. The problem I have with shell scripts is that they tend to be great to start with but quickly hit their limits when you need to do some string manipulation.

Nginx

This is my web server of choice at the moment. I don’t think the choice matters so much, but it provides very simple and quick configuration, which is key to the easy part of setting up a cloud. Ultimately most web services are going to run proxied by something like this that will do all the gzip compression, static file serving, etc.

MongoDB

Mongo is one of those newfangled databases under the NoSQL (not only SQL) banner. I think it is the most general purpose and the easiest to set up. The console makes it very easy to manage data. However, for this exercise the critical part was that the difference between master and slave is a two line configuration change. This made failover very simple. I think that redis would have been as easy and the configuration probably as simple, but I liked having mongo’s mixture of traditional and modern features, like indexes and map/reduce support.

Architecture

In this case I am using a load balancer (http://www.rackspace.com/cloud/cloud_hosting_products/loadbalancers/technology/). I am not exactly sure of the specifics, but they suggest that you only need to configure one load balancer and it is automagically highly available. We also assume that the client that is talking to mongo has the proper settings so that it will automatically roll over to the slave if the master goes down. There is no panacea: we could still have a data center failure (as happened recently with the Amazon cloud), so we might want to locate a slave replica in another data center, etc. However, I wanted to maintain a balance between highly available, easy to maintain, fast to set up and low cost.
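The client side rollover is normally handled by the database driver, but the behaviour being assumed can be sketched in a few lines of Python. `FailoverClient` is a made-up class, each "connection" is just a callable that either answers or raises `ConnectionError`, and only reads are covered.

```python
class FailoverClient:
    def __init__(self, connections):
        self.connections = list(connections)   # ordered: master first, then slaves

    def read(self, key):
        """Try each server in order and return the first successful read."""
        last_error = None
        for conn in self.connections:
            try:
                return conn(key)
            except ConnectionError as exc:
                last_error = exc               # this server is down, try the next
        raise ConnectionError("no server available") from last_error
```

If the master raises, the read quietly falls through to the slave; only when every server is down does the caller see an error.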

Summary

My goal was to create a starter cluster or cloud infrastructure that focused on being highly available, set up in an hour, easy to maintain and, most critically, low cost. I have published some basic template code (https://bitbucket.org/lateefj/easy_cloud), and I will highlight the configuration in my next blog. The example setup uses two virtual servers, plus you would have to manually configure the load balancer (took me 2 minutes), but the base price is around $35 a month, which is mind blowingly low cost in time and money for what you get (YMMV: this is the minimum it can cost, with basically 0 bandwidth usage). I will be hosting all my projects on this type of setup soon so I can sleep better at night.