Friday, April 16, 2010

Cloud application (AppEngine) or server (Rackspace, Amazon)?

Flexibility vs Productivity vs Creativity
When we start building an application, the one thing we don't know is the path the application is going to take. Like the CAP theorem, which says you can pick only two of three properties (Consistency, Availability, Partition Tolerance), we have to pick two of Flexibility, Productivity and Creativity.

Take Hardware Off The Table
We have enough to worry about with all the laptops, monitors and cell phones; add a rack of servers and we are really not happy. These types of resources are also cost prohibitive, especially for a startup or a poorly funded project. Physical servers, switches, battery backups, a facility (with a special sprinkler system), security systems for physical access to equipment, power redundancy, network redundancy and capacity are all enjoyable engineering tasks, but they are a complete distraction from focusing on the project.

Hosted Server Overhead
The Amazon user experience for managing servers is horrible, so I am only going to talk about what I think is the viable option, something like Rackspace. It comes down to 3 things:
  • Setup - you still need to manage the configuration and setup of all the services that are running: database, web servers, etc. Even a small deployment can get complex quickly and require lots of resources.
  • Deployment - getting code actually onto the servers (this is harder than it sounds, especially with web servers, databases, message queues, etc.)
  • Scaling - If your application is at all useful then people will want to use it, so you will need to figure out how to add more web servers and databases (RDBMSs are big piles of pain, with expensive consultants licking their lips)
Hosted Application Constraints
I will use AppEngine as the example, since it is the most popular and I have used it and am currently using it. Here are some of the issues:
  • Development and testing can only use a very limited sample data set because of the way the development environment works.
  • Taskqueue is not a simple and easy cron job. (I am probably going to end up using cron hitting a URL anyway.)
  • It is very expensive when the traffic hits!
  • You are limited in which software libraries you can use!
How about Hybrid?
The constraints that AppEngine places on you limit creativity and flexibility, but it is very productive. Rackspace maximizes flexibility and creativity in the tools you can use to solve the problem, but puts a large burden on productivity. A hybrid approach might be exactly what the doctor ordered: start out with AppEngine and migrate to a solution like Rackspace as the need for more flexibility and creativity arises.

Wednesday, April 7, 2010

Everyone codes!

At PyCon 2010 the last day's keynote (video here) was given by Antonio Rodriguez, and I really dug his presentation. Since then I have been thinking about how to enable everyone in an organization to write code. It seems obvious that we coders / managers would want to give everyone in the organization the tools to solve their own problems.

So like how?
Well, there are some technologies/methods that I think are converging and that, in combination, could make it much easier to develop features, reports, etc.

Bespin
The first thing we need is an editor and environment for all these new developers in our organizations. Reasons I think Bespin is the right solution for this:
  • Browser based: so we don't have to set up clients on every machine
  • Extensible: It seems pretty easy to extend to support whatever custom setup needs there might be
  • Hosted: If it is running in the cloud and we need more instances to support more users, that is easy. The development environment can be managed centrally. Maybe even on demand?
  • Sharing: Code review, training and help can be given through the tools provided.
Simple Data Model
Normalization is not the friend of a simple data model. This came to me because I designed an address book many years ago, and recently I was trying to explain it to someone who could solve their own problems. There is a table of users, where each user has a contact and each contact has a list of email addresses. So to get a user's email address you need to write a join across 3 tables. Sure, I could create a view for this, but then when they find an issue, how will they modify the record?
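To make the 3 table join concrete, here is a rough sketch using Python's built-in sqlite3 module. The table and column names (users, contacts, emails) and the sample rows are my own guesses for illustration, not the actual address book schema.

```python
import sqlite3

# Hypothetical normalized address book: users -> contacts -> emails.
SCHEMA = """
CREATE TABLE contacts (id INTEGER PRIMARY KEY);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, contact_id INTEGER);
CREATE TABLE emails (id INTEGER PRIMARY KEY, contact_id INTEGER, address TEXT);
"""


def build_db():
    """Create an in-memory database with one made-up user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO contacts (id) VALUES (1)")
    conn.execute("INSERT INTO users (id, name, contact_id) VALUES (1, 'alice', 1)")
    conn.executemany(
        "INSERT INTO emails (contact_id, address) VALUES (?, ?)",
        [(1, "alice@example.com"), (1, "alice@work.example.com")],
    )
    return conn


def emails_for_user(conn, name):
    """Walk all three tables just to list one user's email addresses."""
    rows = conn.execute(
        """
        SELECT emails.address
        FROM users
        JOIN contacts ON contacts.id = users.contact_id
        JOIN emails ON emails.contact_id = contacts.id
        WHERE users.name = ?
        ORDER BY emails.id
        """,
        (name,),
    )
    return [address for (address,) in rows]
```

Calling emails_for_user(build_db(), "alice") returns both addresses, but a new coder has to understand two joins to get there.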
Document databases with map/reduce, I think, lend themselves to creating simpler data models, because:

  • Most computer users think of databases as documents. There is probably more data in spreadsheets than in RDBMSs. So this will be comfortable for them.
  • Document database models tend to be less normalized, so you don't have to traverse many documents to find things like email addresses. (Side note: Riak links are a darn good way to traverse documents.)
  • Map/Reduce is simple and easy to understand conceptually. It is mind bending to switch from a modern programming language like Javascript or Python to SQL, a language from the 70's.
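As a rough illustration of the document plus map/reduce idea, here is a sketch in plain Python. It does not use any real database API; the documents, field names and function names are all made up. Each user is one document that already holds its email addresses, the map step emits pairs per document, and the reduce step folds the pairs into totals.

```python
from functools import reduce

# Hypothetical documents: each user carries its own list of email addresses.
DOCUMENTS = [
    {"name": "alice", "emails": ["alice@example.com", "alice@work.example.com"]},
    {"name": "bob", "emails": ["bob@example.com"]},
]


def map_email_domains(doc):
    """Map step: emit a (domain, 1) pair for each address in one document."""
    return [(address.split("@")[1], 1) for address in doc["emails"]]


def reduce_counts(totals, pair):
    """Reduce step: fold one (domain, count) pair into the running totals."""
    domain, count = pair
    totals[domain] = totals.get(domain, 0) + count
    return totals


def count_domains(documents):
    """Run the map step over every document, then reduce the emitted pairs."""
    pairs = [pair for doc in documents for pair in map_email_domains(doc)]
    return reduce(reduce_counts, pairs, {})
```

Finding a user's email addresses is now just doc["emails"], and the report over all documents is two small functions.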
Better APIs
There is a big difference between writing an API for an internal development team and writing one for a third party. My memory is not very good, so a couple of months down the road I have to read way too much code to use my own APIs. This has come to me because I was writing the original code just for core developers.

A question of language?
I would like to say that Python is the best choice. I know it would have the shortest learning curve, but there are some downsides. I think right now Javascript has a leg up for this type of implementation.
Javascript
  • Upside:
    • Web is the unifying platform and Javascript is the language of that platform.
    • Javascript map/reduce is supported in Mongo, CouchDB and Riak, which are the document databases that come to mind for me.
  • Downside:
    • It is not an easy language to quickly learn and play with. We would need to use something like Rhino.
    • Small backend development community.
Python
  • Upside:
    • Very easy to learn and experiment with in terminal.
    • Has lots of libraries for web backends.
  • Downside:
    • Map/Reduce with Python would need something like ZODB, which has a lot of other pitfalls.
    • Would need to use pyjamas or something like it for the front end. As I have been writing a lot of GWT (in Java), I know this is not a short learning curve.

Maybe I am underestimating these new coders and they can handle a mixture of Javascript and Python just fine.

Dreaming
So my pie in the sky for this solution would be a very scalable database like Riak with Python map/reduce, plus a Python WSGI backend, and finally Javascript with jQuery. Python would cover most of the use cases (reporting, data management), and the intern or advanced non-developer could write Javascript to create a better end user experience. Now I just need the time and money to create such a solution.
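The WSGI piece of that stack could be as small as the sketch below: a plain WSGI callable that returns a report as JSON for a jQuery front end to fetch. The report contents and the name report_app are made up for illustration, and a real version would read from the database instead of a constant.

```python
import json

# Hypothetical report data; a real app would compute this from the database.
REPORT = {"active_users": 2}


def report_app(environ, start_response):
    """Minimal WSGI application that serves the report as JSON."""
    body = json.dumps(REPORT).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

For local experiments this could be served with the standard library, for example wsgiref.simple_server.make_server("", 8000, report_app).serve_forever().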


Tuesday, April 6, 2010

Riak Presentation at Charlotte Ruby meeting

Saw a great introduction to Riak tonight. It is really an amazing product that is moving very fast. There were some surprises that come along with the decentralized model that I hadn't realized. One of them is the inability to do averages. It is also very impressive how robust and potentially scalable Riak is. After the presentation I asked some of the more common questions and found out some things.
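My guess at why averages are awkward (this is my own reading, not something spelled out in the presentation) is that averages computed separately on each node cannot simply be averaged again. A common workaround in map/reduce is to carry a (sum, count) pair through the reduce step and divide only at the end. The sketch below is plain Python with made-up data and function names, not the Riak API.

```python
def map_partial(values):
    """Map step on one node: emit (sum, count) instead of a local average."""
    return (sum(values), len(values))


def reduce_partials(partials):
    """Reduce step: (sum, count) pairs add up safely, in any grouping."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return (total, count)


def average(partials):
    """Divide only once, after every partial has been combined."""
    total, count = reduce_partials(partials)
    return total / count if count else None


# Hypothetical values stored on two different nodes.
NODE_A = [10, 20, 30]
NODE_B = [50]
```

Here the local averages are 20 and 50, and averaging those gives 35, while average([map_partial(NODE_A), map_partial(NODE_B)]) gives the true value of 27.5.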

Pre and post commit hooks
Sounds like they are close to actually adding pre and post commit hooks. In an RDBMS we would call these triggers. This came up when I was asking about making indexes easier to manage.
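To show what I mean by hooks helping with indexes, here is a toy sketch in plain Python. It is not how Riak implements or exposes hooks; HookedStore and the hook functions are hypothetical names. The pre commit hook can reject a write, and the post commit hook keeps a simple index up to date.

```python
class HookedStore:
    """Toy in-memory key/value store with pre and post commit hooks."""

    def __init__(self):
        self._data = {}
        self.pre_commit = []   # called before the write; may raise to reject it
        self.post_commit = []  # called after the write has been stored

    def put(self, key, value):
        for hook in self.pre_commit:
            value = hook(key, value)
        self._data[key] = value
        for hook in self.post_commit:
            hook(key, value)

    def get(self, key):
        return self._data[key]


def require_email(key, value):
    """Pre commit hook: refuse documents without an email address."""
    if "@" not in value.get("email", ""):
        raise ValueError("email is required")
    return value


def make_domain_indexer(index):
    """Post commit hook factory: maintain a domain -> keys index."""
    def update_index(key, value):
        domain = value["email"].split("@")[1]
        index.setdefault(domain, set()).add(key)
    return update_index
```

With both hooks registered, every successful put keeps the index current without the caller having to remember to update it.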

Distributed Indexes
I had read somewhere that they had been working on full index searches that work just like all the other data stored in Riak. So I asked whether that would also support regular indexes on the data so that performance could be greatly improved. The answer was that it is in development, but it will be a while before a production release is ready.

Very exciting stuff. I am hoping to continue learning Riak and using it with my Python and Javascript projects.