Wednesday, April 22, 2009

CouchDB First impressions

For a while I have been wanting to have a different approach to database storage for a while. A good while ago I blog about object databases and mentioned taking a look at CouchDB. There has been something that has not been sitting right with me for a while about the relational model.
As a Python programmer I spend a lot of my time iterator of list, doing dictionary lookups to find and manipulate data. All the languages I have seen or written in for web are doing this. The ORM's save us time by maximizing our mental time thinking in programming language of choice. However eventually we end up switching our brain into relational mode so we can do some complex query or schema design.
Unbind my mind from relations please!
One thing that I have noticed is that relational model and how I access and store information into it changes the way I think about the information. When discussing with other developers applications I tend to start describing features like a SELECT statement (Getting all list of widgets FROM animal, primate WHERE ...). This bothers me that instead of the flexibility and easy of use of my programming language I am constrained to think about the limitation of my storage system. Ideally I would like to discussing features and functionally without any constraints.
Everything is simply upside down
After getting over the lack of schema definition which takes all of about 2 seconds when I realized I can get down to writing code right away instead of trying to constrain my ideas to the data model that I have developed. Simply create an class definition with typed attributes and you are off and running. What seems like a no brainer' is a couple of new types which are arrays and dictionaries. Not only does this seem like total no brainer' since just about every programming language these days has built int array and dictionary types. Even though Postgresql has an array type it is very hard to work with.
Then the problem is when it comes to the JOIN. Where relational database really are useful is joining tables of which the better be couse these days it seems with any non trivial system you have to join 5 tables to get anything useful from the system. I find this rather painful. Beause I spend a lot of mental energy trying to query data and filter it out when it would be much easier for my to just say "for r in data". It would be much easier to traverse all the records and filter them that way. This is one of the feature I love about couchDB you visit every record and can add javascript to filter or transform the data returned. WOW the feedom.
Documents are not tables
CouchDB also uses "Document" which takes a bit of getting use to if your like me and have been thinking about tables for the last decade. First documents fields can be typed but there is not strict schema that they have to conform to. What is nice is that if needed can just push new code supporting new fields without DDL changes. WOOT! I suppose flexability is what I like most about this. I don't want to constantly change schema during the process of developing a new application or new features on an exisitng one. I have to wipe the data every time I change the schema or data model.
Document as model we don't normalize they same way relational. This is hard to change my mind from thinking in database normal form to a document that holds data. So far it has been hard to get use to this natual fit. There are still relationships between documents but the volume of the relationships is greatly reduced. Example in a relational database we would have animal->primate->monkey as tables. In contrast this would all be represented in a single document.
Pain in the JOIN, maybe
You can't do joins the same way you do them in a RDBMS. However you can do joins. It has taken me a while to bend my mind on how to do this. It isn't complex it just hard to change the way I think about it. I have only just started to write my first joins in a simple application I am working on and I like it so far. The difference is that all types of documents are returned instead of single rows with lots of duplicate data. For example if you join the city with the state table you get [Charlotte, NC], [Concord, NC] where 'NC' is duplicate state. Where as in in couchdb we would get [NC, Charlotte, Concord, GA, Atlanta..]. On one hand going to have to deal with the different documents in the result set however on the other hand there is a lot less data and the time savings in trying to figure out the 15 table joins will more than make up for handling more complex result set.