Wednesday, December 30, 2009

Is RDBMS pain boil down to premature optimization?

As I extract my self from stored procedure pondering how at the door step of 2010 we are still using such an inflexible, complex and surprising language to manage data? The stored procedure with its PERFORM syntax instead of SELECT when you are calling a stored procedure inside of a stored procedure and the stored procedure you are calling does not return anything. There are so many of these type of rules that 30 years ago would have felt natural. Today this is a waste of developer time. Postgresql provide many different languages but this sounds much better than it is in practice. What is really bothering me why is this so painful? Why have RDBMS not matured to make our tasks easier and why hasn't a better data storage technology come about?

In this blog I am going to explore some of the issue that I have seen with RDBMS's. All of them seem to come back to premature optimization.

Data consistency is all that matters!
This is a common argument why all data store must be ACID compliant. I have heard my co-workers laugh about how NoSQL databases store inconsistent data. This seem ironic in a way because I have read most of their code and writing a lot of code that uses ACID RDBMS and that data can easily be placed in an inconsistent state. This is for performance reasons that we do this. When I was writing EJB's and each data record locked for reading and writing this was a very consistent way to access data. The problem with this is no data can be read or writing in parallel. It is amazing how few users it takes to bring a web system to a halt that is truly consistent. In the web applications that I have written the make heavy use of caches like memcache which I am sure has created many inconstant user views of data. I can think of off the top of my head over my 10 years of coding web applications, only 3 instances where I have writing been involved or seen code that actually was consistent. It might just be me but I would guess most web app developers have had a similar experience.

Disk speed is slow?
Disk have only recently started to increase in speed with SSD coming out that are multiple times faster then the spindles. Has this been the reason we have lived with RDBMS for so long. We have not had fast enough disk to use something else? We have been able to add more disks and use raid to increase performance however with RDBMS this has come down to writing to two nodes. First writing to the master and it replicating to a minimum of one slave. But wait we can scale RDBMS simply but sharding! Except we have broken rule #1 ACID if we shard data. So if data access performance will ever exceed disk array or size exceeds a disk array then RDBMS.

Object are not relations but they store sooooo close until they don't.
As a developer this seem to be something that is constantly driving me crazy. The relational data model never exactly maps to objects of the applications so it is 75% of the time consistent between tables and objects. So we design the tables first and our objects end up looking like are schema until performance suffers really bad and then we end up changing the table schema and the objects don't make any sense. Only the most trivial applications applications does this work out but I don't tend to spend much time writing or maintaining trivial applications.

RDBMS Vendor + DBA all your problems are solved
I can agree that for specific problems specific venders have design good solutions for them however this is very rare . As for the DBA since I sell myself as a knowledgeable person I agree more knowledgeable people to help solve difficult problems is a very good approach however the problem is the data access is slow! No matter how many people you have there is only so fast a single node can write to a disk array. No matter what if the system grows though more users, more data or more features the system will get slower if it can not do parallel writes.

So maybe I am just "ranting about writing software" but I think it is time to start making the transition to better data storage systems. My future productivity depends on using data stores that get out of the way and let me use modern languages and concept to manage the data. For the situations that I need to have consistent data I will use ACID RDBMS or find a way to create consistent data in NoSQL data stores.