Tuesday, July 27, 2010

Goriak uploaded code to the public

Goriak is a Go (golang) library for the Riak database.

I finally got enough time to learn enough Go to code up a basic key/value interface to Riak. It is still missing a couple of the REST API calls, which I plan to implement over the next couple of weeks. I figured it was far enough along to share with the world, and hopefully someone else can review my code and contribute to it. I originally looked to contribute to an existing project, but the only one I could find was riak.go, and its implementation was not as complete as mine. I was able to use a couple of lines from riak.go, and they really helped me implement the Put function, so I am really glad it is there. The author is also really nice; he is just going off to work on something else, so I decided to create a new repository for the project.

The current API has seven functions:
GetBucket - Retrieves a Bucket type which contains all kinds of information about the bucket
Get - Unmarshals the JSON that is returned by the request into the interface that is passed in
GetCAP - Same as Get but supports an option for the number of nodes that need to agree before returning
Put - Marshals the type passed in to JSON and uploads it to the server
PutCAP - Same as Put but can specify the number of nodes that need to write the JSON
Delete - Removes the object from the cluster
DeleteCAP - Same as Delete but can specify the number of nodes to delete it from
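
To make the Get/Put pair a bit more concrete, here is a rough sketch of the idea in Go. It is not Goriak's actual code: the Client type, the objectURL helper, the User struct and the in-memory newFakeRiak server are all made up for illustration, and the /riak/<bucket>/<key> path with r and w query parameters is my assumption about how the REST interface is addressed. It only shows the shape of the round trip: marshal to JSON on Put, unmarshal on Get, with an optional node count.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
	"sync"
)

// Client is a hypothetical stand-in, not Goriak's real type.
type Client struct {
	BaseURL string // assumed REST prefix, e.g. "http://127.0.0.1:8098/riak"
	HTTP    *http.Client
}

// objectURL builds <base>/<bucket>/<key>, adding ?param=n when n > 0.
func (c *Client) objectURL(bucket, key, param string, n int) string {
	u := c.BaseURL + "/" + url.PathEscape(bucket) + "/" + url.PathEscape(key)
	if n > 0 {
		u += "?" + param + "=" + strconv.Itoa(n)
	}
	return u
}

// PutCAP marshals v to JSON and uploads it; w <= 0 leaves the server default.
func (c *Client) PutCAP(bucket, key string, v interface{}, w int) error {
	body, err := json.Marshal(v)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPut, c.objectURL(bucket, key, "w", w), bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.HTTP.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("put %s/%s: %s", bucket, key, resp.Status)
	}
	return nil
}

// GetCAP fetches the object and unmarshals its JSON into v (a pointer).
func (c *Client) GetCAP(bucket, key string, v interface{}, r int) error {
	resp, err := c.HTTP.Get(c.objectURL(bucket, key, "r", r))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("get %s/%s: %s", bucket, key, resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(v)
}

// newFakeRiak is an in-memory test server so the sketch runs without Riak.
func newFakeRiak() *httptest.Server {
	var mu sync.Mutex
	store := map[string][]byte{}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		switch r.Method {
		case http.MethodPut:
			b, _ := io.ReadAll(r.Body)
			store[r.URL.Path] = b
			w.WriteHeader(http.StatusNoContent)
		case http.MethodGet:
			b, ok := store[r.URL.Path]
			if !ok {
				http.NotFound(w, r)
				return
			}
			w.Write(b)
		default:
			w.WriteHeader(http.StatusMethodNotAllowed)
		}
	}))
}

// User is an example payload type.
type User struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

func main() {
	srv := newFakeRiak()
	defer srv.Close()
	c := &Client{BaseURL: srv.URL + "/riak", HTTP: srv.Client()}
	if err := c.PutCAP("users", "alice", User{Name: "Alice", Age: 30}, 2); err != nil {
		panic(err)
	}
	var u User
	if err := c.GetCAP("users", "alice", &u, 2); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", u)
}
```

Passing 0 as the node count in this sketch simply leaves the query parameter off, which is how a plain Get or Put could be layered on top of the CAP variants.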

I still need to implement MapReduce, LinkWalk, Ping and ServerStatus. These should come over the next couple of weeks as I get time to spend on them, with MapReduce being the only nontrivial one.

I did start to experiment with some concurrent access patterns. The first I call BackgroundWait: a Get or Put is issued in the background and returns a request id right away. The responses are stuffed in a channel until the code calls Wait(requestId). Instead of waiting on every single request, the code can have many requests in flight at the same time and wait only when a result is required. This reduces the number of choke points. Assume a web application does something like this:
  1. Check Authentication 
  2. Application configuration / setup where the user is not required
  3. Based on user / permissions additional data is requested
  4. Possibly a second round of data based on application data
  5. Response returned
  6. Logging 
As the use of key/value and document databases increases, I think this will become more important for maximizing backend performance. Riak's ability to scale up for both reading and writing allows for this type of access pattern, similar to memcached. I don't know of any RDBMS (maybe VoltDB?) that could handle a large number of concurrent requests, since they have very constrained IO or process (thread) resources. It is a lot of fun to think about how increased IO distribution could open up new access patterns for low-latency data access.
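
The BackgroundWait pattern described above could be sketched roughly like this. This is only an illustration and not the code in the repository: the Background type, its Go and Wait methods, and the fakeGet helper (which sleeps to stand in for a network round trip) are all hypothetical names. Each request runs in its own goroutine and parks its response in a buffered channel until Wait is called with the matching request id.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// result carries one finished request's value or error.
type result struct {
	value interface{}
	err   error
}

// Background is a hypothetical helper that tracks in-flight requests by id.
type Background struct {
	mu      sync.Mutex
	nextID  int
	pending map[int]chan result
}

func NewBackground() *Background {
	return &Background{pending: map[int]chan result{}}
}

// Go starts fn in its own goroutine and returns a request id immediately.
func (b *Background) Go(fn func() (interface{}, error)) int {
	ch := make(chan result, 1) // buffered: the goroutine never blocks on send
	b.mu.Lock()
	b.nextID++
	id := b.nextID
	b.pending[id] = ch
	b.mu.Unlock()
	go func() {
		v, err := fn()
		ch <- result{value: v, err: err}
	}()
	return id
}

// Wait blocks until the request with the given id has finished.
// Each id can be waited on once; after that it is forgotten.
func (b *Background) Wait(id int) (interface{}, error) {
	b.mu.Lock()
	ch, ok := b.pending[id]
	delete(b.pending, id)
	b.mu.Unlock()
	if !ok {
		return nil, fmt.Errorf("unknown request id %d", id)
	}
	r := <-ch
	return r.value, r.err
}

// fakeGet stands in for a slow database read (assumption: 50ms per request).
func fakeGet(key string) func() (interface{}, error) {
	return func() (interface{}, error) {
		time.Sleep(50 * time.Millisecond)
		return "value-of-" + key, nil
	}
}

func main() {
	b := NewBackground()
	start := time.Now()

	// Fire off everything that does not depend on the user up front.
	session := b.Go(fakeGet("session"))
	config := b.Go(fakeGet("config"))

	// Only block at the point where a result is actually needed.
	s, _ := b.Wait(session)
	c, _ := b.Wait(config)
	fmt.Println(s, c, "elapsed:", time.Since(start))
}
```

Since both requests are in flight together, the elapsed time should land near the cost of one request rather than the sum of the two, which is the whole point of only waiting when a result is required.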

I would be happy for any help; code review and contributions are much appreciated.