Tuesday, July 27, 2010

Goriak code uploaded to the public

Goriak is a Go (golang) library for the Riak database.

I finally got enough time to learn enough Go to code up a basic key/value interface to Riak. It is still missing a couple of the REST API calls, which I plan to implement over the next couple of weeks. I figured it had enough work in it to share with the world, and hopefully someone else can review my code and contribute to it. I originally looked to just contribute to an existing project, but the only one I could find was riak.go, and it didn't have as complete an implementation as mine. I was able to reuse a couple of lines from riak.go, and they really helped me implement the Put function, so I am really glad it is there. The author is really nice, but he is moving on to work on something else, so I decided to create a new repository for the project.

The current API has the following functions:
GetBucket - Retrieves a Bucket type, which contains all kinds of information about the bucket
Get - Unmarshals the JSON returned by the request into the value passed in
GetCAP - Same as Get but supports an option for the number of nodes that need to agree before returning
Put - Marshals the value passed in to JSON and uploads it to the server
PutCAP - Same as Put but can specify the number of nodes that need to write the JSON
Delete - Removes the object from the cluster
DeleteCAP - Same as Delete but can specify the number of nodes to delete it from
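To make the list above a little more concrete, here is a minimal sketch of what GetCAP and PutCAP could look like over HTTP. It is not Goriak's code: the Client type, the url helper and the fakeRiak test server are hypothetical, and the sketch assumes Riak's HTTP interface of the time (objects addressed as /riak/<bucket>/<key>, with r and w query parameters for the read and write quorums). Plain Get and Put would be the same calls with a quorum of 0, which leaves the choice to the server.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
)

// Client is a hypothetical minimal Riak HTTP client, not Goriak's actual type.
type Client struct{ BaseURL string }

// url builds an assumed /riak/<bucket>/<key> URL; when q > 0 it adds a
// quorum parameter (r for reads, w for writes).
func (c *Client) url(bucket, key, param string, q int) string {
	u := c.BaseURL + "/riak/" + bucket + "/" + key
	if q > 0 {
		u += "?" + param + "=" + strconv.Itoa(q)
	}
	return u
}

// GetCAP fetches an object and unmarshals its JSON body into v.
// r = 0 leaves the read quorum to the server (a plain Get).
func (c *Client) GetCAP(bucket, key string, r int, v interface{}) error {
	resp, err := http.Get(c.url(bucket, key, "r", r))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("get %s/%s: %s", bucket, key, resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(v)
}

// PutCAP marshals v to JSON and stores it; w = 0 is a plain Put.
func (c *Client) PutCAP(bucket, key string, w int, v interface{}) error {
	body, err := json.Marshal(v)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPut, c.url(bucket, key, "w", w), bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("put %s/%s: %s", bucket, key, resp.Status)
	}
	return nil
}

// fakeRiak is an in-memory stand-in so the sketch runs without a cluster.
// It stores bodies by URL path and ignores the quorum parameters.
func fakeRiak() *httptest.Server {
	var mu sync.Mutex
	store := map[string][]byte{}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if r.Method == http.MethodPut {
			b, _ := io.ReadAll(r.Body)
			store[r.URL.Path] = b
			w.WriteHeader(http.StatusNoContent)
			return
		}
		b, ok := store[r.URL.Path]
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Write(b)
	}))
}

func main() {
	srv := fakeRiak()
	defer srv.Close()
	c := &Client{BaseURL: srv.URL}

	if err := c.PutCAP("users", "alice", 2, map[string]string{"name": "Alice"}); err != nil {
		panic(err)
	}
	var u map[string]string
	if err := c.GetCAP("users", "alice", 2, &u); err != nil {
		panic(err)
	}
	fmt.Println(u["name"]) // Alice
}
```

Passing a pointer to the value and letting encoding/json fill it in is what allows a single Get to work for any struct or map type the caller chooses.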

I still need to implement MapReduce, LinkWalk, Ping and ServerStatus. These should come over the next couple of weeks as I get time to spend on them. MapReduce is the only nontrivial one.

I did start to experiment with some concurrent access patterns. The first, which I call BackgroundWait, runs a Get or Put in the background and immediately returns a request id. The responses are stuffed in a channel until the code calls Wait(requestId). Instead of waiting on every single request in turn, the code can issue many requests at the same time and only wait when it actually needs a result. This reduces the number of choke points. Assume a web application does something like this:
  1. Check authentication
  2. Application configuration / setup where the user is not required
  3. Based on the user / permissions, additional data is requested
  4. Possibly a second round of data based on the application data
  5. Response returned
  6. Logging
Several of these steps, such as checking authentication and loading the application configuration, don't depend on each other, so their requests can be in flight at the same time. As the use of key/value and document databases increases, I think this will become more important for maximizing backend performance. Riak's ability to scale up for both reads and writes allows for this type of access pattern, similar to memcached. I don't know of any RDBMS (maybe VoltDB?) that could handle a large number of concurrent requests, since they have very constrained IO or process (thread) resources. It is a lot of fun to think about how increased IO distribution could enable new access patterns for low-latency data access.
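The BackgroundWait idea can be sketched in a few lines of Go. This is a hypothetical illustration, not the Goriak implementation: the Waiter type and its Background and Wait methods are names chosen here, and slowGet simply sleeps to stand in for a round trip to Riak. Each request gets its own buffered channel, so the goroutine can drop off its response and exit even if nobody has called Wait yet.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// result holds what one background request produced.
type result struct {
	value string
	err   error
}

// Waiter is a hypothetical sketch of BackgroundWait: start requests now,
// collect each response later by its request id.
type Waiter struct {
	mu      sync.Mutex
	nextID  int
	pending map[int]chan result
}

func NewWaiter() *Waiter {
	return &Waiter{pending: make(map[int]chan result)}
}

// Background runs fetch in its own goroutine and returns a request id
// immediately. The response is parked in a buffered channel until Wait.
func (w *Waiter) Background(fetch func() (string, error)) int {
	ch := make(chan result, 1) // buffered so the sender never blocks
	w.mu.Lock()
	w.nextID++
	id := w.nextID
	w.pending[id] = ch
	w.mu.Unlock()

	go func() {
		v, err := fetch()
		ch <- result{v, err}
	}()
	return id
}

// Wait blocks until the request with the given id has finished.
// Each id can be waited on once; a second Wait reports an unknown id.
func (w *Waiter) Wait(id int) (string, error) {
	w.mu.Lock()
	ch, ok := w.pending[id]
	delete(w.pending, id)
	w.mu.Unlock()
	if !ok {
		return "", fmt.Errorf("unknown request id %d", id)
	}
	r := <-ch
	return r.value, r.err
}

// slowGet is a hypothetical stand-in for a 50ms datastore round trip.
func slowGet(key string) func() (string, error) {
	return func() (string, error) {
		time.Sleep(50 * time.Millisecond)
		return "value-of-" + key, nil
	}
}

func main() {
	w := NewWaiter()
	start := time.Now()

	// Steps 1 and 2 are independent, so both requests start together.
	auth := w.Background(slowGet("session"))
	conf := w.Background(slowGet("config"))

	// ...other work could happen here...

	a, _ := w.Wait(auth)
	c, _ := w.Wait(conf)
	fmt.Println(a, c, "after", time.Since(start).Round(time.Millisecond))
}
```

Because the two requests overlap, the elapsed time should be close to one 50ms round trip rather than two.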

I would be happy for any help; code reviews and contributions are much appreciated.


Tuesday, July 20, 2010

Concurrent golang programming

I would guess that, like most developers, my first exposure to concurrency was in my parallel programming class in college. MPI and PVM were rather painful C/C++ libraries to cut my teeth on, but it was a very exciting and rewarding experience. When I started writing threaded Java programs there was an uneasy sense that parallel programming couldn't be this easy (after all the locks and synchronized calls, there weren't many operations actually running in parallel), but it sure was a lot of fun. In Python I found threads to be crazy simple but only useful for IO waits, and I loved multiprocess programming. Parallel Python was the only thing that actually provided the same true parallelism as the original MPI and PVM while also being amazingly simple. Goroutines (lightweight, coroutine-like threads scheduled by the Go runtime) make me never want to see thread programming again. The killer combination of channels (kind of like a queue in the thread world) and goroutines is really enjoyable to code compared to threads and queues.
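As a small illustration of a channel playing the role of a queue, the sketch below fans work out to a few goroutines and collects the results without any explicit lock on shared data. The sumSquares and square functions are made up for the example; any unit of work could take their place.

```go
package main

import (
	"fmt"
	"sync"
)

// square is a stand-in for any unit of work.
func square(n int) int { return n * n }

// sumSquares fans jobs out to `workers` goroutines over one channel (the
// "queue") and fans the results back in over a second channel.
func sumSquares(nums []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs { // receive until jobs is closed
				results <- square(n)
			}
		}()
	}

	// Producer: feed the queue, then close it so the workers' loops end.
	go func() {
		for _, n := range nums {
			jobs <- n
		}
		close(jobs)
	}()

	// Close results once every worker has finished sending.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Only this goroutine touches total, so no mutex is needed.
	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 3)) // 30
}
```

The coordination that would need a lock and a condition variable around a shared queue in a threaded program is expressed here by closing channels and ranging over them.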

Currently I am working on creating a library for golang to talk to Riak. If multiple requests need to be made, they can be issued concurrently using goroutines, which reduces the total round-trip wait time. Depending on IO limitations, this could increase the speed of web pages that need multiple resources for a specific URL. For things like memcached and very fast datastores this would work great. The downside is that an already overloaded IO situation would be made much worse by the increased number of connections. This would probably limit the approach to datasources with nonblocking IO, though most new datasources can handle a large number of connections.
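A sketch of the fan-out idea: one goroutine per key, with a buffered channel used as a counting semaphore to cap how many requests are in flight at once. The cap is not something described here; it is simply one common way to soften the overloaded-IO downside mentioned above. fetch is a hypothetical placeholder for a real datastore call, and maxConns is an illustrative parameter.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetch is a hypothetical stand-in for one datastore round trip.
func fetch(key string) string {
	time.Sleep(20 * time.Millisecond)
	return "value-of-" + key
}

// fetchAll starts one goroutine per key but lets at most maxConns of them
// call fetch at the same time, using a buffered channel as a semaphore.
func fetchAll(keys []string, maxConns int) map[string]string {
	sem := make(chan struct{}, maxConns)
	var mu sync.Mutex
	var wg sync.WaitGroup
	out := make(map[string]string, len(keys))

	for _, k := range keys {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a connection slot
			defer func() { <-sem }() // release it when done
			v := fetch(k)
			mu.Lock()
			out[k] = v
			mu.Unlock()
		}(k)
	}
	wg.Wait()
	return out
}

func main() {
	start := time.Now()
	vals := fetchAll([]string{"user", "prefs", "friends", "feed"}, 2)
	fmt.Println(len(vals), vals["feed"])
	// With 4 keys and 2 slots this should take about two round trips, not four.
	fmt.Println(time.Since(start).Round(10 * time.Millisecond))
}
```

Raising maxConns trades more concurrent connections for lower latency; setting it to 1 degrades to sequential requests.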