Wednesday, April 11, 2012

Part 2: Dart vs. Go vs. Python (and PyPy) Performance

UPDATE: In the comments, Will pointed out a bug in my Python code, so I updated the numbers at 22:00 EST. Sorry.
UPDATE 2: An anonymous commenter posted a more comparable version of the Go code (http://pastie.org/3772555), so I updated the numbers on 04-12-2012 at 09:00 EST (sorry). Please let me know if there are any other bugs to fix!

My previous post, Surprising performance from Dart, inspired me to do some more test runs with Dart, Go, Python and PyPy. For round two of these simple performance comparisons, I decided to do some basic file reading and processing. Since the vast majority of the data I process is JSON, I generated some random JSON data and stored it in a file. Each language's program reads the entire file, parses the JSON on each line, and aggregates it. Assuming I wrote the code right (a big assumption), this should give an idea of which language is best for processing large amounts of JSON.
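The generator and the document schema aren't shown in this post, so here is only a minimal sketch of how such a file could be produced, assuming one JSON object per line and three hypothetical fields (`key`, `value`, `tag`) picked purely for illustration:

```python
import json
import random
import string


def random_doc(rng):
    """Build one document; these field names are hypothetical, not the post's schema."""
    return {
        "key": rng.choice(["a", "b", "c", "d"]),
        "value": rng.randint(0, 1000),
        "tag": "".join(rng.choice(string.ascii_lowercase) for _ in range(8)),
    }


def write_json_lines(path, count, seed=0):
    """Write `count` random documents to `path`, one JSON object per line."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same file
    with open(path, "w") as f:
        for _ in range(count):
            f.write(json.dumps(random_doc(rng)) + "\n")
```

Calling `write_json_lines("data.json", 10000)` (the file name is made up) would give a file with the same document count as the one used here; the seed just keeps the data identical between benchmark runs.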

The code is broken down into a module (package) containing an Aggregator class (type), and a main file that reads the data file. The repository is hosted on Bitbucket.
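The repository isn't reproduced here, so the following is a hypothetical Python sketch of that structure rather than the actual code: an `Aggregator` class plus a routine that reads the file line by line and parses each line with `json.loads`. What gets aggregated is an assumption; this version just sums a `value` field per `key` and counts documents.

```python
import json
from collections import defaultdict


class Aggregator:
    """Hypothetical aggregator: sums doc["value"] per doc["key"] (assumed field names)."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.count = 0

    def add(self, doc):
        self.totals[doc["key"]] += doc["value"]
        self.count += 1


def process_file(path):
    """Read the file one line at a time, parse each line as JSON, and aggregate it."""
    agg = Aggregator()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines rather than failing in json.loads
                agg.add(json.loads(line))
    return agg
```

Iterating over the file object keeps only one line in memory at a time, which matters more for the memory numbers than reading the whole file up front would.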
Dart code:
I was able to write this in an async style, which in theory would maximize IO performance if Dart supported async IO. I was a bit shaky at first with the terse syntax of the onLine callback, but I am getting more comfortable with it. The duplicate use of StringInputStream seems a bit wordy; I probably could have just written "var stream" instead of "StringInputStream stream" to improve readability (Go got this right with the := syntax). Otherwise the code is very straightforward.
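For anyone who hasn't used the onLine style, here is a rough Python analog of its shape: the caller registers a function that is invoked once per line, plus an optional one for when the input is exhausted. This is synchronous and only mirrors the callback structure; `read_lines`, `on_line` and `on_closed` are hypothetical names, and nothing here reflects Dart's actual StringInputStream API or its IO behavior.

```python
def read_lines(path, on_line, on_closed=None):
    """Call on_line(line) for every line of the file, then on_closed() if given.

    A synchronous stand-in for a callback-driven line stream.
    """
    with open(path) as f:
        for line in f:
            on_line(line.rstrip("\n"))  # hand each line over without its newline
    if on_closed is not None:
        on_closed()
```

For example, `read_lines(path, lambda line: docs.append(json.loads(line)))` would collect every parsed document into a `docs` list, which is roughly what the Dart version does inside its callback.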

Since Dart is designed for the browser and is still a preview release, I expected its IO to be very slow, but I expected it to process JSON very fast since that is a large part of what a modern web application does. For the same reason, StringInputStream should be pretty fast, since it will be the main source of data for an application. I expected Go to outperform Python, and I was not really sure about Python vs. PyPy. I figured PyPy would use more memory but be faster at the aggregation.

Runtime comparison:

This is only 10,000 JSON documents in a single file. I have no idea how it does it, but Python just kills it. It is not that surprising that it is 18x faster than Dart, but being roughly 2x faster than Go is pretty amazing. I knew Python 2.x IO was fast, but I didn't realize it was this fast. There could be a faster way to process the JSON in Go; I just didn't see how to do it without using Unmarshal.
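One way to see whether reading the file or parsing the JSON dominates a number like this is to time the two phases separately. Below is a minimal sketch for the Python side, assuming Python 3 (`time.perf_counter` does not exist in 2.x) and taking the best of a few runs to reduce noise; `best_of` and `split_timings` are hypothetical helpers, not part of the benchmark code.

```python
import json
import time


def best_of(fn, repeat=3):
    """Run fn() `repeat` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def split_timings(path, repeat=3):
    """Return (read_seconds, parse_seconds), timing file IO and JSON parsing separately."""

    def read():
        with open(path) as f:
            return f.read().splitlines()

    lines = read()  # read once up front so the parse timing excludes IO

    def parse():
        for line in lines:
            if line:
                json.loads(line)

    return best_of(read, repeat), best_of(parse, repeat)
```

If the parse time dwarfs the read time, the gap against Go is mostly about JSON decoding rather than IO.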

Memory comparison:

I expected Go to use the least memory, but I didn't expect it to be so tiny (2,076; after the update, Python is 4,692). Dart was unusable here, but that is expected in a preview release. PyPy is not much better; I guess I need to send them a note and ask what they think. Python is again a rock star here.
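The post doesn't say how memory was measured or what unit these figures are in. As one possible approach for the Python side, here is a sketch that reads the process's peak resident set size with the standard `resource` module (Unix only). Note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS, so the function normalizes to KiB; `peak_rss_kib` is a hypothetical helper name.

```python
import resource
import sys


def peak_rss_kib():
    """Return this process's peak resident set size in KiB (Unix only).

    ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024  # convert bytes to KiB on macOS
    return peak
```

Calling it at the end of the run reports the high-water mark for the whole process, including interpreter startup, which is one reason small scripts never report zero.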

I chose to kick the tires at a pretty high level, but I think this is a realistic scenario since web apps these days process a lot of JSON. Dart fared as expected for a preview release. Python earned rock star status as usual. As for PyPy, we will see what happens when I contact them about it. Finally, Go was very disappointing in how much runtime it took to process the JSON; however, it did recover some dignity with its exceptional memory usage.