Wednesday, April 11, 2012

Part 2 Dart vs Go vs Python (and PyPy) Performance

UPDATE: In the comments, Will pointed out a bug in my Python code, so I updated the post at 22:00 EST. Sorry.
UPDATE 2: Anonymous posted a more comparable version of the Go code (http://pastie.org/3772555), so I updated the numbers on 04-12-2012 at 09:00 EST (sorry). Please let me know if there are any other bug fixes!

My previous post, Surprising performance from Dart, inspired me to do some more test runs with Dart, Go, Python and PyPy. For round two of these simple performance comparisons I decided to do some basic file reading and processing. Since the vast majority of the data I process is JSON, I decided to generate some random JSON data and store it in a file. Each language's program reads the entire file and, for each line, parses the JSON and aggregates it. Assuming I wrote the code right (big assumption), I can get an idea of which language is best for processing large amounts of JSON.

The code is basically broken down into a module (package) that contains an Aggregator class (type) and a main file that reads from the data file. The repository is hosted on Bitbucket.
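To make that layout concrete, here is a minimal Python sketch of an Aggregator plus a line-by-line reader. It is not the code from the repository: the field names "category" and "value", the per-key count and total, and the aggregate_lines helper are all assumptions made for illustration.

```python
import io
import json
from collections import defaultdict


class Aggregator:
    """Accumulates a count and a running total per key."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.totals = defaultdict(float)

    def add(self, doc):
        # "category" and "value" are hypothetical field names.
        key = doc["category"]
        self.counts[key] += 1
        self.totals[key] += doc["value"]


def aggregate_lines(lines):
    """Parse one JSON document per line and feed it to an Aggregator."""
    agg = Aggregator()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            agg.add(json.loads(line))
    return agg


# Stand-in for the data file: any iterable of lines works, including open(path).
sample = io.StringIO(
    '{"category": "a", "value": 1.5}\n'
    '{"category": "b", "value": 2.0}\n'
    '{"category": "a", "value": 0.5}\n'
)
result = aggregate_lines(sample)
print(dict(result.counts), dict(result.totals))
```

Passing an open file object instead of the StringIO stand-in streams the file one line at a time, so memory use depends on the aggregate and not on the file size.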
Dart code:
I was able to code this using async-style programming, which in theory would maximize IO performance if Dart supported async IO. I was a bit shaky at first with the terse syntax of the onLine callback, but I am getting more comfortable with it. The duplicate use of StringInputStream seems a bit wordy; I probably could have just written "var stream" instead of "StringInputStream stream" to increase readability (Go got this right with the := syntax). Otherwise the code is very straightforward.

Since Dart is designed for the browser and is still a preview, I expected its IO to be very slow, but I expected it to process JSON very fast, since that is a large part of what a modern web application does. That also means StringInputStream should be pretty fast, since it will be the main source of data for the application. I expected Go to outperform Python and was not really sure about Python vs PyPy. I figured PyPy would use more memory but be faster at aggregation.

Runtime comparison:




This is only 10,000 JSON documents in a single file. I have no idea how it does it, but Python just kills it. That it is 18x faster than Dart is not that surprising, but being roughly 2x faster than Go is pretty amazing. I knew Python 2.x IO was fast, but I didn't realize it was this fast. There may be a faster way to process the JSON in Go; a way to do it without Unmarshal just didn't jump out at me.

Memory comparison:


I expected Go to use the least memory, but I didn't expect it to be so tiny (2,076; after the update Python is 4,692). Dart was unusable, but that is expected in a preview release. PyPy is not much better; I guess I need to send them a note and ask what they think. Python is again the rock star here.
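The post does not say how the runtime and memory numbers were collected. As a rough sketch of one way to take both measurements from inside a Python 3 program, the helper below uses time.perf_counter and tracemalloc from the standard library. The measure and workload functions are hypothetical names, and tracemalloc only sees allocations made by the Python interpreter, so its peak is not the same thing as the process memory an external tool would report.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func(*args) once; return (result, seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def workload(n):
    # Hypothetical stand-in for the JSON aggregation run.
    return sum(len(str(i)) for i in range(n))


result, seconds, peak = measure(workload, 10000)
print(f"result={result} time={seconds:.4f}s peak={peak} bytes")
```

Tracing allocations slows the code down, so for a fair runtime comparison it is probably better to time one run without tracemalloc and measure memory in a separate run.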

I have chosen to kick the tires at a pretty high level, but I think it is a realistic situation since web apps these days process a lot of JSON. Dart fared as expected for a preview release. Python kept its rock star status, as usual. For PyPy, we will see what happens when I contact them about it. Finally, Go was very disappointing in how much runtime it took to process the JSON; however, it did recover some dignity with its exceptional use of memory.

Thursday, April 5, 2012

Surprising performance from Dart

About every two years I try to learn a new programming language. I find that a new language opens up a new set of applications, and I can use it to improve existing ones. This year it is Dart, mainly because I have been writing a lot of GWT (Java -> Javascript) and Javascript over the last couple of years. They both have a lot going for them and against them. From what I have read about Dart, it takes the best parts of both.
 GWT Complaints:
  • Costly project setup and maintenance
  • Java development is slow
  • Language and library incompatibilities

Trying to set up and maintain a GWT application is a project in itself. From what I have used of Maven, it seems to save little time, if any. Then there are a lot of trade-offs between design, performance and cross-browser support that are pretty much lose / lose.
Java development is slow. Yep, I said it; many say it isn't, but it is. The JVM is extremely fast at a lot of things. Unfortunately, it seems all the development-type things like the IDE (GUI), compilation and building use excessive amounts of CPU and memory in the JVM. It is jaw-dropping to see a hello world REST service take up the better part of 100MB (mostly jar dependencies from Maven). Reloading a page using GWT (or any servlet, for that matter) requires a recompile step that is slow and, bottom line, a momentum killer.
GWT has support for calling into other Javascript libraries, but it is not trivial. A lot of productivity is lost by not being able to add a jQuery plugin to a page in 10 seconds.

Where GWT kills it is deployment. The end product (assuming you can get it built) is fully optimized, cross-platform wonderfulness that would take massive amounts of time to match with an equivalent hand-built Javascript app.

Javascript Complaints:
  • Hard to manage large code bases
  • Browser incompatibilities
Trying to manage a large Javascript code base is extremely painful. After a couple of files it gets messy and slow. Keeping related code in separate files is really nice; however, it means those files all have to be loaded into the browser unless you are using some build system. Then there is a lot of decorating of pages, where one page only needs X amount of code and another page needs a different set. There are ways around it, but namespaces seem like something that should be built into the language, as in almost every other programming language.
Debugging IE is what my generation of developers will be telling war stories about someday, I suppose. However, I feel like it is time to move on. After over a decade of doing it, it is time to innovate beyond these kinds of shenanigans.
The great part about Javascript is that it is extremely fast to develop in. I don't even have to download the framework; I can just add a script tag to the page and reload the page. This is really great for getting products built quickly or adding features quickly.

Dart:
Static, type-safe compilation as an option for those really large code bases (libraries, anyone?). But also the ability to just reload the browser for fast development. A new language with syntax very similar to Javascript that compiles to Javascript to support legacy browsers? Maybe the best of both worlds, plus a bit more on top?
With most programming languages I play with, I start with the Fibonacci number sequence as the first program. I am a fan of Fibonacci, and it is a very simple algorithm for getting the basics of a language. So let's take a look at the code:
Dart fib code:
// Naive recursive Fibonacci: fib(0) = 0, fib(1) = 1.
int fib(int n) {
  if (n == 1) {
    return 1;
  } else if (n == 0) {
    return 0;
  } else {
    return fib(n - 1) + fib(n - 2);
  }
}

main() {
  // Print fib(0) through fib(34).
  var length = 35;
  for (int i = 0; i < length; i++) {
    print(fib(i));
  }
}

I decided to benchmark Dart against Python, PyPy and Go. I ran the Go version both with go run (go run fib.go), which compiles and runs the program, and as a compiled binary to see it fully optimized.
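For reference, a Python counterpart to the Dart program might look like the sketch below. The actual Python script used for the benchmark is not shown in the post, so this is an assumed translation; the run helper and the reduced length of 20 (the Dart code uses 35) are choices made here to keep the example quick.

```python
import time


def fib(n):
    """Naive recursive Fibonacci, mirroring the Dart version above."""
    if n == 1:
        return 1
    elif n == 0:
        return 0
    return fib(n - 1) + fib(n - 2)


def run(length):
    """Compute fib(0)..fib(length - 1), like the loop in the Dart main()."""
    return [fib(i) for i in range(length)]


start = time.perf_counter()
values = run(20)  # the post uses 35; 20 keeps this sketch quick
elapsed = time.perf_counter() - start
print(values[-1], f"{elapsed:.3f}s")
```

The naive recursion is deliberately left unmemoized: the point of the benchmark is to stress function calls and integer arithmetic, and caching results would remove almost all of that work.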




Then I compared memory utilization:

Fat disclaimer about the benchmark: this is basically a number-crunching exercise that I ran a couple of times, so take these numbers for what they are (primitive). PyPy being way faster than Python is not a surprise at all. Let's take Python out because it is skewing the chart.


The first really big surprise was that Dart was faster than PyPy. Another big surprise was how much memory PyPy was using, like 5X (the JIT, maybe; perhaps that is why Java is such a memory hog). Dart had amazing performance and memory utilization for a preview release of a language. The compiled version of Go had amazingly low memory utilization, which lines up with my experience running a web app written in Go. This is a very primitive experiment, so next time I think I will try something with a little more complexity to see if CPU and memory hold up under a bit of pounding. This is exciting news because I think Dart can be not only a front-end web language but also a backend language. Gives me ideas!

Update: Part 2