Tuesday, May 15, 2007

Guido, threads and thank you Parallel Python

Parallel Python is a little-known library that is utterly amazing. Some of you may have been following the ongoing Python GIL drama: Guido just made a statement about thread support in Python 3000 (the next major release of Python). I was once one of the brainwashed, thinking that threads and Java were the solution to all multitasking issues. Then I had one, two, three, oh my gosh, I am in a deadlock explosion. Yes, those days of the JBoss deadlock exception were really just horrible. As the "Enterprise" community came up with more solutions to fatten an already bloated container, it became embarrassingly clear that threads were not the answer to a heavily loaded web application.

The Past
I don't claim to be a very bright programmer, and apparently, with all the performance hacks I had in place (and the caching and other libraries I was using), every time I tried to figure out a way to cluster the threads, I couldn't. The only solution was to buy a bigger machine. I then spent some time on another project that was using another vendor whose name starts with a 'B'. I thought, well, they are a big-money product, so I am sure they have figured out how to cluster these threaded containers... Boy, was I wrong. They hadn't really figured out how to do it either. Sure, they supported a cluster, but the problem was you had to code as if you were not in a threaded environment. OK, so if the solution is to code like you're not in a threaded environment, then what is the point of using threads? Clearly only the uber "coders" should be coding Java and threads, and the rest of us can stick to our simpler Python (add dynamic language here), which is faster to develop, maintain, run and cluster.
When I returned to the project it was time to oust JBoss, Java and all the threading nightmares it put on us. So we created a custom Python framework. Performance was up 10x, with about 3x faster development. It was clear to me then that this threading thing was for the birds.

The Problem
So now we know that using processes instead of threads is better. I may not have presented a great argument, but I have learned my lesson. With threads out, I had been doing lots of process programming but hadn't yet really developed any applications that communicated back and forth. So now I needed a producer/consumer and I had no idea how to write it. I quickly realized why threads were so common: they were easy. Then I found Parallel Python, which makes it painfully easy to take existing code and run it using multiple processes, on multiple hosts. I dynamically sent a function to another host in about 3 lines of code!
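For anyone wondering what a producer/consumer built on processes rather than threads can look like, here is a rough sketch. To be clear, this is not Parallel Python's API and it stays on a single host: it swaps in the standard library's multiprocessing module, which may not match what you have installed. The names square, consumer and run_pipeline are made up for illustration, and the sketch assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical end-of-work marker, one is sent per consumer


def square(n):
    """Stand-in for the real work a consumer would do."""
    return n * n


def consumer(tasks, results):
    """Pull numbers off the task queue until the sentinel arrives."""
    for n in iter(tasks.get, SENTINEL):
        results.put((n, square(n)))


def run_pipeline(numbers, workers=2):
    """Produce work in the parent process, consume it in child processes."""
    numbers = list(numbers)
    ctx = mp.get_context("fork")  # assumption: POSIX, fork is available
    tasks, results = ctx.Queue(), ctx.Queue()

    procs = [ctx.Process(target=consumer, args=(tasks, results))
             for _ in range(workers)]
    for p in procs:
        p.start()

    # The parent acts as the producer.
    for n in numbers:
        tasks.put(n)
    for _ in procs:
        tasks.put(SENTINEL)  # tell each consumer to stop

    # Collect one result per input before joining the children.
    collected = sorted(results.get() for _ in numbers)
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4]))
```

The consumers share nothing with the producer except the two queues, so there are no locks to get wrong. As far as I can tell, that is the same basic idea Parallel Python stretches across multiple hosts.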
