Brute Force Rendering Using Google App Engine

January 6th, 2011

A few months ago I was feeding my internet addiction on programming.reddit and stumbled upon Kevin Beason's smallpt, a global illumination renderer in 99 lines of c++. He uses Monte Carlo path tracing. In this approach, the radiance at each pixel is estimated by taking the average of many independent samples. Producing good visual results is slow (compared with other approaches) because quadrupling the number of samples only reduces the error by half. However, the solution is "embarrassingly parallel" so it can be sped up substantially if you have the horsepower to throw at it.

Around the same time I discovered smallpt I was learning about Google App Engine. App Engine lets you write web apps that run on Google's infrastructure. It is designed to automagically scale as traffic increases, spawning new instances of an app that run in parallel. The natural question was, could smallpt be ported to App Engine? How much of a speedup was possible?

smallpt is clearly CPU bound and App Engine has a max rate limit of 72 CPU-minutes/minute. This basically means the equivalent of 72 1.2 GHz processors can be used at once. Beason's highest quality example took 10.3 hrs to render at 1024x768 using a 2.4 GHz quad core. A rough estimate of the potential speedup was (72 * 1.2) / (4 * 2.4) = 9x. This was a bit less than I had hoped for (I wanted a lightning fast render farm!), but I decided to move forward anyway.

My first step was to port smallpt from c++ to python. This turned out to be a mistake. My implementation was about 300x slower than the c++ version. I'm sure there was plenty of opportunity for improvement. I'm a python novice, I didn't check that I was following "best practices" for performance, and I didn't try running it on an optimized interpreter (I think Google uses one). However, after taking a look at language benchmarks, I wasn't convinced I could obtain the performance I wanted. I decided to switch to Java. Much to my surprise, my Java implementation performed nearly as well as the c++ version! Next, I built a web interface for telling app instances to trace a specific section of the image. At this point, my time and interest shifted to other projects. Rather than let the source code slowly be forgotten on my hard drive I'm releasing it. Maybe it will serve as a reference or starting point for others. Proceed at your own risk.

- Send multiple HTTP requests at once from the computer coordinating the rendering distribution (NIO?)
- App instances need to track their runtime to prevent going over the 30 second request limit
- App instances need to track their cpu usage to prevent going over the 72 CPU-minutes/minute limit (it's not guaranteed that all machines will be 1.2 GHz)
- Each App Engine account supports up to 10 unique apps. It's against the rules to have multiple apps that act as one. If you want to briefly "bend the rules" for learning purposes, you could try creating 10 copies of the same thing. This would have a potential 90x speedup. (I'm kind of curious if Google has a way to detect when someone creates identical or near identical copies of the same app.)
- If you really want to break the rules, you could try creating multiple App Engine accounts that each have 10 copies of the app. This would be a bit challenging because the signup process requires a unique cell phone number. Google sends you a registration code via SMS. You might be able to get around this by purchasing a few phones numbers using twilio @ $1/each. The second challenge is that getting the full 72 CPU-minutes/minute requires enabling billing which requires a Google Checkout account. Will Google allow you to use the same credit card for multiple Checkout accounts?
- Optimization pass
- Standards/documentation/best practices pass