cs240a hw0
chris grzegorczyk
grze@cs.ucsb.edu
Novel Platforms for Molecular Dynamics: GPUs and the PS3's Cell Processor
Application
Novel use and deployment of Cell processors and GPUs for molecular dynamics by the Stanford project Folding@home [1]

Figure 1. A molecular simulation is displayed concurrently with the actual computation of the simulation as enabled by the RSX graphics processor and 8 computation Cells on the PS3.
What is the scientific or engineering problem being solved?
Folding@home is a distributed application that aims to provide an efficient platform for simulating the folding of proteins. Simulating the molecular interactions that occur as a protein folds is computationally demanding: for molecules on the scale of a protein, conventional approaches would require on the order of 30 CPU years for a single simulation. The novel approach recently deployed by the Folding@home project is to use GPUs and the Cell Processor in the Sony PlayStation 3 to perform certain classes of computation needed for folding simulation. Although at first glance Folding@home is simply a distributed application, both the GPU and Cell implementations also exploit hardware-level parallelism: multiple shader pipelines in the GPU and multiple processing elements in the Cell.
How well did the application achieve its scientific / engineering objective?
The Folding@home project publishes the effective TFLOPS produced by each class of participating processor [2]. At the time of writing, the numbers were:
| OS Type | Current TFLOPS* | Active CPUs | Total CPUs |
| Windows | 173 | 181669 | 1658942 |
| Mac OS X/PowerPC | 8 | 9997 | 97107 |
| Mac OS X/Intel | 13 | 4204 | 10028 |
| Linux | 46 | 27139 | 220459 |
| GPU | 57 | 966 | 2685 |
| PLAYSTATION®3 | 262 | 20009 | 82983 |
| Total | 559 | 243984 | 2072204 |
Examining the numbers, we can see that each active GPU client is effectively producing around 60 GFLOPS, each active PS3 client about 13 GFLOPS, and each active Windows CPU less than 1 GFLOPS. This is a stunning result considering the relatively low cost of the hardware being used.
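The per-client figures follow from dividing each row's aggregate TFLOPS by its count of active clients. A minimal Python sketch of that arithmetic, using the values copied from the table above; the names `STATS`, `PER_CLIENT` and `gflops_per_client` are illustrative and not part of any Folding@home tool:

```python
# Per-client throughput from the Folding@home OS statistics table.
# Each entry is (current TFLOPS, active CPUs), copied from the table in this report.
STATS = {
    "Windows": (173, 181669),
    "Mac OS X/PowerPC": (8, 9997),
    "Mac OS X/Intel": (13, 4204),
    "Linux": (46, 27139),
    "GPU": (57, 966),
    "PLAYSTATION 3": (262, 20009),
}


def gflops_per_client(tflops, active_clients):
    """Convert an aggregate TFLOPS figure into GFLOPS per active client."""
    return tflops * 1000.0 / active_clients


PER_CLIENT = {
    name: gflops_per_client(tflops, active)
    for name, (tflops, active) in STATS.items()
}

if __name__ == "__main__":
    for name, gflops in PER_CLIENT.items():
        print(f"{name:18s} {gflops:6.2f} GFLOPS per active client")
```

The division uses active rather than total CPUs, since only active clients contribute to the current TFLOPS figure.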
What type of parallel platform was the application developed for?
The architectures of the GPU and the Cell exploited by the most recent versions of Folding@home are described in [4] and [3], respectively. The GPU implementation uses the fast shader pipelines (of which there are 48) to perform floating-point operations, so the GPU effectively behaves like a limited shared-memory system with 48 FPUs. The PS3 has a complex new CPU, the Cell Processor, which is general purpose (i.e., not limited in its applicability as the GPU is). The Cell Processor is built around what is essentially a main CPU, the Power Processing Element (PPE), a 3.2 GHz two-way multithreaded PowerPC core capable of 25.6 GFLOPS single precision. Coupled with the PPE are 8 Synergistic Processing Elements (SPEs), each effectively a primitive and greatly limited, but highly parallel, CPU. Each SPE has only 256 KB of local memory through which it exchanges data with main memory. Although it is not exactly a cache, since its behaviour is not automatic, this memory can be thought of as holding a working set, much as a cache does. Given this limitation, the high level of parallelism of the SPE is surprising. During each clock cycle an SPE can operate on 16 8-bit integers, 8 16-bit integers, 4 32-bit integers, or 4 single-precision floating-point numbers.
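The lane counts quoted for the SPE are what one would expect from a 128-bit vector register divided into equal-width elements. The Python sketch below makes that arithmetic explicit. It assumes a 128-bit register width (inferred from the lane counts) and that the SPEs run at the same 3.2 GHz clock as the PPE. The function names are illustrative only.

```python
# SIMD lane arithmetic for a single SPE vector operation.
# Assumption: 128-bit vector registers, inferred from the lane counts in the text.
REGISTER_BITS = 128
# Assumption: the SPEs share the PPE's 3.2 GHz clock.
CLOCK_HZ = 3.2e9


def lanes(element_bits, register_bits=REGISTER_BITS):
    """Number of elements of the given width that fit in one vector register."""
    return register_bits // element_bits


def elements_per_second(element_bits, clock_hz=CLOCK_HZ):
    """Upper bound on elements processed per second by one SPE,
    assuming one full-width vector operation completes every cycle."""
    return lanes(element_bits) * clock_hz


if __name__ == "__main__":
    for bits in (8, 16, 32):
        print(f"{bits:2d}-bit elements: {lanes(bits):2d} lanes, "
              f"{elements_per_second(bits) / 1e9:.1f} G elements/s per SPE")
```

This counts vector elements rather than floating-point operations, so it is a rough bound on data throughput and not a FLOPS rating.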
How well did the application perform? How does this compare to the platform's best possible performance?
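Measured against the per-client figures in the table above, the PS3 is not reaching its full potential: each active PS3 client sustains about 13 GFLOPS, whereas the PPE alone is rated at 25.6 GFLOPS single precision. For a sustained rate in a real application, however, it performs very well with respect to its theoretical maximum.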
Does the application "scale" to large problems on many processors?
Because of the nature of the computation and the methods that have been developed for molecular simulation, the implementation is embarrassingly parallel and scales nearly linearly.
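To give a feel for what nearly linear scaling means here, the Python sketch below applies two simple models to the figures in this report. The first divides the 30 CPU-year estimate evenly across identical clients. The second is Amdahl's law, a standard textbook model brought in for illustration and not taken from the Folding@home documentation. Both assume that the work splits into independent units and that all clients are equally fast, which real Folding@home clients are not.

```python
# Idealised scaling models for an embarrassingly parallel workload.
# Assumptions: identical clients and perfectly divisible work.
HOURS_PER_YEAR = 365.25 * 24


def ideal_wall_clock_hours(cpu_years, clients):
    """Wall-clock hours if the work scales perfectly linearly across clients."""
    if clients < 1:
        raise ValueError("clients must be at least 1")
    return cpu_years * HOURS_PER_YEAR / clients


def amdahl_speedup(clients, serial_fraction=0.0):
    """Speedup under Amdahl's law; serial_fraction = 0 is the
    embarrassingly parallel limit, where speedup equals the client count."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / clients)


if __name__ == "__main__":
    active = 243984  # total active CPUs from the table in this report
    print(f"30 CPU years on {active} ideal clients: "
          f"{ideal_wall_clock_hours(30, active):.2f} hours")
    for s in (0.0, 1e-6, 1e-4):
        print(f"serial fraction {s:g}: speedup {amdahl_speedup(active, s):,.0f}")
```

Even a very small serial fraction caps the speedup well below the client count at this scale, which is why the independence of the work units matters so much for this application.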
References
[1] http://folding.stanford.edu/
[2] http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
[3] http://folding.stanford.edu/FAQ-PS3.html
[4] http://folding.stanford.edu/FAQ-ATI.html