Evlan: Parallelization

Parallelization

Last updated at 2005/04/19 05:51:41 PDT by Temporal

Automatic Parallelization

Most of today's languages -- particularly imperative languages -- are designed around a fairly limited concept of computer architecture. Specifically, they tend to assume that the computer contains a single CPU which performs operations in sequence very, very fast. Although this architecture has worked well for decades, we are rapidly approaching the theoretical limits in the speed of a transistor. In order for CPUs to continue gaining speed, software needs to become much more parallel.

To better illustrate what I am getting at, take the following C function:

void processAll(struct Result* results[],
   const struct Item* items[], int size)
{
   for(int i = 0; i < size; i++)
      results[i] = processItem(item[i]);
}

This looks like a simple function. All it does is instruct the system to call the function processItem() on every item in the input array and write the results to another array.

There is a major problem, however: The above code instructs the system to call processItem() on each item in order, from first to last. It does not give the system any room to process two items in parallel or to process items in a different order, both of which could increase performance on some systems. Unless processItem() is an inline function, the compiler cannot make either of these assumptions, no matter how good the optimizer may be.

So, how would you rewrite the above function to clarify that items can be processed in any order, or in parallel? The only conceivable answers would require so much extra work and overhead that you would almost never use them unless you were sure that they would provide very large advantages. Ideally, we'd like the compiler to make these optimization decisions for us.

In Evlan (v0.3.0), the above code would look like this:

processAll = items => Array.map(items, processItem)

Or, if processItem() is a procedure with side effects:

processAll = items => Thread.doAll(Array.map(items, processItem))

In both cases, the code does not specify any order in which to process the items. In both cases, the optimizer and the virtual machine are free to parallelize the task in any number of ways.

In fact, because Evlan is a purely functional language, opportunities like this appear all over the place. Any time you call two functions independently -- that is, neither call's parameters depend on the other's output -- the calls can be parallelized.

Now, on current CPU's, such massively parallel processing isn't very practical. If you are lucky, you have a system with two CPU's, which can thus perform two tasks simultaneously. Even then, the tasks need to be large and require little communication in order for parallelization to be practical.

However, this is all because current CPU's are designed to run C programs, and C programs are serial. This is going to change in the near future. Intel and AMD are already producing dual-core processors. Many advanced processors use VLIW (Very Long Instruction Word) technology, in which each instruction is expected to contain multiple parallel operations. Sony, Toshiba, and IBM have recently developed a new kind of processor called a "cell", which is designed to run in parallel with many other cells. If more software were written in languages like Evlan which allow for massive parallelization, even more aggressively parallel chips will no doubt be designed to run it.

Vector Processing

On a more specific sub-note, Evlan is a good language for taking advantage of vector processing. Vector processors allow you to load large arrays of numbers and perform single operations across the entire array at once. Evlan strongly encourages the use of arrays, unlike most functional languages which for some reason think linked lists should be used instead. Furthermore, because Evlan encourages you to use functions like Array.map() to apply an operation across an array (like shown above) rather than using a loop, it is much easier for a compiler to map such operations to a vector processor.

Concurrency

At a higher level, Evlan has strong support for highly concurrent programming. Where automatic parallelization is not possible, you can use threads just like in most other languages. However, Evlan's threads are far more efficient than threads in other languages. Among other advantages, Evlan's threads do not keep space reserved for an execution stack when they are not running. The result is that it is quite reasonable to have hundreds or even thousands of threads executing concurrently. Such designs are most useful for things like network servers, where each thread handles one connection, or simulations (e.g. games), where each thread might control one simulated object.

Note that future versions of Evlan may move to a different system for concurrency. In particular, I am studying ways to integrate the ideas from this study, which suggests that humans like to write rules of the form "When condition X is true, do Y". I am not yet sure to what extent this will change the current system, however.

Implementation Status: Evlan 0.3 only uses one CPU and one OS-level thread, so parallelization of raw computation is not possible. However, Evlan programs may use VM-level threads to perform concurrent I/O operations.

References: Wikipedia articles on parallel programming, vector processing and the cell processor. Wikipedia also lists quite a few examples of concurrent programming languages.