Thursday, October 18, 2007

Parallelism through multiple execution contexts

The latest version of IPython comes with built-in parallel computing capabilities. In IPython's model of parallel computing, one of the core concepts is that of an execution context or scope. In Python, an execution context is a dictionary-like object in which code can be executed. This is demonstrated in the following example, which uses Python's exec statement:

In [5]: context = dict(a=10)

In [6]: exec 'b = 2*a' in context

In [7]: context['b']
Out[7]: 20

There are three basic things you can do with such an execution context:
  1. You can execute code in the context.
  2. You can set new objects in the context by key.
  3. You can get objects out of the dictionary by key.
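These three operations can be sketched directly with a plain dictionary. The example below uses the function-call form of exec (which behaves the same in Python 2 and 3); the variable names are purely illustrative:

```python
# An execution context is just a dictionary-like namespace.
context = {}

# 1. Set an object in the context by key.
context['a'] = 10

# 2. Execute code in the context.
exec('b = 2 * a', context)

# 3. Get an object out of the context by key.
print(context['b'])  # 20
```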
By creating multiple execution contexts, you can begin to write code that has the first signs of parallelism:
In [10]: contexts = [{},{},{},{}]

In [11]: for i, c in enumerate(contexts):
   ....:     c['i'] = i
   ....:     exec 'j = 2*i' in c, c
   ....:

In [12]: for c in contexts:
   ....:     print "j = ", c['j']
   ....:
j = 0
j = 2
j = 4
j = 6

Of course, in this simple example, nothing useful is computed, and there is no real parallelism: the code is executed in each context serially. But you can easily imagine what true parallelism would require: multiple execution contexts capable of running code in parallel. This way of expressing parallelism has a number of attractive features.
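As a rough sketch of that idea (hypothetical, and not how IPython actually implements it), the standard threading module can run code in several contexts concurrently. Note that CPython's global interpreter lock means this gives concurrency rather than true CPU parallelism, but the shape of the code is the same:

```python
import threading

def run_in_context(src, context):
    # Execute source code using the context dict as its namespace.
    exec(src, context)

# Four independent execution contexts, each seeded with a value of i.
contexts = [{'i': i} for i in range(4)]

# One thread per context, each executing the same code in its own namespace.
threads = [threading.Thread(target=run_in_context, args=('j = 2 * i', c))
           for c in contexts]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([c['j'] for c in contexts])  # [0, 2, 4, 6]
```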

First, it provides a conceptual foundation for parallelism that is independent of how the execution contexts are implemented. That is, contexts could run code in different threads, on different processors, or using any other appropriate construct. Second, it opens the door for fault tolerance. Because a context is simply a dictionary, it can be replicated or serialized and moved to other hosts if needed. Third, this model can be adapted to many different models of parallelism, including task parallelism, data parallelism, and message passing.
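The fault-tolerance point can be sketched with the standard pickle module: because a context is plain data, it can be snapshotted and restored elsewhere. This is only an illustration of the idea, not IPython's actual mechanism:

```python
import pickle

context = {'a': 10}
exec('b = 2 * a', context)

# exec inserts a __builtins__ entry into the namespace, which is not
# plain picklable data; drop it before serializing the context.
snapshot = pickle.dumps({k: v for k, v in context.items()
                         if k != '__builtins__'})

# ...the snapshot could now be written to disk or sent to another host...
restored = pickle.loads(snapshot)
print(restored['b'])  # 20
```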

These are some of the ideas driving the design of IPython's parallel computing architecture. This model also raises a host of questions:

  • By default contexts don't share state. How can shared state be included in this model?
  • Is there a simple way to introduce message passing into this model?