There are two general models for parallelization:
 
and
 
 
 
We will use the MIMD approach. Suppose we begin with a single thread algorithm in the following form:
 
while ( there_is_more_to_be_done () )
 
start scheduler process
 
get first parcel to work on
 
if ( done with parcel )
 
synchronize with other tasks
 
 
The resultant components (or terms, in the case of scalar invariants) can be sent to all processors so that 
intermediate results are known to all, or results can be stored on each processor and collected at the end of 
processing. As with the Mathematica examples discussed previously,
(anti)symmetric pairs of indices can be used to reduce the workload.
 
Synchronization in PTAH is by barrier:
all processors wait for all results in a given computation and
perform simplifications themselves.
 
Another important consideration in parallel processing is fault tolerance:
processors may fail, so work must be reassignable.
If results are saved after each parcel, scheduling or catastrophic failure can be recovered and continued; 
results are then collected at the end, but final simplification must be delayed until then.
 
The next appendix is about algebraic computation without Mathematica.
 
 
 
 
 
 
©2005, Kenneth R. Koehler.  All Rights Reserved.  This document may be freely reproduced provided that this copyright notice is included. 
Please send comments or suggestions to the author.
When dealing with parallel computation, one of the many issues of concern is scaling.
Ideally, n processors would produce a factor of n speedup, but in practice, we usually achieve factors of
between log 2 n and n / ln n due to internal waits for synchronization.
SIMD is characterized by topology: how data flows through the processors.
Typically, it is a hardware implementation in array processors, where the synchronization is 
controlled by hardware. It is useful for lattice computations, linear algebra, FFT, convolutions, image
processing, pattern recognition, etc.
MIMD lends itself to more general hardware approaches, such as symmetric multiprocessors (SMP) and
clusters (with network connections).
Work scheduling, synchronization and deadlock detection and prevention must be programmed
for each application. MIMD is useful for iterative systems, systems of PDEs, finding the zeroes of functions,
cryptography, etc.
make list of "atomic" operations
where "there_is_more_to_be_done ()" points to the next atomic operation. We propose to
replace this function with "parallelize ()":
do one atomic operation
if ( starting computation )
Note that the atomic operations are identical whether the single thread or parallel algorithm is used.
This approach lends itself to tensor contractions because they are a sum of products, where the components
entering into the products are known in advance. We have implemented this approach in a program called
PTAH: the Parallel Tensor Algebra Hybrid system. 
It is a hybrid of C++ programs which do the actual computations,
 and Mathematica functions which are used to prepare the input and analyze the results.
break up computation into parcels
else
point to next calculation in parcel
if ( done with computation )
send results
else
get next parcel from scheduler
PTAH implementation
In PTAH, contractions are done in parallel. Work scheduling is done 
by partitioning contractions: we create
lists of free indices and sum components for all nonzero product terms. These are generated on all processors:
ie., for Ra b c d = ga e Reb c d
These index lists must be in the same order on all processors. For two processors, the first would do the 
first 11 products, and the second would do the last 11. In general, we split the list into m * n parcels 
for n processors, where m is large enough to smooth out the differences in individual product complexity 
(m  is typically of the order of 100).
free indices summed indices ga e Reb c d a b c d e term 0 1 0 1 0 g0 0 R01 0 1 0 1 0 1 3 g0 3 R31 0 1 0 1 0 2 0 g0 0 R01 0 2 0 1 0 2 3 g0 3 R31 0 2 0 1 1 3 0 g0 0 R01 1 3 0 1 1 3 3 g0 3 R31 1 3 0 1 2 3 0 g0 0 R01 2 3 0 1 2 3 3 g0 3 R31 2 3 0 2 0 2 0 g0 0 R02 0 2 0 2 0 2 3 g0 3 R32 0 2 0 2 1 3 0 g0 0 R02 1 3 0 2 1 3 3 g0 3 R32 1 3 0 2 2 3 0 g0 0 R02 2 3 0 2 2 3 3 g0 3 R32 2 3 0 3 0 3 0 g0 0 R03 0 3 0 3 0 3 3 g0 3 R33 0 3 0 3 1 2 0 g0 0 R03 1 2 0 3 1 2 3 g0 3 R33 1 2 1 2 1 2 1 g1 1 R12 1 2 1 3 1 3 1 g1 1 R13 1 3 1 3 2 3 1 g1 1 R13 2 3 2 3 2 3 2 g2 2 R23 2 3