Parallel Processors


Computer Engineering


A single processor carries out the processes of a program sequentially. With parallel processors, a program can be parallelized so that its independent processes are computed simultaneously rather than one after another. Large arrays of parallel processors, called supercomputers, can also mitigate the effects of excess heat generation and electrical noise that can interfere with successful computation. Amdahl's Law describes the limit to which program execution can be sped up by parallelization.



Splitting computational processes across multiple processors not only bypasses the register size bottleneck but also allows a much greater quantity of data to be processed. Just as the computational capacity of a single CPU is determined by the number of transistor structures on the chip, the computational capacity of a parallel processor system is determined by the number of individual CPUs that make up the system. The overall capacity of the resulting machine is such that it is often referred to as a supercomputer.
Two problems that have become increasingly important as transistors have become smaller and CPUs more compact are heat and noise. Both arise from the close proximity of transistor structures to each other. Heat generated by the flow of electrons through the material of the transistors must be dissipated effectively in order to prevent transistor failure due to overheating. Similarly, the closer transistor structures are to each other, the more readily signals from one ‘leak’ into another, resulting in scrambled data and inaccurate results. Electrical leakage also has the potential to short out and destroy a transistor structure, which can render the entire CPU useless.
Splitting processes across multiple processors mitigates heat by distributing its generation across the array so that it is less localized. In addition, the physical size of the array allows the incorporation of large-scale cooling systems that would not be feasible for smaller or individual CPU systems. For example, supercomputers such as IBM's ‘Blue’ series consist of CPU arrays housed in cabinetry large enough to occupy several hundred square meters of floor space. The cabinetry for such systems incorporates a refrigeration cooling system that maintains the entire array at its optimum operating temperature.
Electrical noise can also be mitigated in supercomputer arrays of parallel processors, because such arrays can use CPUs whose transistor structures are etched at a larger scale than the close-packed transistors on the CPU chips of stand-alone systems.


The structure of a parallel processor array requires a well-designed and well-implemented bridge system for processor coupling. Each implementation also depends on a suitable algorithm for distributing the computational task across the CPU array in the desired manner. This algorithm must take into account the channel capacity of the system in order to avoid overloading and bottlenecks in data flow. In most applications, the desired function of the communication architecture is to produce process symmetry, such that the computational load is evenly distributed among the CPUs in the parallel array.
Two approaches are associated with process symmetry: concurrency and parallelism. Concurrent processes are different processes that are in progress over the same period of time, though not necessarily executing at the same instant. This is the condition achieved when processes ‘time-share’ on one or more CPUs. In effect, the CPU carries out a number of operations for one process, then switches over to carry out a number of operations for another process. One might liken concurrency to working out two or more different crossword puzzles at the same time, filling in a few answers of one, then a few of another, until all are finished at roughly the same time.
In true parallelism, by comparison, operations are actually carried out at the same instant on different processors. Parallelism can be implemented even at the bit level, but most often the operational algorithm divides the overall program into a number of subtasks and program functions that can be processed independently. Upon completion, the individual results are combined to yield the solution or result of the computation. A third approach, called ‘distributed computing’, can use both concurrency and parallelism, but typically requires the CPUs to communicate with each other to complete their respective tasks, because the computation of one task relies on a result or parameter passed from another task on another CPU.
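The divide, process, and combine pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular supercomputer's scheduler: the function names (split_evenly, sum_of_squares, parallel_sum_of_squares) and the sum-of-squares workload are hypothetical and chosen only for the example. A thread pool is used so that the sketch runs anywhere; in CPython, threads share one interpreter lock, so the code shows the structure of the technique rather than a real speedup. For CPU-bound work, a process pool with the same interface would be the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, n_workers):
    """Divide items into n_workers contiguous chunks whose sizes differ by
    at most one, so that the load is spread evenly across the workers."""
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def sum_of_squares(chunk):
    """Independent subtask: it needs nothing from any other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_workers=4):
    """Distribute the chunks to a pool of workers, then combine the results."""
    chunks = split_evenly(items, n_workers)
    # Assumption: threads stand in for separate CPUs in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # Combining step: the partial results yield the final answer.
    return sum(partials)
```

Calling parallel_sum_of_squares(list(range(10)), n_workers=3) splits the ten values into chunks of four, three, and three items and returns 285, the same result as the sequential sum. Because each subtask is independent, no communication between workers is needed until the final combining step; a distributed computation in which tasks depend on one another would not fit this simple pattern.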



Latency persists because a program cannot be made completely parallel in execution; there is always some part of the program code that either does not translate to a parallel counterpart or must be run sequentially. Amdahl's Law relates S, the speedup of the entire executable program, to s, the speedup of just the part of the program that can be made parallel, in terms of p, the fraction of the program's execution time that is due to the parallelizable part before parallelization occurs. The remaining serial fraction, 1 - p, always limits the extent to which the execution of program code can be sped up, regardless of the number of additional processors over which the program is spread. For example, a program in which 10% of the execution time cannot be parallelized has a corresponding value of p = 0.9. Even if the parallel part took no time at all, the serial tenth would remain, so the program cannot run more than ten times faster. As the speedup approaches that limit, adding more processors has a progressively smaller effect on the speed at which the program runs.
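Amdahl's Law is commonly written as S = 1 / ((1 - p) + p / s), and the limiting value as s grows without bound is 1 / (1 - p). The short Python sketch below evaluates both expressions for the example of p = 0.9. The function names are hypothetical, and the sketch assumes the idealized case in which the parallel part speeds up in direct proportion to the number of processors, so that s equals the processor count; real systems usually fall short of this because of communication overhead.

```python
def amdahl_speedup(p, s):
    """Overall speedup S = 1 / ((1 - p) + p / s).

    p: fraction of the original execution time that can be parallelized (0 to 1)
    s: speedup of the parallelizable part (assumed here to equal the
       number of processors, which is an idealization)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_limit(p):
    """Upper bound on S as s grows without bound: 1 / (1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p == 1.0:
        return float("inf")  # a fully parallel program has no fixed bound
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    for processors in (10, 100, 1000):
        print(processors, round(amdahl_speedup(0.9, processors), 2))
    print("limit", round(amdahl_limit(0.9), 2))
```

With p = 0.9, the overall speedup is roughly 5.26 for 10 processors, 9.17 for 100, and 9.91 for 1,000, while the limit is 10. The hundredfold increase from 10 to 1,000 processors does not even double the speedup, which is the diminishing return that the law describes.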

—Richard M. Renneboog M.Sc.
