Parallel Processor Computing
UNIT-1 : INTRODUCTION & TECHNIQUES OF PARALLELISM
by
PARTHA ROY, Asso.Prof., BIT, Durg
http://royalproy.tripod.com [email protected]
In the simplest sense, parallel computing is the simultaneous use of multiple computing resources to solve a computational problem. To be run using multiple CPUs:
- A problem is broken into discrete parts that can be solved concurrently.
- Each part is further broken down into a series of instructions.
- Instructions from each part execute simultaneously on different CPUs.
The computational problem usually demonstrates characteristics such as the ability to be:
- broken apart into discrete pieces of work that can be solved simultaneously;
- executed as multiple program instructions at any moment in time;
- solved in less time with multiple compute resources than with a single compute resource.
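A minimal sketch of this decomposition, using Python's standard multiprocessing module (the example problem, summing a list, and the helper name partial_sum are illustrative, not from the original tutorial):

from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker executes this series of instructions on its own part.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Break the problem into 4 discrete parts.
    n = 4
    size = len(data) // n
    chunks = [data[i * size:(i + 1) * size] for i in range(n)]
    with Pool(processes=n) as pool:
        # The parts are solved concurrently on different CPUs.
        total = sum(pool.map(partial_sum, chunks))
    print(total)  # same result as sum(data), computed in parallel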
Traditionally, parallel computing has been considered to be "the high end of computing" and has been motivated by numerical simulations of complex systems and "Grand Challenge Problems" such as: weather and climate, chemical and nuclear reactions, biology and the human genome, geology and seismic activity, electronic circuits, manufacturing processes, etc.
Today, commercial applications are providing an equal or greater driving force in the development of faster computers. These applications require the processing of large amounts of data in sophisticated ways. Example applications include: parallel databases, data mining, oil exploration, web search engines, web-based business services, computer-aided diagnosis in medicine, management of national and multi-national corporations, advanced graphics and virtual reality (particularly in the entertainment industry), networked video and multimedia technologies, etc.
For over 40 years, virtually all computers have followed a common machine model known as the von Neumann computer. A von Neumann computer uses the stored-program concept: the CPU executes a stored program that specifies a sequence of read and write operations on the memory.
Basic design:
- Memory is used to store both program instructions and data.
- Program instructions are coded data which tell the computer to do something.
- Data is simply information to be used by the program.
- A central processing unit (CPU) gets instructions and/or data from memory, decodes the instructions, and then sequentially performs them.
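The fetch-decode-execute cycle described above can be sketched as a toy simulation in Python; the tiny LOAD/ADD/STORE/HALT instruction set below is hypothetical, chosen only to illustrate the stored-program concept:

# One memory holds both the program (addresses 0-3) and the data
# (addresses 100-102); the CPU fetches, decodes, and executes
# instructions sequentially, driven by a program counter (pc).
memory = {
    0: ("LOAD", 100),   # acc = memory[100]
    1: ("ADD", 101),    # acc += memory[101]
    2: ("STORE", 102),  # memory[102] = acc
    3: ("HALT", None),
    100: 2, 101: 3, 102: 0,   # data lives in the same memory
}

pc, acc = 0, 0
while True:
    op, addr = memory[pc]   # fetch and decode
    pc += 1
    if op == "LOAD":
        acc = memory[addr]
    elif op == "ADD":
        acc += memory[addr]
    elif op == "STORE":
        memory[addr] = acc
    elif op == "HALT":
        break
print(memory[102])  # prints 5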
One of the more widely used classifications, in use since 1966, is called Flynn's Taxonomy. Flynn's taxonomy
distinguishes multi-processor computer architectures according to how they can be classified along the two
independent dimensions of Instruction and Data. Each of these dimensions can have only one of two possible
states: Single or Multiple.
The four possible classifications according to Flynn are:
- SISD: Single Instruction stream, Single Data stream
- SIMD: Single Instruction stream, Multiple Data streams
- MISD: Multiple Instruction streams, Single Data stream
- MIMD: Multiple Instruction streams, Multiple Data streams
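The Single-versus-Multiple Data distinction can be illustrated conceptually in Python, assuming the third-party numpy package (whose vectorized operations are a software-level analogue of SIMD hardware):

import numpy as np

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# SISD-style: one instruction applied to one data element per step.
c_scalar = []
for x, y in zip(a, b):
    c_scalar.append(x + y)

# SIMD-style: one add operation applied across multiple data elements.
c_vector = np.array(a) + np.array(b)

print(c_scalar, c_vector.tolist())  # [11, 22, 33, 44] both times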
MIMD (Multiple Instruction, Multiple Data): currently the most common type of parallel computer; most modern computers fall into this category.
- Multiple Instruction: every processor may be executing a different instruction stream.
- Multiple Data: every processor may be working with a different data stream.
- Execution can be synchronous or asynchronous, deterministic or non-deterministic.
Examples: most current supercomputers and networked parallel computer "grids".
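A minimal sketch of MIMD-style execution, using Python's standard multiprocessing module: two processors run different instruction streams on different data streams (the two worker functions are illustrative):

from multiprocessing import Process

def count_words(text):          # instruction stream 1, data stream 1
    print("words:", len(text.split()))

def sum_numbers(numbers):       # instruction stream 2, data stream 2
    print("sum:", sum(numbers))

if __name__ == "__main__":
    p1 = Process(target=count_words, args=("parallel computing with MIMD",))
    p2 = Process(target=sum_numbers, args=([1, 2, 3, 4],))
    p1.start(); p2.start()      # both run simultaneously, asynchronously
    p1.join(); p2.join()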
AMDAHL'S LAW
Amdahl's law is a model for the relationship between the expected speedup of a parallelized implementation of an algorithm and the serial algorithm, under the assumption that the problem size remains the same when parallelized.
For example, if for a given problem size a parallelized implementation of an algorithm can run 12% of the algorithm's operations arbitrarily quickly (while the remaining 88% of the operations are not parallelizable), Amdahl's law states that the maximum speedup of the parallelized version is 1 / (1 - 0.12), i.e. about 1.136 times as fast as the non-parallelized implementation.
More technically, the law is concerned with the speedup achievable from an improvement to a computation that affects a proportion P of that computation, where the improvement has a speedup of S. (For example, if an improvement can speed up 30% of the computation, P will be 0.3; if the improvement makes the portion affected twice as fast, S will be 2.) Amdahl's law states that the overall speedup of applying the improvement will be:

Speedup = 1 / ((1 - P) + P / S)
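As a worked check of this formula, a small Python sketch (the function name amdahl_speedup is illustrative):

def amdahl_speedup(P, S):
    # P: proportion of the computation improved; S: speedup of that portion.
    return 1.0 / ((1.0 - P) + P / S)

# The 12% example above: that portion runs arbitrarily quickly, so let S
# grow very large; the limit is 1 / (1 - 0.12), about 1.136.
print(amdahl_speedup(0.12, 1e9))   # ~1.136

# The 30%-twice-as-fast example: P = 0.3, S = 2.
print(amdahl_speedup(0.3, 2))      # ~1.176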
Speedup:
Speedup is defined as the time taken by a program to execute in serial (with one processor) divided by the time taken to execute in parallel (with many processors). The formula for speedup is:

      T1
S = ------
      Tj

where Tj is the time taken to execute the program using j processors. Speedup also indicates the efficiency of multiprocessor systems as compared to uniprocessor systems.
Amdahl's law is used to find the maximum expected improvement to an overall system when only a part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors.
The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours using a single processor core, and a particular portion of 1 hour cannot be parallelized, while the remaining portion of 19 hours (95%) can be parallelized, then regardless of how many processors we devote to the parallelized execution of this program, the minimum execution time cannot be less than that critical 1 hour. Hence the speedup is limited to at most 20.
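A small sketch of this 20-hour example (the helper name execution_time is illustrative):

def execution_time(j, serial_hours=1.0, parallel_hours=19.0):
    # 1 hour must run serially; the 19 parallelizable hours split across j CPUs.
    return serial_hours + parallel_hours / j

T1 = execution_time(1)                 # 20 hours on one processor
for j in (2, 10, 100, 10_000):
    Tj = execution_time(j)
    print(j, round(Tj, 3), round(T1 / Tj, 2))  # speedup approaches, never exceeds, 20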
MOORE'S LAW:
According to Moore's Law, the number of transistors on a chip roughly doubles every two years.
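As an illustrative projection (not from the original tutorial), doubling every two years means the count after t years is multiplied by 2^(t/2):

def transistors(n0, years):
    # n0 transistors today, doubling every 2 years.
    return n0 * 2 ** (years / 2)

print(transistors(1_000_000, 10))  # after 10 years: 2^5 = 32x, i.e. 32,000,000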
[Figure: a conventional scalar CPU, showing registers, program counter (PC), ALU, and local memory.]
[Figure: a CPU with registers, an instruction stack, and multiple functional units: ADD, MULTIPLY, DIVIDE, BOOLEAN.]
1. Multiprogramming:
   a. Usually every process or program consists of either CPU-bound (computation-intensive) instructions or I/O-bound (input/output-intensive) instructions, or a combination of both.
   b. When a system has many processes, then while the CPU is busy with a CPU-bound process, a waiting I/O-bound process can at the same time be allocated I/O resources for its execution. This is called Multiprogramming.
   c. Here processes do not have to wait for each other to complete and hence can execute simultaneously (in parallel).
2. Time sharing:
   a. In Multiprogramming systems, a process which takes a very long time in CPU or I/O processing can drastically reduce the system performance, as other processes have to wait in the queue.
   b. This problem gets solved in Time-sharing systems, where we assign time slices to every process and a preemptive (pausing) strategy is used to automatically pause a process when its allocated time span is over.
   c. Here every process is assigned a specific time slice to utilize the CPU resources. The preempted processes go into the waiting state and, when given a chance by the Scheduler, are again assigned the CPU resources. This happens till all the processes are finished or the system is shut down. (A sketch of this time-slicing strategy follows below.)
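A minimal sketch of the time-slicing strategy from point 2, assuming a simple round-robin scheduler (the burst times and time slice are illustrative):

from collections import deque

def round_robin(burst_times, time_slice):
    queue = deque(burst_times.items())   # (name, remaining CPU time) pairs
    clock = 0
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)
        clock += run                      # process uses the CPU for its slice
        remaining -= run
        if remaining > 0:
            queue.append((name, remaining))   # preempted: back to the queue
        else:
            print(f"{name} finished at t={clock}")

round_robin({"P1": 5, "P2": 3, "P3": 8}, time_slice=2)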
3. Pipelining:
   a. In a pipelined system, a task is divided into k sequential stages (segments), and different tasks can occupy different stages at the same time.
   b. If n tasks are to be processed, each requiring k stages of duration t each, then the total time for a non-pipelined system is:
      Tnp = n * k * t
   c. The frequency at which tasks can be pushed into the pipeline without the possibility of any collision is:
      f = 1 / tact
      where tact is the duration of one pipeline clock cycle (the time taken by the slowest stage).
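A small sketch comparing the two timings; the pipelined total time Tp = (k + n - 1) * t is the standard formula for a k-stage pipeline processing n tasks and is assumed here, since the notes above give only the non-pipelined time:

def non_pipelined_time(n, k, t):
    return n * k * t            # Tnp = n * k * t, every task runs all k stages alone

def pipelined_time(n, k, t):
    return (k + n - 1) * t      # first task fills the pipe; the rest emerge one per cycle

n, k, t = 100, 4, 1.0           # 100 tasks, 4 stages, 1 time unit per stage
Tnp = non_pipelined_time(n, k, t)
Tp = pipelined_time(n, k, t)
print(Tnp, Tp, round(Tnp / Tp, 2))  # speedup approaches k as n grows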