Parallel Processing
What is Parallel Processing?
– It is a technique that provides simultaneous data processing in order to increase the computational speed of a computer system.
– Parallelism can be achieved by multiple functional units that perform identical or different operations simultaneously.
– In computers, parallel processing is the processing of program instructions by dividing them among multiple processors, with the objective of running a program in less time.
– The simultaneous use of more than one CPU to execute a program. Ideally, parallel processing makes a program run faster because there are more engines (CPUs) running it. In practice, it is often difficult to divide a program in such a way that separate CPUs can execute different portions without interfering with each other; a sketch of such a division follows.
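As an illustration of dividing a program among processors, here is a minimal sketch in C using POSIX threads (a hypothetical example: the thread count, array size, and file name are arbitrary, and it assumes a POSIX system compiled with -pthread). Each thread sums its own slice of an array, and the operating system may schedule the two threads on different CPUs.

/* parallel_sum.c - a minimal sketch of dividing work among CPUs
   using POSIX threads (compile with -pthread). */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 2

static long data[N];

struct slice { int start, end; long sum; };

/* Each thread sums its own slice of the array independently. */
static void *partial_sum(void *arg)
{
    struct slice *s = arg;
    s->sum = 0;
    for (int i = s->start; i < s->end; i++)
        s->sum += data[i];
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct slice sl[NTHREADS];

    for (int i = 0; i < N; i++)
        data[i] = i;

    /* Divide the array into equal slices, one per thread/CPU. */
    for (int t = 0; t < NTHREADS; t++) {
        sl[t].start = t * (N / NTHREADS);
        sl[t].end = (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, partial_sum, &sl[t]);
    }

    long total = 0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += sl[t].sum;   /* combine the partial results */
    }
    printf("total = %ld\n", total);
    return 0;
}

The slices are independent, so the threads do not interfere with each other; the only serial step is combining the two partial sums at the end.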
A. Multiple Processor Organization
• Single instruction, single data stream – SISD
• Single instruction, multiple data stream – SIMD
• Multiple instruction, single data stream – MISD
• Multiple instruction, multiple data stream – MIMD
Single Instruction, Single Data Stream – SISD
• Single processor
• Single instruction stream
• Data stored in single memory
• Uni-processor
Single Instruction, Multiple Data Stream – SIMD
• A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis
• Each processing element has associated data memory
• Each instruction executed on different set of data by different processors
• Vector and array processors (a minimal SIMD sketch follows)
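One concrete (and hypothetical) illustration of the lockstep idea is the x86 SSE instruction set: a single instruction operates on four floating-point elements at once. The sketch below assumes an x86 processor with SSE and a GCC/Clang toolchain.

/* simd_add.c - single instruction, multiple data: each _mm_add_ps
   performs four additions in lockstep (x86 with SSE assumed). */
#include <xmmintrin.h>
#include <stdio.h>

int main(void)
{
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    __m128 va = _mm_loadu_ps(a);      /* load four elements */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);   /* one instruction, four additions */
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("%.0f ", c[i]);        /* 11 22 33 44 */
    printf("\n");
    return 0;
}

An array or vector processor generalizes the same principle to many processing elements, each applying the one broadcast instruction to its own data memory.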
Multiple Instruction, Single Data Stream – MISD
• Sequence of data
• Transmitted to set of processors
• Each processor executes different instruction sequence
• Not commercially implemented
Multiple Instruction, Multiple Data Stream- MIMD
• Set of processors
• Simultaneously execute different instruction sequences
• Different sets of data
• SMPs, clusters and NUMA systems
Taxonomy of Parallel Processor Architecture
MIMD – Overview
• General purpose processors
• Each can process all instructions necessary
• Further classified by method of processor communication
Tightly Coupled – SMP
• Processors share memory
• Communicate via that shared memory
• Symmetric Multiprocessor (SMP)
• —Share single memory or pool
• —Shared bus to access memory
• —Memory access time to given area of memory is approximately the same for each
processor
Tightly Coupled – NUMA
• Non-uniform memory access
• Access times to different regions of memory may differ
Loosely Coupled – Clusters
• Collection of independent uniprocessors or SMPs
• Interconnected to form a cluster
• Communication via fixed path or network connections
Parallel Organizations – SISD
Parallel Organizations – SIMD
Parallel Organizations – MIMD Shared Memory
Parallel Organizations – MIMD Distributed Memory
B. Symmetric Multiprocessors
• A stand-alone computer with the following characteristics:
• —Two or more similar processors of comparable capacity
• —Processors share same memory and I/O
• —Processors are connected by a bus or other internal connection
• —Memory access time is approximately the same for each processor
• —All processors share access to I/O
• ○Either through same channels or different channels giving paths to same devices
• —All processors can perform the same functions (hence symmetric)
• —System controlled by integrated operating system
• ○providing interaction between processors
• ○Interaction at job, task, file and data element levels
Multiprogramming and Multiprocessing
SMP Advantages
• Performance
• —If some work can be done in parallel
• Availability
• —Since all processors can perform the same functions, failure of a single processor does not halt the system
• Incremental growth
• —User can enhance performance by adding additional processors
• Scaling
• —Vendors can offer range of products based on number of processors
Block Diagram of Tightly Coupled Multiprocessor
Organization Classification
• Time shared or common bus
• Multiport memory
• Central control unit
Time Shared Bus
• Simplest form
• Structure and interface similar to single processor system
• Following features provided
• —Addressing – distinguish modules on bus
• —Arbitration – any module can be temporary master
• —Time sharing – if one module has the bus, others must wait and may have to
suspend
• Now have multiple processors as well as multiple I/O modules contending for the bus (a minimal arbitration sketch follows)
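To make arbitration and time sharing concrete, the following toy sketch (hypothetical names and counts) models the bus as a mutex shared by several module threads: whichever module holds the lock is the temporary bus master, and the others must wait.

/* bus_model.c - toy model of a time-shared bus: only one "module"
   can be bus master at a time (POSIX threads assumed). */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t bus = PTHREAD_MUTEX_INITIALIZER;

static void *module(void *arg)
{
    int id = *(int *)arg;
    for (int xfer = 0; xfer < 3; xfer++) {
        pthread_mutex_lock(&bus);      /* arbitration: become bus master */
        printf("module %d owns the bus (transfer %d)\n", id, xfer);
        pthread_mutex_unlock(&bus);    /* release so another module can win */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[3];
    int id[3] = {0, 1, 2};
    for (int i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, module, &id[i]);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return 0;
}

A real bus arbiter is hardware, of course; the mutex only captures the one-master-at-a-time rule, not the addressing or priority details.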
Symmetric Multiprocessor Organization
Time Share Bus – Advantages
• Simplicity
• Flexibility
• Reliability
Time Share Bus – Disadvantages
• Performance limited by bus cycle time
• Each processor should have a local cache
• —Reduces number of bus accesses
• Leads to problems with cache coherence
• —Solved in hardware – see later
Operating System Issues
• Simultaneous concurrent processes
• Scheduling
• Synchronization
• Memory management
• Reliability and fault tolerance
A Mainframe SMP
IBM zSeries
• Ranges from a uniprocessor with one main memory card to a high-end system with 48 processors and 8 memory cards
• Dual-core processor chip
• —Each includes two identical central processors (CPs)
• —CISC superscalar microprocessor
• —Mostly hardwired, some vertical microcode
• —256-kB L1 instruction cache and a 256-kB L1 data cache
• L2 cache, 32 MB
• —Clusters of five
• —Each cluster supports eight processors and access to entire main memory space
• System control element (SCE)
• —Arbitrates system communication
• —Maintains cache coherence
• Main store control (MSC)
• —Interconnects L2 caches and main memory
• Memory card
• —Each 32 GB; maximum of 8, for a total of 256 GB
• —Interconnect to MSC via synchronous memory interfaces (SMIs)
• Memory bus adapter (MBA)
• —Interface to I/O channels; traffic goes directly to L2 cache
IBM z990 Multiprocessor Structure
C. Cache Coherence and MESI Protocol
• Problem – multiple copies of same data in different caches
• Can result in an inconsistent view of memory
• Write back policy can lead to inconsistency
• Write through can also give problems unless caches monitor memory traffic
Cache Coherence
• Node 1 directory keeps note that node 2 has copy of data
• If data modified in cache, this is broadcast to other nodes
• Local directories monitor and purge local cache if necessary
• Local directory monitors changes to local data in remote caches and marks memory invalid until writeback
• Local directory forces writeback if memory location requested by another processor
Software Solutions
• Compiler and operating system deal with problem
• Overhead transferred to compile time
• Design complexity transferred from hardware to software
• However, software tends to make conservative decisions
• —Inefficient cache utilization
• Analyze code to determine safe periods for caching shared variables
Hardware Solution
• Cache coherence protocols
• Dynamic recognition of potential problems
• Run time
• More efficient use of cache
• Transparent to programmer
• Directory protocols
• Snoopy protocols
Directory Protocols
• Collect and maintain information about copies of data in cache
• Directory stored in main memory
• Requests are checked against directory
• Appropriate transfers are performed
• Creates central bottleneck
• Effective in large scale systems with complex interconnection schemes
Snoopy Protocols
• Distribute cache coherence responsibility among cache controllers
• Cache recognizes that a line is shared
• Updates announced to other caches
• Suited to bus based multiprocessor
• Increases bus traffic
Write Invalidate
• Multiple readers, one writer
• When a write is required, all other caches of the line are invalidated
• Writing processor then has exclusive (cheap) access until line required by another processor
• Used in Pentium II and PowerPC systems
• State of every line is marked as modified, exclusive, shared or invalid; hence MESI (a small state-machine sketch follows)
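A cache controller can be viewed as a small state machine per line. The sketch below (a simplified, hypothetical model, not the full protocol) covers the main MESI transitions for one line: local reads and writes plus snooped remote reads and writes.

/* mesi_line.c - toy state machine for one cache line under the MESI
   write-invalidate protocol; events and transitions are simplified. */
#include <stdio.h>

typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_t;
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } event_t;

static const char *name[] = { "Modified", "Exclusive", "Shared", "Invalid" };

/* Next state of the line; other_copies says whether another cache
   already holds the line (needed for the Invalid-plus-read case). */
static mesi_t next(mesi_t s, event_t e, int other_copies)
{
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)               /* read miss: fetch the line */
            return other_copies ? SHARED : EXCLUSIVE;
        return s;                       /* read hit: state unchanged */
    case LOCAL_WRITE:
        return MODIFIED;                /* Shared/Invalid: invalidate other copies first */
    case REMOTE_READ:                   /* another cache reads the line */
        if (s == MODIFIED || s == EXCLUSIVE)
            return SHARED;              /* Modified also writes the line back */
        return s;
    case REMOTE_WRITE:                  /* another cache wants to write */
        return INVALID;                 /* our copy is invalidated */
    }
    return s;
}

int main(void)
{
    mesi_t s = INVALID;
    s = next(s, LOCAL_READ, 0);   printf("%s\n", name[s]);  /* Exclusive */
    s = next(s, LOCAL_WRITE, 0);  printf("%s\n", name[s]);  /* Modified  */
    s = next(s, REMOTE_READ, 1);  printf("%s\n", name[s]);  /* Shared    */
    s = next(s, REMOTE_WRITE, 1); printf("%s\n", name[s]);  /* Invalid   */
    return 0;
}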
Write Update
• Multiple readers and writers
• Updated word is distributed to all other processors
• Some systems use an adaptive mixture of both solutions
MESI State Transition Diagram
Increasing Performance
• Processor performance can be measured by the rate at which it executes instructions
• MIPS rate = f * IPC (a worked example follows this list)
• —f = processor clock frequency, in MHz
• —IPC = average instructions executed per cycle
• Increase performance by increasing the clock frequency and increasing the number of instructions that complete during a cycle
• May be reaching limit
• —Complexity
• —Power consumption
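For example, under this formula a hypothetical processor clocked at f = 400 MHz with an average IPC of 2 delivers 400 * 2 = 800 MIPS; doubling either the clock frequency or the average IPC doubles the MIPS rate.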
Multithreading and Chip Multiprocessors
• Instruction stream divided into smaller streams (threads)
• Executed in parallel
• Wide variety of multithreading designs
Definitions of Threads and Processes
• A thread in multithreaded processors may or may not be the same as a software thread
• Process:
• —An instance of a program running on a computer
• —Resource ownership
• ○Virtual address space to hold process image
• —Scheduling/execution
• —Process switch
• Thread: dispatchable unit of work within a process
• —Includes processor context (which includes the program counter and stack pointer) and data area for stack
• —Thread executes sequentially
• —Interruptible: processor can turn to another thread
• Thread switch
• —Switching processor between threads within same process
• —Typically less costly than a process switch (a sketch of threads sharing one process's address space follows)
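As a hypothetical sketch of the distinction, the C program below creates two threads inside one process: both see the same global counter because they share the process's virtual address space, while each thread has its own stack and program counter.

/* shared_counter.c - two threads within one process share its address
   space; a mutex protects the shared counter (POSIX threads assumed). */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                      /* shared: one copy per process */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {        /* i lives on this thread's own stack */
        pthread_mutex_lock(&lock);
        counter++;                            /* both threads update the same word */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);       /* 200000: one shared address space */
    return 0;
}

Switching between these two threads only requires saving and restoring processor context; switching between two separate processes would also require switching address spaces, which is why a thread switch is typically cheaper.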
Implicit and Explicit Multithreading
• All commercial processors and most experimental ones use explicit multithreading
• —Concurrently execute instructions from different explicit threads
• —Interleave instructions from different threads on shared pipelines or parallel execution on parallel pipelines
• Implicit multithreading is concurrent execution of multiple threads extracted from single sequential program
• —Implicit threads defined statically by compiler or dynamically by hardware
Approaches to Explicit Multithreading
• Interleaved
• —Fine-grained
• —Processor deals with two or more thread contexts at a time
• —Switching thread at each clock cycle
• —If a thread is blocked it is skipped
• Blocked
• —Coarse-grained
• —Thread executed until event causes delay
• —E.g. cache miss
• —Effective on in-order processor
• —Avoids pipeline stall
• Simultaneous (SMT)
• —Instructions simultaneously issued from multiple threads to execution units of superscalar processor
• Chip multiprocessing
• —Processor is replicated on a single chip
• —Each processor handles separate threads