SIMD Presentation

SIMD (single instruction, multiple data) refers to computers that can perform the same operation on multiple data points simultaneously. Vector processors are CPUs that operate on arrays of data called vectors using SIMD instructions. MMX was an early SIMD extension for x86 processors that packed multiple small data types like bytes and words together, enabling the same arithmetic instruction to operate on multiple data elements in parallel. It used the FPU registers to maintain compatibility but this limited its usage. Later SIMD extensions improved on MMX.

Uploaded by

Huzaifa


SIMD
Single instruction, multiple data (SIMD)
Contents
• Parallel Processors
• Flynn's taxonomy
• What is SIMD?
• Types of Processing
  • Scalar Processing
  • Vector Processing
• Architecture for Vector Processing
  • Vector processors
    • Vector Processor Architectures
    • Components of Vector Processors
    • Advantages of Vector Processing
  • Array processors
    • Array Processor Classification
    • Array Processor Architecture
    • Dedicated Memory Organization
    • Global Memory Organization
  • ILLIAC IV
    • ILLIAC IV Architecture
  • Super Computers
    • Cray X1
• Multimedia Extension
Parallel Processors
• In computers, parallel processing is the processing of program instructions by dividing them among multiple processors, with the objective of running a program in less time.
• In the earliest computers, only one program ran at a time. A computation-intensive program that took one hour to run and a tape-copying program that took one hour to run would take a total of two hours to run. An early form of parallel processing allowed the interleaved execution of both programs together.
• The computer would start an I/O operation and, while it was waiting for the operation to complete, it would execute the processor-intensive program. The total execution time for the two jobs would be a little over one hour.
Flynn's taxonomy
• Flynn's taxonomy is a classification of computer architectures proposed by Michael J. Flynn in 1966. The classification system has stuck and has been used as a tool in the design of modern processors and their functionalities.
Classification
• The four classifications defined by Flynn are based upon the number of concurrent instruction (or control) streams and data streams available in the architecture:
  • Single instruction stream, single data stream (SISD)
  • Single instruction stream, multiple data streams (SIMD)
  • Multiple instruction streams, single data stream (MISD)
  • Multiple instruction streams, multiple data streams (MIMD)
• Single instruction, multiple threads (SIMT) is a later execution model, often treated as a variant of SIMD; it is not one of Flynn's original four classes.
Evolution of Intel Vector Instructions
■ MMX (1996, Pentium)
  • CPU-based MPEG decoding
  • Integers only; 64 bits divided into 2 x 32, 4 x 16, or 8 x 8
  • Phased out with SSE4
■ SSE (1999, Pentium III)
  • CPU-based 3D graphics
  • 4-way float operations, single precision
  • 8 new 128-bit registers, 70 new instructions
■ SSE2 (2001, Pentium 4)
  • High-performance computing
  • Adds 2-way float ops, double precision; same registers as 4-way single precision
  • Integer SSE instructions make MMX obsolete
■ SSE3 (2004, Pentium 4E Prescott)
  • Scientific computing
  • New 2-way and 4-way vector instructions for complex arithmetic
■ SSSE3 (2006, Core 2 Duo)
  • Minor advancement over SSE3
■ SSE4 (2007, Core 2 Duo Penryn)
  • Modern codecs, cryptography
  • New integer instructions
  • Better support for unaligned data, super shuffle engine
What is SIMD?
• Single instruction, multiple data (SIMD) is a class of parallel computers in Flynn's taxonomy.
• It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, such machines exploit data-level parallelism.
• There are simultaneous (parallel) computations, but only a single process (instruction) at a given moment.
How does SIMD process data?
[Figure: Processing/Working]
Types of Processing
• Scalar Processing
  • A scalar processor is a CPU that performs computations on one data item at a time. It is known as a "single instruction stream, single data stream" (SISD) CPU.
• Vector Processing
  • A vector processor or array processor is a central processing unit (CPU) that implements an instruction set containing instructions that operate on 1-D arrays of data called vectors.
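A rough sketch of the difference between the two processing styles, in Python. The function names and the `vector_length` parameter are illustrative assumptions, not part of any real instruction set, and plain Python does not run the lanes in parallel; the sketch only counts how many "instructions" each style would have to issue.

```python
def scalar_add(a, b):
    """SISD style: each add instruction handles exactly one pair of elements."""
    out = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1  # one instruction per element
    return out, instructions


def vector_add(a, b, vector_length=4):
    """Vector style: each add instruction handles up to vector_length pairs."""
    out = []
    instructions = 0
    for start in range(0, len(a), vector_length):
        chunk_a = a[start:start + vector_length]
        chunk_b = b[start:start + vector_length]
        # Conceptually these lane additions happen together in hardware.
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1  # one instruction per vector
    return out, instructions


a = list(range(8))
b = [10] * 8
scalar_result, scalar_count = scalar_add(a, b)
vector_result, vector_count = vector_add(a, b)
print(scalar_count, vector_count)
```

Both functions produce the same sums; only the number of issued instructions differs, which is the point of the vector style.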
Architecture for Vector Processing
• Two architectures suitable for vector processing are:
  • Pipelined vector processors
  • Parallel array processors
Pipelined vector processors
• A CPU that implements an instruction set that operates on 1-D arrays, called vectors
• Vectors contain multiple data elements
• The number of data elements per vector is typically referred to as the vector length
• Both instructions and data are pipelined to reduce decoding time
Advantages of Vector Processing
Advantages:
• Quick fetch and decode of a single instruction for multiple operations.
• The instruction provides a regular source of data, which arrives at each cycle and can be processed efficiently in a pipelined fashion.
• Easier addressing of main memory
• Elimination of memory wastage
• Simplification of control hazards
• Reduced code size
Array Processors
• An array processor is a processor that performs computations on a large array of data.
• An array processor is a synchronous parallel computer with multiple ALUs, called processing elements (PEs), that can operate in parallel in lockstep fashion.
• It is composed of N identical PEs under the control of a single control unit, plus a number of memory modules.
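The lockstep organization described above can be modeled in a few lines of Python. This is only a toy sketch under stated assumptions: the class names `ProcessingElement` and `ControlUnit`, the `broadcast` method, and the two supported operations are hypothetical and do not describe any particular machine.

```python
class ProcessingElement:
    """One PE: an ALU with a single local accumulator (hypothetical model)."""

    def __init__(self, local_value):
        self.acc = local_value

    def execute(self, op, operand):
        # Every PE understands the same small set of operations.
        if op == "add":
            self.acc += operand
        elif op == "mul":
            self.acc *= operand
        else:
            raise ValueError(f"unknown operation: {op}")


class ControlUnit:
    """Single control unit that drives N identical PEs in lockstep."""

    def __init__(self, processing_elements):
        self.pes = processing_elements

    def broadcast(self, op, operand):
        # The same instruction goes to every PE; each applies it to its own data.
        for pe in self.pes:
            pe.execute(op, operand)

    def results(self):
        return [pe.acc for pe in self.pes]


cu = ControlUnit([ProcessingElement(v) for v in [1, 2, 3, 4]])
cu.broadcast("add", 10)  # every PE adds 10 to its own value
cu.broadcast("mul", 2)   # every PE doubles its own value
print(cu.results())
```

The loop in `broadcast` is sequential here; in a real array processor all PEs would carry out the broadcast instruction in the same cycle.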
Array Processor Classification
SIMD (Single Instruction, Multiple Data) array processor
• An array processor that has a single-instruction, multiple-data organization.
• It manipulates vector instructions by means of multiple functional units responding to a common instruction.
Attached array processor
• An auxiliary processor attached to a general-purpose computer.
• Its intent is to improve the performance of the host computer in specific numeric calculation tasks.
SIMD-Array Processor Architecture
• SIMD has two basic configurations:
  • Array processors using RAM, also known as the dedicated memory organization
    • ILLIAC-IV, CM-2, MP-1
  • Associative processors using content-addressable memory, also known as the global memory organization
    • BSP
MMX
Multimedia Extensions
Development
• MMX (Multimedia Extension) was introduced in 1996 (Pentium with MMX and Pentium II).
• SSE (Streaming SIMD Extensions) was introduced with the Pentium III.
• SSE2 was introduced with the Pentium 4.
• SSE3 was introduced with the Pentium 4 supporting Hyper-Threading Technology. SSE3 adds 13 more instructions.
MMX
• After analyzing many existing applications such as graphics, MPEG, music, speech recognition, games, and image processing, Intel found that many multimedia algorithms execute the same instructions on many pieces of data in a large data set.
• Typical elements are small: 8 bits for pixels, 16 bits for audio, 32 bits for graphics and general computing.
• New data type: 64-bit packed data type. Why 64 bits?
  • Good enough
  • Practical
Data Types of MMX
The four MMX technology data types are:
• Packed byte -- Eight bytes packed into one 64-bit quantity.
• Packed word -- Four 16-bit words packed into one 64-bit quantity.
• Packed doubleword -- Two 32-bit doublewords packed into one 64-bit quantity.
• Quadword -- One 64-bit quantity.
Compatibility
• To be fully compatible with the existing Intel architecture (IA), no new mode or state was created. Hence no extra state needs to be saved on a context switch.
• To reach this goal, MMX is hidden behind the FPU: the MMX registers are aliased onto the FPU registers. When floating-point state is saved or restored, MMX state is saved or restored with it.
• This allows an existing OS to context-switch processes that execute MMX instructions without being aware of MMX.
• However, it means MMX and the FPU cannot be used at the same time, and switching between them carries a large overhead.
  • Although Intel defends its decision to alias MMX onto the FPU for compatibility, it was actually a bad decision: an OS could simply have been updated or supplied with a service pack.
  • This is why Intel later introduced SSE without any aliasing.
Saturation Arithmetic
• In an 8-bit grayscale picture, 255 is the value for pure white and 0 is the value for pure black. In a regular 8-bit register (AL, BL, CL ...), if we add one to white, we get black! This is because regular registers "roll over" (wrap around) to 0. MMX gets around this with instructions that use a technique called "saturation arithmetic". In saturation arithmetic, the value never rolls over; it is clamped to the largest or smallest representable value instead. This means that in the MMX world, for unsigned bytes, we have the following equations:
  • 255 + 100 = 255
  • 200 + 100 = 255
  • 0 - 100 = 0
  • 99 - 100 = 0
• This may seem counter-intuitive at first to people who are used to their registers rolling over, but it makes sense in some situations: if we try to make white brighter, it shouldn't become black.
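The contrast between roll-over and saturation for a single unsigned byte can be sketched in Python as follows. The helper names are illustrative assumptions, not MMX mnemonics; the code only reproduces the arithmetic in the equations above.

```python
U8_MAX = 255  # largest value an unsigned byte can hold


def add_wraparound_u8(a, b):
    """Ordinary 8-bit addition: the result rolls over modulo 256."""
    return (a + b) & 0xFF


def add_saturate_u8(a, b):
    """Saturating addition: the result is clamped at 255."""
    return min(a + b, U8_MAX)


def sub_saturate_u8(a, b):
    """Saturating subtraction: the result is clamped at 0."""
    return max(a - b, 0)


print(add_wraparound_u8(255, 1))   # white + 1 rolls over to black: 0
print(add_saturate_u8(255, 100))   # 255
print(add_saturate_u8(200, 100))   # 255
print(sub_saturate_u8(0, 100))     # 0
print(sub_saturate_u8(99, 100))    # 0
```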
MMX Registers
• MMX defines eight registers, called MM0 through MM7, and operations that operate on them. Each register is 64 bits wide and can be used to hold either 64-bit integers or multiple smaller integers in a "packed" format: a single instruction can then be applied to two 32-bit integers, four 16-bit integers, or eight 8-bit integers at once.
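To make the "packed" format concrete, the sketch below treats a Python integer as a 64-bit register and reads it back as lanes of different widths. The helper names `pack_bytes` and `as_lanes` are hypothetical, and placing the first byte in the lowest-order lane is an assumption chosen to match little-endian x86 memory order.

```python
def pack_bytes(values):
    """Pack eight byte values (0..255) into one 64-bit quantity.

    values[0] ends up in the lowest-order byte (little-endian layout).
    """
    if len(values) != 8:
        raise ValueError("exactly eight bytes are required")
    return int.from_bytes(bytes(values), "little")


def as_lanes(quadword, lane_bits):
    """View a 64-bit quantity as 64 // lane_bits lanes, lowest lane first."""
    mask = (1 << lane_bits) - 1
    return [(quadword >> (i * lane_bits)) & mask for i in range(64 // lane_bits)]


q = pack_bytes([1, 2, 3, 4, 5, 6, 7, 8])
print([hex(v) for v in as_lanes(q, 8)])   # eight packed bytes
print([hex(v) for v in as_lanes(q, 16)])  # four packed words
print([hex(v) for v in as_lanes(q, 32)])  # two packed doublewords
print([hex(v) for v in as_lanes(q, 64)])  # one quadword
```

The same 64 bits are reinterpreted each time; nothing is converted, which is why one register can serve all four data types.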
Instructions
• The MMX registers are 64 bits wide, but can be broken down as follows:
  • 2 32-bit values
  • 4 16-bit values
  • 8 8-bit values
• The MMX registers cannot easily be used for 64-bit arithmetic.
• Let's say that we have 4 bytes loaded in an MMX register: 10, 25, 128, 255. We have them arranged as such:
  MM0: | 10 | 25 | 128 | 255 |
• And we do the following pseudo-code operation, which adds 10 to every byte:
  MM0 + 10
• We would get the following result:
  MM0: | 10+10 | 25+10 | 128+10 | 255+10 | = | 20 | 35 | 138 | 255 |
  Remember that our arithmetic "saturates" in the last box, so the value doesn't go over 255.
• Using MMX, we are essentially performing 4 additions in the time it takes to perform 1 addition using the regular registers, using a quarter as many instructions.
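The worked example above can be emulated in Python by treating an integer as a 64-bit register with eight unsigned byte lanes. This is a software sketch only: the function names are hypothetical, the lane loop runs sequentially rather than in parallel, and the behavior is modeled on MMX's packed unsigned saturating byte add (PADDUSB). The four unused lanes are filled with zeros as an assumption.

```python
def pack_u8(lanes):
    """Pack eight unsigned bytes into a 64-bit integer, first lane lowest."""
    if len(lanes) != 8:
        raise ValueError("exactly eight lanes are required")
    return int.from_bytes(bytes(lanes), "little")


def unpack_u8(quadword):
    """Split a 64-bit integer back into its eight unsigned byte lanes."""
    return list(quadword.to_bytes(8, "little"))


def packed_add_saturate_u8(a, b):
    """Add corresponding byte lanes of a and b, clamping each lane at 255."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= min(x + y, 255) << shift  # no carry leaks into the next lane
    return result


mm0 = pack_u8([10, 25, 128, 255, 0, 0, 0, 0])
tens = pack_u8([10] * 8)
total = packed_add_saturate_u8(mm0, tens)
print(unpack_u8(total)[:4])  # [20, 35, 138, 255]
```

A single call stands in for one packed instruction: all lanes are updated together, and the 255 lane saturates instead of wrapping to 9.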
MMX Instructions
