Computer Architecture
EE-371/CS-330
                                                             Spring 2019
                                                         Hasan Baig
                                                [email protected]
                                                           Habib University
The contents of these lecture slides are prepared with the help of the official lecture slides of the book
“Computer Organization and Design – RISC-V edition” by Patterson and Hennessy.
Recap
Performance Issues
      • Longest delay determines clock period
          – Critical path: load instruction
          – Instruction memory → register file → ALU → data
            memory → register file
      • Not feasible to vary period for different
        instructions
      • Violates design principle
          – Making the common case fast
      • We will improve performance by pipelining
Pipelining   Restaurant Analogy
 [Figure: a buffet/delivery restaurant as a pipeline. Three tasks share the
 same stations — a dine-in customer grabs food and dines in, a delivery guy
 delivers food, a worker purchases groceries. Stations (time units): token
 counter 1, take/give order 1, cash counter 2, kitchen / grab food / check
 groceries 2, dining hall / delivery address 10. Without pipelining, each
 process executes one at a time.]
Pipelining   Laundry Analogy
    An implementation technique in which multiple instructions are
    overlapped in execution
     [Figure: laundry pipeline timeline.]
     Four loads:
         Speedup = 8/3.5 = 2.3
     Non-stop:
         Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
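The two speedup numbers above can be checked with a short sketch (assumed, per the analogy: 4 laundry stages of 0.5 hours each, so one load takes 2 hours sequentially):

```python
# Laundry-analogy speedup sketch (assumption: 4 stages, 0.5 h per stage).
def sequential_time(n_loads, stage_time=0.5, stages=4):
    # Each load finishes completely before the next starts.
    return n_loads * stages * stage_time

def pipelined_time(n_loads, stage_time=0.5, stages=4):
    # The first load takes stages * stage_time; each later load adds one stage_time.
    return stages * stage_time + (n_loads - 1) * stage_time

print(sequential_time(4) / pipelined_time(4))      # 8 / 3.5 ≈ 2.29
print(sequential_time(10_000) / pipelined_time(10_000))  # approaches 4 (the stage count)
```

For large n the ratio 2n / (0.5n + 1.5) tends to 4, matching the "non-stop" formula on the slide.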
Pipelining
      Five stages, one step per stage
       1. IF: Instruction fetch from memory
       2. ID: Instruction decode & register read
       3. EX: Execute operation or calculate address
       4. MEM: Access memory operand
       5. WB: Write result back to register
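The overlap of these five stages can be sketched as a simple cycle diagram (an illustrative sketch; ideal pipeline with no hazards, one instruction entering per cycle):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """One row per instruction, one column per cycle; '--' marks idle cycles."""
    rows = []
    for i in range(n_instructions):
        # Instruction i enters IF in cycle i, then advances one stage per cycle.
        rows.append(["--"] * i + STAGES + ["--"] * (n_instructions - 1 - i))
    return rows

for row in pipeline_diagram(3):
    print(" ".join(f"{s:>3}" for s in row))
```

Reading down any column shows that every stage is busy with a different instruction once the pipeline is full.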
Pipelining    Example
    • Assume time for stages is
           – 100ps for register read or write
           – 200ps for other stages
    • Compare pipelined datapath with single-cycle
      datapath
   Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
   ld         200 ps        100 ps          200 ps   200 ps          100 ps           800 ps
   sd         200 ps        100 ps          200 ps   200 ps                           700 ps
   R-format   200 ps        100 ps          200 ps                   100 ps           600 ps
   beq        200 ps        100 ps          200 ps                                    500 ps
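A minimal sketch recomputing the table's totals from the assumed stage times (200 ps for fetch, ALU, and memory access; 100 ps for register read/write):

```python
# Stage times as assumed in the example above.
STAGE_TIME = {"IF": 200, "reg_read": 100, "ALU": 200, "MEM": 200, "reg_write": 100}

# Which stages each instruction class actually uses.
USES = {
    "ld":       ["IF", "reg_read", "ALU", "MEM", "reg_write"],
    "sd":       ["IF", "reg_read", "ALU", "MEM"],
    "R-format": ["IF", "reg_read", "ALU", "reg_write"],
    "beq":      ["IF", "reg_read", "ALU"],
}

totals = {instr: sum(STAGE_TIME[s] for s in stages) for instr, stages in USES.items()}
print(totals)  # {'ld': 800, 'sd': 700, 'R-format': 600, 'beq': 500}

# Single-cycle clock must fit the slowest instruction; pipelined clock the slowest stage.
print(max(totals.values()), max(STAGE_TIME.values()))  # 800 200
```

This is exactly why the single-cycle design clocks at 800 ps while the pipelined one clocks at 200 ps.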
Pipelining    Example
                                Single-cycle (Tc = 800 ps)
Pipelining    Example
                               Pipelined (Tc = 200 ps)
Pipelining     Speedup
      • If all stages are balanced
             – i.e., all take the same time
              – Time between instructions (pipelined)
                = Time between instructions (nonpipelined) / Number of stages
      • If not balanced, speedup is less
      • Speedup due to increased throughput
             – Latency (time for each instruction) does not
               decrease
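The balanced/unbalanced rule above can be sketched as follows (assumed numbers: the 800 ps single-cycle datapath from the earlier example, split into 5 stages):

```python
# Sketch of the pipelined instruction period (time between instructions).
def pipelined_period(t_nonpipelined, n_stages, slowest_stage=None):
    if slowest_stage is None:
        # Balanced stages: divide the non-pipelined time by the stage count.
        return t_nonpipelined / n_stages
    # Unbalanced stages: the clock must instead fit the slowest stage.
    return slowest_stage

print(pipelined_period(800, 5))       # balanced ideal: 160.0 ps
print(pipelined_period(800, 5, 200))  # stages of up to 200 ps: 200 ps
```

With 200 ps stages the achieved speedup is 800/200 = 4x rather than the ideal 5x, illustrating why unbalanced stages reduce the speedup.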
Pipelining   Speedup
                        [Figure: execution timelines — non-pipelined total
                        T = 2400 ps vs. pipelined total T = 1400 ps.]
Pipelining       Speedup
   Instructions = 1,000,000
   Pipelined:      each instruction adds 200 ps
                   Total time = 1,000,000 × 200 + 1400 = 200,001,400 ps
   Non-pipelined:  each instruction adds 800 ps
                   Total time = 1,000,000 × 800 + 2400 = 800,002,400 ps
Latency   Exercise
Latency      Solution
1. R-type: 30 + 250 + 150 + 25 + 200 + 25 + 20 = 700 ps
2. ld:     30 + 250 + 150 + 25 + 200 + 250 + 25 + 20 = 950 ps
3. sd:     30 + 250 + 150 + 200 + 25 + 250 = 905 ps
4. beq:    30 + 250 + 150 + 25 + 200 + 5 + 25 + 20 = 705 ps
5. I-type: 30 + 250 + 150 + 25 + 200 + 25 + 20 = 700 ps
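A sketch that recomputes the latencies above by summing the component delays along each instruction's path (the delay values are taken directly from the solution lines):

```python
# Per-instruction datapath component delays (ps), copied from the solution above.
PATHS = {
    "R-type": [30, 250, 150, 25, 200, 25, 20],
    "ld":     [30, 250, 150, 25, 200, 250, 25, 20],
    "sd":     [30, 250, 150, 200, 25, 250],
    "beq":    [30, 250, 150, 25, 200, 5, 25, 20],
    "I-type": [30, 250, 150, 25, 200, 25, 20],
}

for instr, delays in PATHS.items():
    print(f"{instr}: {sum(delays)} ps")
# R-type 700, ld 950, sd 905, beq 705, I-type 700
```

The longest path (ld, 950 ps) would set the single-cycle clock period for this datapath.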
Recap   Problems in single-cycle processor
    • Longest delay determines clock period
        – Critical path: load instruction
        – Instruction memory → register file → ALU → data
          memory → register file
    • Not feasible to vary period for different
      instructions
    • Violates design principle
        – Making the common case fast
    • We will improve performance by pipelining
Recap
Quick Review   Stages in Processor
     Five stages, one step per stage
      1. IF: Instruction fetch from memory
      2. ID: Instruction decode & register read
      3. EX: Execute operation or calculate address
      4. MEM: Access memory operand
      5. WB: Write result back to register
Recap
    • Register file data is read in the second half of each clock cycle
    • Register file data is written in the first half of each clock cycle
      (so a value written in one cycle can be read in that same cycle)
Pipelining and ISA Design
     • RISC-V ISA designed for pipelining
          – All instructions are 32-bits
              • Easier to fetch and decode in one cycle
               • cf. x86: 1- to 17-byte instructions
          – Few and regular instruction formats
              • Can decode and read registers in one step
Pipelining   Hazards
      • Situations that prevent starting the next
        instruction in the next cycle → hazards
      • Structural hazard
           – A required resource is busy
     • Data hazard
          – Need to wait for previous instruction to complete
            its data read/write
     • Control hazard
          – Deciding on control action depends on previous
            instruction
Pipelining   Structural Hazards
 When a planned instruction cannot execute in the proper clock cycle
 because the hardware does not support the combination of
 instructions that are set to execute.
    • Conflict for use of a resource
    • In RISC-V pipeline with a single memory
        – Load/store requires data access
        – Instruction fetch would have to stall for that cycle
             • Would cause a pipeline “bubble”
    • Hence, pipelined datapaths require separate
      instruction/data memories
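The single-memory conflict above can be sketched as follows (illustrative: in an ideal 5-stage pipeline, the instruction fetched in cycle t is instruction t, while the instruction in MEM is instruction t − 3, so a ld/sd in MEM collides with a fetch):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(program, shared_memory=True):
    """Cycles in which IF and MEM both need the (single) memory."""
    conflicts = []
    if not shared_memory:
        return conflicts  # separate instruction/data memories: no conflict possible
    for cycle in range(len(program) + len(STAGES) - 1):
        in_mem = cycle - 3   # index of the instruction in MEM this cycle
        in_if = cycle        # index of the instruction in IF this cycle
        if 0 <= in_mem < len(program) and in_if < len(program):
            if program[in_mem] in ("ld", "sd"):
                conflicts.append(cycle)  # data access and fetch collide
    return conflicts

print(memory_conflicts(["ld", "add", "sub", "sd", "add"]))  # → [3]
```

Each listed cycle would force the fetch to stall, creating a pipeline bubble; with separate memories the list is always empty.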
Pipelining   Data Hazards
   Data hazards occur when the pipeline must be stalled because one
   step must wait for another to complete.
                            add   x19, x0, x1
                            sub   x2, x19, x3
Pipelining     Data Hazards
      • Use result when it is computed
             – Don’t wait for it to be stored in a register
             – Requires extra connections in the datapath
  Forwarding - Also called bypassing. A method of resolving a data hazard by retrieving the
  missing data element from internal buffers rather than waiting for it to arrive from
  programmer-visible registers or memory.
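The forwarding decision can be sketched as the classic Patterson & Hennessy hazard conditions (a simplified sketch: register numbers and pipeline-register names are illustrative; x0 is never forwarded because it is hardwired to zero):

```python
def forward_source(ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite, rs):
    """Where the EX stage should take source operand `rs` from."""
    # EX hazard: the instruction one ahead will write the register we need.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == rs:
        return "EX/MEM"   # result computed last cycle, not yet written back
    # MEM hazard: the instruction two ahead will write it.
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == rs:
        return "MEM/WB"
    return "REGFILE"      # no hazard: the normal register-file read is correct

# add x19, x0, x1 followed immediately by sub x2, x19, x3:
print(forward_source(ex_mem_rd=19, ex_mem_regwrite=True,
                     mem_wb_rd=0, mem_wb_regwrite=False, rs=19))  # EX/MEM
```

The EX/MEM check comes first because the more recent result must win when both older instructions write the same register.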
Pipelining   Data Hazards
      • Forwarding paths are valid only if the destination stage is
        later in time than the source stage
      • Source – Output of MEM in first instruction
      • Destination - Input to EX stage
Pipelining     Data Hazards
      • Load-use Data Hazards
      • Can’t always avoid stalls by forwarding
             – If value not computed when needed
             – Can’t use forwarding backward in time!
Pipelining    Data Hazards
      Code Scheduling to avoid stalls
      • Reorder code to avoid use of load result in the
        next instruction
      • C code for a = b + e; c = b + f;
              Assume all variables are in memory, addressable at offsets from x31

              Original (13 cycles):            Reordered (11 cycles):
                  ld   x1, 0(x31)                  ld   x1, 0(x31)
                  ld   x2, 8(x31)                  ld   x2, 8(x31)
                  (stall)                          ld   x4, 16(x31)
                  add  x3, x1, x2                  add  x3, x1, x2
                  sd   x3, 24(x31)                 sd   x3, 24(x31)
                  ld   x4, 16(x31)                 add  x5, x1, x4
                  add  x5, x1, x4                  sd   x5, 32(x31)
                  (stall)
                  sd   x5, 32(x31)
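The 13- vs 11-cycle counts can be reproduced with a sketch (simplifying assumptions: 5-stage pipeline with full forwarding, so total cycles = 4 fill cycles + instruction count + one bubble per load immediately followed by a use of its result):

```python
def cycle_count(program, stages=5):
    """program: list of (opcode, destination, sources) tuples."""
    cycles = (stages - 1) + len(program)
    for prev, cur in zip(program, program[1:]):
        if prev[0] == "ld" and prev[1] in cur[2]:
            cycles += 1  # load-use hazard: one bubble even with forwarding
    return cycles

original = [("ld", "x1", []), ("ld", "x2", []),
            ("add", "x3", ["x1", "x2"]), ("sd", None, ["x3"]),
            ("ld", "x4", []), ("add", "x5", ["x1", "x4"]),
            ("sd", None, ["x5"])]

reordered = [("ld", "x1", []), ("ld", "x2", []), ("ld", "x4", []),
             ("add", "x3", ["x1", "x2"]), ("sd", None, ["x3"]),
             ("add", "x5", ["x1", "x4"]), ("sd", None, ["x5"])]

print(cycle_count(original), cycle_count(reordered))  # 13 11
```

Moving the third ld up separates both loads from their uses, removing both bubbles without changing program behavior.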
Pipelining     Control Hazards
     • Also called Branch Hazard
     • Branch determines flow of control
             – Fetching next instruction depends on branch
               outcome
             – Pipeline can’t always fetch correct instruction
                • Still working on ID stage of branch
     • In RISC-V pipeline
             – Need to compare registers and compute target
               early in the pipeline
             – Add hardware to do it in ID stage
Pipelining   Control Hazards
      Stall on branch
      • Wait until branch outcome determined before
        fetching next instruction
Pipelining     Control Hazards
       Branch Prediction
      • Longer pipelines can’t readily determine
        branch outcome early
             – Stall penalty becomes unacceptable
      • Predict outcome of branch
             – Only stall if prediction is wrong
      • In RISC-V pipeline
             – Can predict branches not taken
             – Fetch instruction after branch, with no delay
Pipelining   Control Hazards
       Branch Prediction
Pipelining     Control Hazards
       More-Realistic Branch Prediction
      • Static branch prediction
             – Based on typical branch behavior
             – Example: loop and if-statement branches
                • Predict backward branches taken
                • Predict forward branches not taken
      • Dynamic branch prediction
             – Hardware measures actual branch behavior
                • e.g., record recent history of each branch
             – Assume future behavior will continue the trend
                • When wrong, stall while re-fetching, and update history
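Recording recent history can be sketched with a 2-bit saturating counter, a common dynamic scheme (used here as an illustrative assumption, not a claim about any specific processor): two consecutive mispredictions are needed to flip the prediction, so a loop-closing branch mispredicts only at loop exit.

```python
class TwoBitPredictor:
    def __init__(self):
        self.state = 0  # 0,1 = predict not-taken; 2,3 = predict taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Saturating counter: move toward "taken" or "not taken", clamped to [0, 3].
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True] * 9 + [False]   # a loop branch taken 9 times, then the loop exits
wrong = 0
for taken in outcomes:
    if p.predict() != taken:
        wrong += 1
    p.update(taken)

print(wrong)  # 3: the first two taken branches (while warming up) and the final exit
```

A 1-bit predictor on the same sequence would also mispredict twice per loop execution in a nested loop, which is why the 2-bit version is preferred.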
Pipelining   Summary of Overview
   • Pipelining improves performance by increasing
     instruction throughput
       – Executes multiple instructions in parallel
       – Each instruction has the same latency
   • Subject to hazards
        – Structural, data, control
   • Instruction set design affects complexity of
     pipeline implementation
Pipelining   Activity
For each code sequence below, state whether it must stall, can avoid stalls using
only forwarding, or can execute without stalling or forwarding.