Lecture # Pipelining

The document discusses the key operations of a processor, including R-type, load, store, branch, and jump instructions, along with their implementation in the datapath. It explains the inefficiencies of single-cycle implementations and introduces pipelining as a more efficient alternative that increases instruction throughput. Additionally, it covers pipeline hazards, including structural, data, and control hazards, and their solutions such as forwarding and branch prediction.

Uploaded by

adeenhassan7575

The Processor

Computer Organization and Assembly Language


Key Operations:
• R-type (add, sub):
• Fetch, Decode, Execute (ALU), Write back
• Load (lw):
• Fetch, Decode, ALU address calculation, Memory read, Write back
• Store (sw):
• Fetch, Decode, ALU address calculation, Memory write
• Branch (beq):
• Fetch, Decode, Compare (ALU zero check), Update PC if zero
• Jump (j):
• Fetch, Decode, Compute jump address, Update PC
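As a quick summary, the stage sequences above can be written down in a short sketch (the stage names here are informal shorthand, not official datapath signal names):

```python
# Stage sequence for each instruction class, as listed above.
# A minimal sketch; stage names are shorthand for the datapath steps.
STAGES = {
    "R-type": ["Fetch", "Decode", "Execute", "WriteBack"],
    "lw":     ["Fetch", "Decode", "AddrCalc", "MemRead", "WriteBack"],
    "sw":     ["Fetch", "Decode", "AddrCalc", "MemWrite"],
    "beq":    ["Fetch", "Decode", "Compare", "UpdatePC"],
    "j":      ["Fetch", "Decode", "JumpAddr", "UpdatePC"],
}

# Only loads and stores touch data memory; only loads and
# R-type instructions write a result back to the register file.
for op, stages in STAGES.items():
    print(f"{op:7s} -> {' -> '.join(stages)}")
```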
Building the Datapath
Implementing Jump Instruction
• Jump Instruction

• Jump vs Branch Instruction


• The jump instruction looks like a branch instruction but computes the target PC differently and is not conditional
• How is the jump instruction implemented?
• The jump instruction carries a 26-bit address field, yet a full 32-bit target PC must be computed
• Computation of the target address:
• The lower 28 bits of the target come from the 26-bit address field of the jump instruction
• Those 26 bits are shifted left by 2 to make 28 bits (appending 00 as the low-order bits)
• The upper 4 bits of the target remain the same as the upper 4 bits of the current value of PC + 4
• Thus, the jump instruction can be implemented by concatenation:
• Target = { upper 4 bits of PC + 4, lower 26 bits of the jump address, 00 }
Building the Datapath
Implementing Jump Instruction
• Let’s assume that the given initial PC value is 0x0008 0000 (written out to a full 32 bits)
• Now execute a jump instruction:
• Current value of PC = 0x0008 0024
• Address field of the jump = 0x0020000 (26 bits)
• Shifted left by 2 bits: 0x0020000 × 4 = 0x0080000 (28 bits)
• Target address = upper 4 bits of PC + 4, concatenated with the 28 bits after the left shift
= 0b0000 ++ 0b0000 0000 1000 0000 0000 0000 0000
= 0b0000 0000 0000 1000 0000 0000 0000 0000
= 0x0008 0000
which is the required target address
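The computation above can be checked with a short Python sketch (the helper function is illustrative, not part of the lecture's datapath):

```python
def jump_target(pc_plus_4: int, addr_field_26: int) -> int:
    """Concatenate the upper 4 bits of PC+4 with the 26-bit address
    field shifted left by 2 (appending 00 as the low-order bits)."""
    upper4 = pc_plus_4 & 0xF0000000               # keep bits 31:28 of PC+4
    lower28 = (addr_field_26 << 2) & 0x0FFFFFFF   # 26 bits -> 28 bits
    return upper4 | lower28

# Worked example from the slide: PC = 0x00080024, address field = 0x0020000
assert jump_target(0x00080024 + 4, 0x0020000) == 0x00080000
print(hex(jump_target(0x00080028, 0x0020000)))  # 0x80000
```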
Building the Datapath
Implementing Jump Instruction (Figure Explanation)
• An additional multiplexor (at the upper right) is used to
choose between the jump target and either the branch
target or the sequential instruction following this one
• This multiplexor is controlled by the jump control signal
• The jump target address is obtained by shifting the
lower 26 bits of the jump instruction left 2 bits,
effectively adding 00 as the low-order bits, and then
concatenating the upper 4 bits of PC + 4 as the high-
order bits, thus yielding a 32-bit address.
Why a Single-Cycle Implementation Is Not
Used Today
• Although the single-cycle design will work correctly, it would not be
used in modern designs because it is inefficient
• Why is the single-cycle design not used today?
• The clock cycle must have the same length for every instruction in this single-cycle design
• And, of course, the longest possible path in the processor determines the clock cycle
• Which instruction has the longest path, i.e., involves all the functional elements of the processor?
• The load word (lw) instruction uses five functional units in series:
• the instruction memory, the register file, the ALU, the data memory, and the register file again
• Although the CPI is 1, the overall performance of a single-cycle
implementation is likely to be poor, since the clock cycle is too long
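A minimal sketch of why the longest path sets the clock, assuming illustrative unit delays (200 ps for a memory access or ALU operation, 100 ps for a register-file access) consistent with the 800 ps lw time used later in the lecture:

```python
# Assumed unit delays in picoseconds (illustrative textbook values).
DELAY = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Functional units each instruction class uses, in series.
PATH = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "lw":     ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":     ["imem", "reg_read", "alu", "dmem"],
    "beq":    ["imem", "reg_read", "alu"],
}

path_time = {op: sum(DELAY[u] for u in units) for op, units in PATH.items()}
clock = max(path_time.values())   # the longest path determines the cycle
assert path_time["lw"] == 800 and clock == 800
print(path_time)                  # lw's 800 ps dominates; beq needs only 500 ps
```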
Why a Single-Cycle Implementation Is Not
Used Today
• The penalty for using the single-cycle design with a
fixed clock cycle is significant, but might be considered
acceptable for this small instruction set
• However, for a floating-point unit or an instruction set with more complex instructions, this single-cycle design wouldn’t work well at all.
• Reason:
• Because we must assume that the clock cycle is equal to the worst-
case delay for all instructions
• It’s useless to try implementation techniques that reduce the delay of
the common case but do not improve the worst-case cycle time
Why a Single-Cycle Implementation Is Not
Used Today
• Solution:
• Pipelining uses a datapath very similar to the single-cycle datapath but is much more efficient, achieving a much higher throughput
• Pipelining improves efficiency by executing multiple
instructions simultaneously
• Pipelining is an implementation technique in which
multiple instructions are overlapped in execution
Pipelining
(Figure Explanation)
• Pipelining Example
• Anyone who has done a lot of laundry has intuitively used pipelining
• However, the nonpipelined approach to laundry would be as
follows:
1. Place one dirty load of clothes in the washer.
2. When the washer is finished, place the wet load in the dryer.
3. When the dryer is finished, place the dry load on a table and fold.
4. When folding is finished, ask your roommate to put the clothes away.
Pipelining
Figure Explanation
• Ann, Brian, Cathy, and Don each have dirty clothes to
be washed, dried, folded, and put away
• The washer, dryer, “folder,” and “storer” each take 30
minutes for their task
• Sequential laundry takes 8 hours for 4 loads of wash, while
pipelined laundry takes just 3.5 hours
• We show the pipeline stage of different loads over time
by showing copies of the four resources on this two-
dimensional timeline, but we really have just one of
each resource.
Pipelining
• Why is pipelining faster?
• It is faster for many loads as everything is working in parallel
thus more loads are finished per hour
• What does pipelining not do?
• It would not decrease the time to complete one load of
laundry
• Pipelining improves throughput of our laundry system.
• However, when we have many loads of laundry to do, the improvement in throughput decreases the total time to complete all the laundry
Pipelining
• MIPS instructions classically take five steps:
1. Fetch instruction from memory.
2. Decode the instruction and read registers
• The regular format of MIPS instructions allows reading and decoding to
occur simultaneously.
3. Execute the operation or calculate an address.
4. Access an operand in data memory.
5. Write the result into a register.
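These five steps overlap in a pipeline: instruction i occupies step s during cycle i + s (in an ideal pipeline with no hazards). A minimal sketch of the resulting timeline:

```python
# Conventional names for the five steps: IF, ID, EX, MEM, WB.
PIPE_STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def timeline(n_instructions: int):
    """Return {cycle: [(instr, stage), ...]} for an ideal 5-stage pipeline."""
    table = {}
    for i in range(n_instructions):
        for s, name in enumerate(PIPE_STAGES):
            table.setdefault(i + s, []).append((i, name))
    return table

t = timeline(3)
# In cycle 2, instruction 0 is in EX, 1 in ID, 2 in IF: all three overlap.
assert t[2] == [(0, "EX"), (1, "ID"), (2, "IF")]
# The last of the 3 instructions writes back in cycle (3 - 1) + 4 = 6.
assert max(t) == 6
```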
Pipelining
Example
• Compare the average time between instructions of a single-
cycle implementation to a pipelined implementation
• Assumptions:
• The operation times for the major functional units are assumed to be:
• 200 ps for a memory access (instruction or data)
• 200 ps for an ALU operation
• 100 ps for a register file read or write
• In the single-cycle model, every instruction takes exactly one clock cycle, so the clock cycle must be stretched to accommodate the slowest instruction.
Pipelining
Example
• Single Cycle Design
• The single-cycle design must allow for the slowest instruction, which is lw, so the time required for every instruction is 800 ps
• The single-cycle design thus takes the worst-case clock cycle of 800 ps, even though some instructions can be as fast as 500 ps
Pipelining
Example
• Pipelined Execution
• The pipelined clock cycle must accommodate the worst-case stage time of 200 ps, even though some stages take only 100 ps.
• Thus, pipelining offers a fourfold improvement in the time between instructions (800 ps down to 200 ps)
• The time between the first and fourth instructions is 3 × 200 ps, or 600 ps
Pipelining
• What would happen if we increased the number of
instructions?
• Extend the previous figures to 1,000,003 instructions
• We would add 1,000,000 instructions in the pipelined example
• each instruction adds 200 ps to the total execution time
• Total execution = 1,000,000 × 200 ps + 1400 ps = 200,001,400 ps
• In the nonpipelined example
• we would add 1,000,000 instructions, each taking 800 ps
• Total execution time = 1,000,000 × 800 ps + 2400 ps =
800,002,400 ps
• Under these conditions
• the ratio of the total execution times, nonpipelined to pipelined, is 800,002,400 ps / 200,001,400 ps ≈ 4.00, close to the fourfold ratio of the clock cycles
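The arithmetic above can be checked with a short sketch (the 1400 ps and 2400 ps completion times for the first three instructions come from the lecture's figures):

```python
def total_time(n_extra, per_instr_ps, first_three_done_ps):
    """Total time when n_extra instructions follow the three in the figures."""
    return n_extra * per_instr_ps + first_three_done_ps

# Each added instruction costs one more cycle (200 ps) when pipelined,
# or one more full instruction time (800 ps) when nonpipelined.
pipelined    = total_time(1_000_000, 200, 1400)
nonpipelined = total_time(1_000_000, 800, 2400)
assert pipelined == 200_001_400 and nonpipelined == 800_002_400
print(f"speedup ≈ {nonpipelined / pipelined:.2f}")  # approaches 4.00
```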
Pipelining
• Pipelining improves performance by increasing
instruction throughput, as opposed to decreasing the
execution time of an individual instruction, but
instruction throughput is the important metric because
real programs execute billions of instructions.
Pipeline Hazards
• There are situations in pipelining when the next
instruction cannot be executed in the following clock
cycle
• These events are called hazards, and there are three
different types.
1. Structural hazards
2. Data hazards
3. Control hazards
Pipeline Hazards
Structural Hazard
• Structural hazard means that the hardware cannot
support the combination of instructions that we want to
execute in the same clock cycle
• For example,
• A structural hazard in the laundry room would occur if we used
a combined washer-dryer unit instead of a separate washer
and dryer machine
• or if our roommate was busy doing something else and
wouldn’t put clothes away
Pipeline Hazards
Structural Hazard
• Suppose we had a single memory unit (combined instruction and data memory)
• With a fourth instruction in the pipeline, in the same clock cycle the
• first instruction is accessing data from that memory, while the
• fourth instruction is fetching an instruction from the same memory
• Without two memories, our pipeline could have a structural hazard.
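A tiny sketch of when this conflict occurs, assuming an ideal five-stage timeline where fetch is stage 0 and the data-memory access is stage 3, and (pessimistically) that every instruction accesses data memory:

```python
# With a single shared memory, instruction fetch (stage 0) and data
# access (stage 3) collide whenever both happen in the same cycle.
# Instruction i occupies stage s in cycle i + s (ideal pipeline).
def fetch_mem_conflict_cycles(n_instructions):
    fetch  = {i + 0 for i in range(n_instructions)}   # IF cycles
    memacc = {i + 3 for i in range(n_instructions)}   # MEM cycles
    return sorted(fetch & memacc)

# The 4th instruction (index 3) fetches in cycle 3, exactly when the
# 1st instruction (index 0) accesses data memory: a structural hazard.
print(fetch_mem_conflict_cycles(5))  # -> [3, 4]
```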
Pipeline Hazards
Data Hazard
• Data hazards occur when the pipeline must be stalled
because one step must wait for another to complete
• For example,
• You found a sock at the folding station for which no match
existed. One possible strategy is to run down to your room
and search through your clothes bureau to see if you can find
the match
• Obviously, while you are doing the search, loads that have completed drying and are ready to fold, as well as those that have finished washing and are ready to dry, must wait
Pipeline Hazards
Data Hazard
• In a computer pipeline, data hazards arise from the
dependence of one instruction on an earlier one that is still in
the pipeline
• Problem
• An Add instruction followed immediately by a subtract instruction
that uses the sum ($s0):
add $s0, $t0, $t1
sub $t2, $s0, $t3
• Without intervention, a data hazard could severely stall the pipeline
• The add instruction doesn’t write its result until the fifth stage,
meaning that we would have to waste three clock cycles in the
pipeline.
Pipeline Hazards
Data Hazard
• Solution
• There is no need to wait for the complete instruction execution
• Adding extra hardware to retrieve the missing item early from
the internal resources is called forwarding or bypassing
• For the code sequence
add $s0, $t0, $t1
sub $t2, $s0, $t3
• As the ALU creates the sum for the add, we can supply it as an input for
the subtract
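A minimal sketch of the forwarding decision for this add/sub pair (register names as Python strings; the helper function and its signature are illustrative, not real hardware or a real API):

```python
# EX-to-EX forwarding sketch: if the previous instruction's ALU result
# register matches one of the current instruction's sources, take the
# operand from the forwarding path instead of the register file.
def pick_operand(src_reg, prev_dest, prev_alu_result, regfile):
    if prev_dest is not None and src_reg == prev_dest:
        return prev_alu_result            # forwarded, no stall needed
    return regfile[src_reg]               # normal register-file read

regs = {"$t0": 7, "$t1": 5, "$s0": 0, "$t3": 2}
# add $s0, $t0, $t1  -> ALU produces 12, destined for $s0
alu_out, dest = regs["$t0"] + regs["$t1"], "$s0"
# sub $t2, $s0, $t3  -> $s0 must come from the forwarding path,
# since the register file still holds the stale value 0
a = pick_operand("$s0", dest, alu_out, regs)
b = pick_operand("$t3", dest, alu_out, regs)
assert (a, b) == (12, 2) and a - b == 10  # sub sees the fresh sum
```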
Pipeline Hazards
Data Hazard
• Valid Case:
• Forwarding paths are valid only if the destination stage is later in time than
the source stage

• Invalid Case:
• There cannot be a valid forwarding path from the output of the memory
access stage in the first instruction to the input of the execution stage of
the following, since that would mean going backward in time
Pipeline Hazards
Data Hazard
• For the code sequence:
lw $s0, 20($t1)
sub $t2, $s0, $t3
• Suppose the first instruction were a load of $s0 instead of an add
• The data value is available only after the fourth (memory) stage, which is too late to forward to the next instruction’s execute stage
• Hence, even with forwarding, we would have to stall one stage for a load-use data hazard; the stall is officially called a pipeline stall, but is often given the nickname bubble.
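The load-use rule can be sketched as follows (a hypothetical helper; opcode strings and register names are illustrative):

```python
# Load-use hazard sketch: a load's data arrives after the MEM stage,
# one cycle too late for the next instruction's EX stage, so one
# bubble must be inserted even with forwarding.
def stalls_needed(producer_op, producer_dest, consumer_sources):
    if producer_dest in consumer_sources:
        return 1 if producer_op == "lw" else 0  # ALU results forward in time
    return 0

# lw $s0, 20($t1) followed by sub $t2, $s0, $t3 -> one bubble
assert stalls_needed("lw",  "$s0", ["$s0", "$t3"]) == 1
# add $s0, ... followed by the same sub -> forwarding avoids the stall
assert stalls_needed("add", "$s0", ["$s0", "$t3"]) == 0
```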
Pipeline Hazards
Control Hazards
• A control hazard arises when a decision must be made based on the result of one instruction while others are still executing
• For example, a branch instruction:
• The instruction after the branch must be fetched on the very next clock cycle
• But the pipeline cannot possibly know what the next instruction should be, since it has only just received the branch instruction from memory
• The decision still has to be made based on the operands of the branch instruction
• One solution is to stall immediately after fetching a branch,
• waiting until the pipeline determines the outcome of the branch and knows what instruction address to fetch from
Pipeline Hazards
Control Hazards
• If we cannot resolve the branch in the second stage, as is often the case for longer pipelines, then we’d see an even larger slowdown if we stall on branches
• Consequences:
• The cost of this option is too high for most computers to use
• Second Solution
• Predict: Computers do indeed use prediction to handle
branches
• One simple approach is to always predict that branches will be untaken
• When you’re right, the pipeline proceeds at full speed
• Only when branches are taken does the pipeline stall
Pipeline Hazards
Control Hazards
• A more sophisticated version of branch prediction would
have some branches predicted as taken and some as
untaken
• In programs, the branches at the bottom of loops jump back to the top of the loop
• Since such branches are likely to be taken and they branch backward, we could always predict taken for branches that jump to an earlier address
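This backward-taken, forward-not-taken heuristic can be sketched as follows (the addresses are illustrative):

```python
# Static prediction sketch: predict "taken" for branches that jump to
# an earlier address (loop back-edges), "not taken" for forward branches.
def predict_taken(branch_pc: int, target_pc: int) -> bool:
    return target_pc < branch_pc    # backward branch -> likely a loop

# A loop-closing branch at 0x0040 jumping back to the loop top at 0x0010
assert predict_taken(0x0040, 0x0010) is True
# A forward branch skipping ahead is predicted untaken
assert predict_taken(0x0040, 0x0080) is False
```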
Pipeline Hazards
Control Hazards
• Third solution: the delayed branch, which is actually used by the MIPS architecture to deal with branches.
• The delayed branch always executes the next sequential instruction, with
the branch taking place after that one instruction delay.
• It is hidden from the MIPS assembly language programmer because the assembler
can automatically arrange the instructions to get the branch behavior desired by
the programmer.
• MIPS software will place an instruction immediately after the delayed branch
instruction that is not affected by the branch, and a taken branch changes the
address of the instruction that follows this safe instruction.
• In our example, the add instruction before the branch in Figure 4.31 does not affect the branch and can be moved after the branch to fully hide the branch delay.
• Since delayed branches are useful only when the branch delay is short, no processor uses a delayed branch of more than one cycle.
• For longer branch delays, hardware-based branch prediction is usually used.
