More On Pipelining
Pipeline Design
Instruction Pipeline Design
Instruction Execution Phases
Mechanism for Instruction Pipelining
Dynamic Instruction Scheduling
Branch Handling Techniques
Prefetch Buffers
Prefetch buffers are used to match the instruction fetch rate to the pipeline consumption rate. In a single memory access, a block of consecutive instructions is fetched into a prefetch buffer. There are three types of prefetch buffers:
Sequential buffers, used to store sequential instructions
Target buffers, used to store branch target instructions
Loop buffers, used to store loop instructions
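A minimal sketch of a sequential prefetch buffer follows. It assumes a hypothetical `block_size` parameter for the number of consecutive instructions brought in by one memory access, and a pipeline that consumes one instruction at a time; it only illustrates how block fetching reduces the number of memory accesses.

```python
from collections import deque


class SequentialPrefetchBuffer:
    """Hypothetical sequential prefetch buffer: one memory access fills it
    with `block_size` consecutive instructions (an assumed parameter)."""

    def __init__(self, program, block_size):
        self.program = program        # instruction stream held in memory
        self.block_size = block_size
        self.next_addr = 0            # address of the next instruction to prefetch
        self.buffer = deque()
        self.memory_accesses = 0

    def _fill(self):
        # A single memory access brings in a whole block of consecutive instructions.
        block = self.program[self.next_addr:self.next_addr + self.block_size]
        self.buffer.extend(block)
        self.next_addr += len(block)
        self.memory_accesses += 1

    def next_instruction(self):
        # The pipeline consumes one instruction per call; refill only when empty.
        if not self.buffer:
            if self.next_addr >= len(self.program):
                return None           # end of program
            self._fill()
        return self.buffer.popleft()


program = [f"I{i}" for i in range(10)]
buf = SequentialPrefetchBuffer(program, block_size=4)
fetched = []
while (ins := buf.next_instruction()) is not None:
    fetched.append(ins)
print(buf.memory_accesses)  # 3
```

Ten instructions with a block size of four need three memory accesses instead of ten; target and loop buffers would apply the same idea to branch targets and loop bodies.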
Hazard Avoidance
Reads and writes of shared variables by different instructions in the pipeline may produce different results if the instructions are executed out of order. Types:
Read after Write (RAW) hazard
Write after Write (WAW) hazard
Write after Read (WAR) hazard
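The three hazard types can be detected by comparing what each instruction reads and writes. The sketch below assumes each instruction is described only by its hypothetical `writes` and `reads` register sets, with `earlier` preceding `later` in program order; the instruction strings are illustrative.

```python
from typing import FrozenSet, NamedTuple, Set


class Instr(NamedTuple):
    name: str
    writes: FrozenSet[str]   # registers/variables written by the instruction
    reads: FrozenSet[str]    # registers/variables read by the instruction


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Hazards between two instructions, `earlier` preceding `later` in program order."""
    found = set()
    if earlier.writes & later.reads:
        found.add("RAW")   # later reads a value that earlier has not yet written
    if earlier.writes & later.writes:
        found.add("WAW")   # both write; the final value depends on completion order
    if earlier.reads & later.writes:
        found.add("WAR")   # later overwrites a value that earlier has not yet read
    return found


i1 = Instr("ADD R1, R2, R3", frozenset({"R1"}), frozenset({"R2", "R3"}))
i2 = Instr("SUB R4, R1, R5", frozenset({"R4"}), frozenset({"R1", "R5"}))
i3 = Instr("MOV R2, R6", frozenset({"R2"}), frozenset({"R6"}))
print(hazards(i1, i2))  # {'RAW'}
print(hazards(i1, i3))  # {'WAR'}
```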
Instruction Scheduling
Aim: to schedule instructions through an instruction pipeline.
Types of instruction scheduling:
Static scheduling, supported by an optimizing compiler
Dynamic scheduling, achieved by Tomasulo's register-tagging scheme or by the scoreboarding scheme
Static Scheduling
Data dependencies in a sequence of instructions create interlocked relationships. Interlocking can be resolved by the compiler by increasing the separation between interlocked instructions. Example:
Two independent load instructions can be moved ahead so that the spacing between them and the multiply instruction is increased.
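The benefit of moving loads ahead can be checked by counting stall cycles. The sketch below assumes in-order issue of one instruction per cycle, an instruction waiting until all its source registers are ready, and a hypothetical load latency of 3 cycles (1 for other instructions). The five-instruction sequence is a hypothetical illustration, not taken from the slide.

```python
from typing import List, NamedTuple, Tuple

LOAD_LAT = 3  # assumed load latency in cycles


class Op(NamedTuple):
    text: str
    dest: str
    srcs: Tuple[str, ...]
    latency: int  # cycles until dest is available to later instructions (assumed)


def count_stalls(program: List[Op]) -> int:
    """In-order issue, one instruction per cycle; stall until sources are ready."""
    ready = {}        # register -> cycle at which its newest value is available
    prev_issue = -1
    stalls = 0
    for op in program:
        earliest = prev_issue + 1
        issue = max([earliest] + [ready.get(r, 0) for r in op.srcs])
        stalls += issue - earliest
        ready[op.dest] = issue + op.latency
        prev_issue = issue
    return stalls


add = Op("Add R0, R1", "R0", ("R0", "R1"), 1)
move = Op("Move R1, R5", "R1", ("R5",), 1)
load_a = Op("Load R2, M(a)", "R2", (), LOAD_LAT)
load_b = Op("Load R3, M(b)", "R3", (), LOAD_LAT)
mul = Op("Multiply R2, R3", "R2", ("R2", "R3"), 1)

original = [add, move, load_a, load_b, mul]
rescheduled = [load_a, load_b, add, move, mul]  # loads moved ahead
print(count_stalls(original), count_stalls(rescheduled))  # 2 0
```

The reordering is legal here because the loads neither read nor write R0, R1 or R5, so they are independent of the Add and Move they are moved above.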
Tomasulo's Algorithm
A hardware-dependent scheme. Data operands are saved in reservation stations (RS) until their dependencies are resolved. Register tagging is used to allocate and deallocate registers; all working registers are tagged.
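A simplified sketch of register tagging follows, assuming two-source instructions and a broadcast of each result to all waiting stations (in the style of a common data bus). At issue, a source operand is taken either as a value or as the tag of the reservation station that will produce it; when a result is broadcast, waiting stations capture it and the tag is released. Class and method names are hypothetical, and execution units and timing are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Operand:
    value: Optional[float] = None  # set once the operand is available
    tag: Optional[str] = None      # tag of the producing RS while still pending

    @property
    def ready(self) -> bool:
        return self.tag is None


@dataclass
class ReservationStation:
    tag: str
    op: str
    src1: Operand
    src2: Operand
    dest: str


class TomasuloSketch:
    def __init__(self, regs: Dict[str, float]):
        self.regs = dict(regs)                 # register file values
        self.reg_tag: Dict[str, str] = {}      # register -> tag of RS that will write it
        self.stations: List[ReservationStation] = []
        self._next = 0

    def _read(self, reg: str) -> Operand:
        # A pending writer means we record its tag instead of the stale value.
        if reg in self.reg_tag:
            return Operand(tag=self.reg_tag[reg])
        return Operand(value=self.regs[reg])

    def issue(self, op: str, dest: str, s1: str, s2: str) -> str:
        tag = f"RS{self._next}"
        self._next += 1
        self.stations.append(
            ReservationStation(tag, op, self._read(s1), self._read(s2), dest))
        self.reg_tag[dest] = tag               # allocate: dest now tagged with this RS
        return tag

    def ready_stations(self) -> List[ReservationStation]:
        return [rs for rs in self.stations if rs.src1.ready and rs.src2.ready]

    def broadcast(self, tag: str, value: float) -> None:
        """Deliver a finished result to every waiting station and register."""
        self.stations = [rs for rs in self.stations if rs.tag != tag]
        for rs in self.stations:
            for src in (rs.src1, rs.src2):
                if src.tag == tag:
                    src.value, src.tag = value, None
        for reg, t in list(self.reg_tag.items()):
            if t == tag:                       # deallocate only if no newer writer re-tagged reg
                self.regs[reg] = value
                del self.reg_tag[reg]


t = TomasuloSketch({"R1": 2.0, "R2": 3.0, "R3": 4.0})
t1 = t.issue("MUL", "R1", "R2", "R3")   # R1 <- R2 * R3
t2 = t.issue("ADD", "R4", "R1", "R2")   # must wait on tag t1 for R1
print([rs.tag for rs in t.ready_stations()])  # ['RS0']
t.broadcast(t1, 12.0)                   # MUL finishes with 3.0 * 4.0
print([rs.tag for rs in t.ready_stations()])  # ['RS1']
print(t.regs["R1"])                     # 12.0
```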
Scoreboarding
Multiple functional units appear in multiple execution pipelines. Parallel units allow instructions to execute out of order with respect to the original program sequence. The processor has instruction buffers; instructions are issued regardless of the availability of their operands. A centralized control unit called the scoreboard keeps track of the unavailable operands of the instructions stored in the buffers.
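The sketch below models the scoreboard only as a table of registers with pending writes, assuming unlimited functional units, fixed hypothetical latencies and at most one dispatch per cycle; it omits the functional-unit status that a full scoreboard also tracks. An instruction in the buffer starts as soon as it has no RAW, WAW or WAR conflict with pending writes or with earlier buffered instructions, so later instructions may start first.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str
    srcs: Tuple[str, ...]
    latency: int  # assumed execution latency in cycles


def scoreboard_schedule(program: List[Instr]) -> Dict[str, int]:
    """Return the cycle each instruction starts executing."""
    pending: Dict[str, int] = {}   # scoreboard: register -> cycle its pending write completes
    waiting = list(program)        # instruction buffer, in program order
    start: Dict[str, int] = {}
    cycle = 0
    while waiting:
        # Writes that have completed make their registers available again.
        for reg in [r for r, done in pending.items() if done <= cycle]:
            del pending[reg]
        for i, ins in enumerate(waiting):
            earlier = waiting[:i]
            raw = any(s in pending for s in ins.srcs) or any(e.dest in ins.srcs for e in earlier)
            waw = ins.dest in pending or any(e.dest == ins.dest for e in earlier)
            war = any(ins.dest in e.srcs for e in earlier)
            if not (raw or waw or war):
                start[ins.name] = cycle
                pending[ins.dest] = cycle + ins.latency
                waiting.pop(i)
                break              # dispatch at most one instruction per cycle
        cycle += 1
    return start


program = [
    Instr("LOAD R1", "R1", (), 3),
    Instr("ADD R2, R1", "R2", ("R1",), 1),
    Instr("MUL R3, R4", "R3", ("R4",), 1),
]
print(scoreboard_schedule(program))  # {'LOAD R1': 0, 'MUL R3, R4': 1, 'ADD R2, R1': 3}
```

The independent MUL starts before the ADD that is waiting on R1, which is the out-of-order behaviour the scoreboard permits.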
Branching Illustrated
Ib: branch instruction. Once the branch is decided as taken, all instructions fetched after Ib are flushed from the pipeline. Subsequently, the instructions at the branch target are fetched and run.
Effect of Branching
Nomenclature:
Branch taken: the action of fetching non-sequential (remote) instructions after a branch instruction
Branch target: the (remote) instruction to be executed after a branch is taken
Delay slot (b): the number of pipeline cycles consumed between a branch taken and its branch target
In general, $0 \le b \le k-1$, where k is the number of pipeline stages
When a branch is taken, all instructions after the branch instruction become useless: the pipeline is flushed, losing a number of cycles. Let $I_b$ be the branch instruction; a branch taken then causes all instructions from $I_{b+1}$ to $I_{b+k-1}$ to be drained from the pipeline. Let p be the probability that an instruction is a branch instruction and q the probability that a branch is taken. The penalty in terms of time is then
$T_{penalty} = pqnbt$
where n is the number of instructions, b the number of pipeline cycles consumed per branch taken, and t the cycle time. The effective execution time becomes
$T_{eff} = kt + (n-1)t + pqnbt$
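The penalty and effective execution time can be evaluated directly from these definitions. The parameter values in the usage below (k = 5, n = 100, p = 0.2, q = 0.6, b = k - 1 = 4, t = 1) are hypothetical: the ideal time is 5 + 99 = 104 cycles and the branch penalty adds 0.2 x 0.6 x 100 x 4 = 48 cycles.

```python
def branch_penalty(n, p, q, b, t):
    """T_penalty = p * q * n * b * t: expected time lost to flushed cycles."""
    return p * q * n * b * t


def effective_time(k, n, p, q, b, t):
    """T_eff = k*t + (n - 1)*t + p*q*n*b*t for a k-stage pipeline."""
    return k * t + (n - 1) * t + branch_penalty(n, p, q, b, t)


k = 5
print(round(effective_time(k=k, n=100, p=0.2, q=0.6, b=k - 1, t=1), 6))  # 152.0
```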
Branch Prediction
A branch can be predicted based on:
Static Branch Strategy
The probability of a branch being taken for a particular branch type can be used to predict the branch. These probabilities may be obtained by collecting the frequency of branches taken and of branch types across a large number of program traces.
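A minimal sketch of this static strategy: estimate the taken probability of each branch type from trace records, then predict "taken" for a type whose probability exceeds 0.5. The trace format of `(branch_type, was_taken)` pairs, the branch type names and the `default` prediction for unseen types are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def taken_probabilities(trace: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Estimate P(taken) per branch type from (branch_type, was_taken) records."""
    taken = defaultdict(int)
    total = defaultdict(int)
    for btype, was_taken in trace:
        total[btype] += 1
        taken[btype] += int(was_taken)
    return {bt: taken[bt] / total[bt] for bt in total}


def static_predict(probs: Dict[str, float], btype: str, default: bool = True) -> bool:
    """Predict 'taken' if the measured probability for this type exceeds 0.5."""
    if btype not in probs:
        return default
    return probs[btype] > 0.5


trace = [("loop", True)] * 9 + [("loop", False)] + [("if", False)] * 7 + [("if", True)] * 3
probs = taken_probabilities(trace)
print(static_predict(probs, "loop"), static_predict(probs, "if"))  # True False
```

Because the prediction depends only on the branch type, it is fixed before execution; this is what distinguishes it from a strategy that adapts to run-time history.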
Delayed Branches
The branch penalty can be reduced by the concept of a delayed branch. The central idea is to delay the execution of the branch instruction to accommodate independent* instructions.
Delaying the branch by d cycles allows a few useful instructions that are independent* of the branch instruction to be executed.
* Execution of these instructions should be independent of the outcome of the branch instruction.
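One simple way to fill delay slots is sketched below, assuming a hypothetical `slots` count, a branch that writes no register, and only the contiguous run of instructions immediately before the branch as candidates. An instruction qualifies if it does not write a register the branch reads, so the branch outcome cannot depend on it; it executes either way, which makes it independent of the branch outcome. Unfilled slots receive NOPs.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    dest: str               # register written ("" if none)
    srcs: Tuple[str, ...]   # registers read


NOP = Instr("NOP", "", ())


def fill_delay_slots(before: List[Instr], branch: Instr, slots: int) -> List[Instr]:
    """Move qualifying instructions from just before the branch into its delay slots."""
    moved: List[Instr] = []
    i = len(before)
    # Walk backwards while the preceding instruction does not feed the branch condition.
    while i > 0 and len(moved) < slots and before[i - 1].dest not in branch.srcs:
        i -= 1
        moved.insert(0, before[i])   # keep the original relative order
    return before[:i] + [branch] + moved + [NOP] * (slots - len(moved))


before = [
    Instr("LOAD R1, M(a)", "R1", ()),
    Instr("ADD R2, R3, R4", "R2", ("R3", "R4")),
    Instr("SUB R5, R6, R7", "R5", ("R6", "R7")),
]
branch = Instr("BEQZ R1, target", "", ("R1",))
print([x.text for x in fill_delay_slots(before, branch, slots=3)])
# ['LOAD R1, M(a)', 'BEQZ R1, target', 'ADD R2, R3, R4', 'SUB R5, R6, R7', 'NOP']
```

The LOAD cannot be moved because the branch tests R1, so the third slot is padded with a NOP.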
Dynamic pipeline
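A dynamic pipeline can perform multiple functions at the same time. Care must be taken when these functions share pipeline stages, so that two functions do not need the same stage in the same cycle.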
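One standard way to check such sharing is with reservation tables, which record the cycles in which each stage is busy for a given function. The sketch below computes the initiation latencies at which starting one function after another would collide in some stage. The two tables `X` and `Y` and the stage names are hypothetical.

```python
from typing import Dict, Set

ReservationTable = Dict[str, Set[int]]  # stage -> cycles (relative to initiation) it is busy


def collides(first: ReservationTable, second: ReservationTable, latency: int) -> bool:
    """True if starting `second` `latency` cycles after `first` needs a stage twice in one cycle."""
    for stage, busy in first.items():
        shifted = {c + latency for c in second.get(stage, set())}
        if busy & shifted:
            return True
    return False


def forbidden_latencies(first: ReservationTable, second: ReservationTable) -> Set[int]:
    """Initiation latencies of `second` after `first` that cause a collision."""
    # Beyond the last busy cycle of `first`, no collision is possible.
    horizon = max((c for cycles in first.values() for c in cycles), default=0)
    return {lat for lat in range(horizon + 1) if collides(first, second, lat)}


X = {"S1": {0, 5}, "S2": {1, 3}, "S3": {2, 4}}
Y = {"S1": {0, 3}, "S2": {2}, "S3": {1, 4}}
print(sorted(forbidden_latencies(X, Y)))  # [0, 1, 2, 3, 5]
```

Here Y may safely be started 4 cycles after X (or more than 5), while the listed latencies must be avoided by the pipeline's control.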
Pipeline Interconnections
Example: the Advanced Scientific Computer arithmetic pipeline has eight stages. It is an example of a static multifunctional pipeline: by changing the interconnections between stages, different functions (fixed-point and floating-point) can be performed.