Pipelines - #1 RISC ISA Without Pipe
Hennessy and Patterson, Computer Architecture a Quantitative Approach (4th Ed) Appendix A Instruction Set Pipelining
2. Load-Store instructions
   Operand (address): base register AND (sign-extended) offset
   (Effective) address calculation: offset + base register content
   Operand (register):
     Load: register detailing destination for loaded data
     Store: register detailing data for storage
3. Branch or jump instructions
   Comparison operand: register to register, or register to zero
   Test: specified by opcode
   Destination: add sign-extended offset to PC
NOTE: The RTL and tables summarize an example format, but should not be considered a practical pipelined architecture!
3. Execution (EX)
One of 4 cases, depending on the instruction type decoded in the ID stage:
(a) Memory reference: ALUout ← A + Imm
(b) Register-Register ALU operation: ALUout ← A opcode B
(c) Register-Immediate ALU instruction: ALUout ← A opcode Imm
(d) Branch: ALUout ← NPC + Imm; Cond ← (A opcode 0)
4. Memory access (MEM)
PC ← NPC
One of either:
(a) Memory reference: LMD ← Mem[ALUout] (load); or Mem[ALUout] ← B (store)
(b) Branch: IF (cond) THEN PC ← ALUout
5. Write-back (WB)
In the case of a register ALU operation OR a load instruction we need to update the register contents to reflect the new values:
(a) Register-Register ALU instruction: Reg[target] ← ALUout
(b) Register-Immediate ALU instruction: Reg[target] ← ALUout
(c) Load instruction: Reg[target] ← LMD
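The register transfers listed for stages 3 to 5 above can be sketched in a few lines of Python. This is only an illustration, not the author's model: the dict-based instruction encoding, the key names (`kind`, `op`, `rs`, `rt`, `target`, `imm`), the `OPS` table, the 4-byte PC increment and the "equal to zero" branch test are all assumptions made for the sketch.

```python
import operator

# Hypothetical opcode table; the notes only say "A opcode B".
OPS = {"add": operator.add, "sub": operator.sub, "and": operator.and_}


def run_instruction(instr, regs, mem, pc):
    """Walk one instruction through EX, MEM and WB; return the next PC.

    IF and ID are collapsed into the first few lines.
    """
    npc = pc + 4                      # assumption: 4-byte instructions
    a = regs[instr.get("rs", 0)]      # A operand read in ID
    b = regs[instr.get("rt", 0)]      # B operand read in ID
    imm = instr.get("imm", 0)
    kind = instr["kind"]

    # Stage 3: one of four cases
    cond = False
    if kind in ("load", "store"):
        alu_out = a + imm
    elif kind == "alu_rr":
        alu_out = OPS[instr["op"]](a, b)
    elif kind == "alu_ri":
        alu_out = OPS[instr["op"]](a, imm)
    elif kind == "branch":
        alu_out = npc + imm
        cond = (a == 0)               # assumption: the test is "A equals zero"
    else:
        raise ValueError(f"unknown instruction kind: {kind}")

    # Stage 4: PC update, then memory access or branch completion
    pc = npc
    lmd = None
    if kind == "load":
        lmd = mem[alu_out]
    elif kind == "store":
        mem[alu_out] = b
    elif kind == "branch" and cond:
        pc = alu_out

    # Stage 5: write-back for ALU and load instructions only
    if kind in ("alu_rr", "alu_ri"):
        regs[instr["target"]] = alu_out
    elif kind == "load":
        regs[instr["target"]] = lmd
    return pc
```

Note that stores and branches have nothing to do in the last step, which is consistent with them finishing one cycle earlier than the other instruction types.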
Figure 1 summarizes the 5 stages. Note:
(a) PC and registers are shown in the stages corresponding to their read position.
(b) Multiplexers indicate the location of the write for PC and registers AND the temporal dependency in this decision.
(c) Backward-flowing dependencies will cause problems for the efficient operation of any pipeline.
Figure 1: MIPS RISC data path.
Comments:
o Branch and Store complete in 4 cycles;
o All other instruction types complete in 5 cycles;
o ALU instructions idle in the MEM stage, so an optimal implementation (without pipelining) would complete ALU instructions here;
o Two ALUs would not be necessary if we were not interested in basing our pipeline implementation on this data path;
o Registers for storing intermediate results would also not be necessary.
Figure 1 indicates that the following pipeline hazards exist:
o Structural or Resource Hazards: hardware cannot be shared between different stages (two ALUs in figure 1 to avoid this).
o Data Hazards: operation of one instruction is dependent on the result from the previous instruction. Why is this not permissible?
o Control Hazards: contents of PC changed by a branch or jump instruction. Why might this be a problem?
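To make the data hazard case concrete, the sketch below scans a short instruction sequence for read-after-write dependencies between nearby instructions. The `(dest, sources)` tuple encoding, the register names and the `window` parameter are assumptions for illustration; how many following instructions are actually affected depends on the pipeline, so `window` defaults to the "previous instruction" case named above.

```python
def raw_hazards(program, window=1):
    """Return (producer, consumer, register) triples.

    `program` is a list of (dest, sources) tuples; dest is None when the
    instruction writes no register. `window` is how many following
    instructions are checked against each producer (an assumption).
    """
    hazards = []
    for i, (dest, _) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            next_dest, sources = program[j]
            if dest in sources:
                hazards.append((i, j, dest))
            if next_dest == dest:
                break  # register overwritten; later readers depend on j instead
    return hazards


program = [
    ("R1", ("R2", "R3")),  # e.g. ADD R1, R2, R3
    ("R4", ("R1", "R5")),  # e.g. SUB R4, R1, R5 reads R1 before it is written back
    (None, ("R4", "R6")),  # a store-like instruction with no destination
]
for producer, consumer, reg in raw_hazards(program):
    print(f"instruction {consumer} needs {reg} from instruction {producer}")
```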
Figure 2: Generic 5-stage pipe.
o PC needs to change every clock cycle: implications? Figure 1 indicates that:
  - A separate ALU is necessary for NPC calculation AND branch offset calculation;
  - We don't know the result of a branch until 3 stages later!
o It is necessary to save the complete state of each instruction as it progresses between different pipeline stages. Pipeline registers provide this, as per figure 3.
Figure 3: Pipeline registers isolate each stage by propagating the value of local registers as each instruction passes through the pipe.
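The role of the pipeline registers can be sketched as a set of latches that each take over the state held by the latch before them on every clock edge. The latch names follow the usual stage-pair convention (IF/ID, ID/EX, EX/MEM, MEM/WB); the `tick` function and the contents of each state dict are hypothetical and only meant to show state travelling with its instruction.

```python
LATCH_NAMES = ("IF/ID", "ID/EX", "EX/MEM", "MEM/WB")


def tick(latches, fetched):
    """One clock edge: return (new_latches, retired_state).

    Every latch takes the state of the latch before it, so the values an
    instruction needs later (for example its target register) move along
    with it instead of being overwritten by the next instruction.
    """
    retired = latches["MEM/WB"]
    new_latches = {
        "IF/ID": fetched,
        "ID/EX": latches["IF/ID"],
        "EX/MEM": latches["ID/EX"],
        "MEM/WB": latches["EX/MEM"],
    }
    return new_latches, retired


latches = {name: None for name in LATCH_NAMES}
stream = [{"name": f"i{k}", "target": k} for k in range(5)]
for state in stream:
    latches, retired = tick(latches, state)
    print({name: (s["name"] if s else "-") for name, s in latches.items()},
          "retired:", retired["name"] if retired else "-")
```

After four clock edges the first instruction's state sits in MEM/WB, and it leaves the pipe on the fifth.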
Summary
The RISC ISA provided a suitable framework for understanding instruction pipelines, leading to their widespread introduction into microprocessor chip sets in the 1980s. Instruction pipelines:
o force a series of constraints on the ISA;
o result in duplicated hardware between stages;
o are very sensitive to branch-type instructions.
Example #1
With respect to the naive 5-stage RISC piped and unpiped architectures, let:
o Clock period be 1 ns;
o ALU and branch instructions take 4 cycles;
o Memory operations take 5 cycles;
o Frequency of ALU instructions is 40%;
o Frequency of branch instructions is 20%;
o Frequency of memory instructions is 40%;
o Additional complexity of a pipeline adds 0.2 ns to the clock period.
Questions:
1. Why do ALU and branch instructions complete one clock cycle faster than memory instructions?
2. Assuming a full pipe and no hazards, how much speedup does the piped architecture provide with respect to the architecture without a pipe?
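One way to work the speedup question, as a sketch: take the average unpiped instruction time as the clock period times the frequency-weighted cycle count, and assume the full, hazard-free pipe completes one instruction per (lengthened) clock. The variable names are illustrative only.

```python
CLOCK_NS = 1.0          # unpiped clock period
PIPE_OVERHEAD_NS = 0.2  # extra clock period due to pipeline complexity

# instruction class -> (frequency, cycles without a pipe)
MIX = {"alu": (0.40, 4), "branch": (0.20, 4), "memory": (0.40, 5)}

avg_cycles = sum(freq * cycles for freq, cycles in MIX.values())
unpiped_ns = CLOCK_NS * avg_cycles        # average time per instruction, no pipe
piped_ns = CLOCK_NS + PIPE_OVERHEAD_NS    # one instruction per clock when full
speedup = unpiped_ns / piped_ns

print(f"unpiped: {unpiped_ns:.1f} ns, piped: {piped_ns:.1f} ns, "
      f"speedup: {speedup:.2f}")
```

Under these assumptions the unpiped average is 4.4 ns per instruction against 1.2 ns for the pipe, a speedup of roughly 3.7.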
Example #2
Consider the case of an architecture in which separate instruction and data buses do not exist. With respect to figure 4, a data load at instruction #0 causes a structural hazard with the instruction fetch at instruction #1. The instruction being fetched is therefore stalled, delaying instruction #3 and all following instructions by one clock cycle and increasing the CPI of the pipeline, figure 5. Let the following conditions hold:
o CPI of the pipe without stalls is 1;
o Data load/store instructions constitute 40% of the instruction mix;
o Clock frequency of the pipe encountering stalls is 1.05 times higher than that of the pipe without stalls.
Questions:
1. What is the speedup of the pipe that avoids stalls?
2. If such a stall is so detrimental to the operation of a pipeline, are there any conditions under which structural hazards of this nature might be permitted?
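The first question can be worked as below. The sketch assumes that every load/store costs exactly one stall cycle and that the 1.05 figure is a clock-rate ratio in favour of the pipe with the hazard; both are readings of the conditions above rather than stated facts, and the names are illustrative.

```python
IDEAL_CPI = 1.0      # CPI of the pipe without stalls
MEM_FRACTION = 0.40  # share of load/store instructions
STALL_CYCLES = 1     # assumption: one stall cycle per load/store
CLOCK_RATIO = 1.05   # clock rate with hazard / clock rate without hazard

cpi_with_stalls = IDEAL_CPI + MEM_FRACTION * STALL_CYCLES

# Average instruction time, in units of the stall-free pipe's clock period.
time_without_stalls = IDEAL_CPI * 1.0
time_with_stalls = cpi_with_stalls * (1.0 / CLOCK_RATIO)

speedup = time_with_stalls / time_without_stalls
print(f"CPI with stalls: {cpi_with_stalls:.1f}, speedup without stalls: {speedup:.2f}")
```

With these numbers the pipe that avoids the structural hazard comes out about 1.33 times faster, despite its slower clock.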
Figure 4: Effect of single port memory on generic 5-stage pipe.

             Clock
Instruction  1      2      3      4      5      6      7      8      9
i            IF     ID     EX     MEM    WB
i+1                 IF     ID     EX     MEM    WB
i+2                        IF     ID     EX     MEM    WB
i+3                               stall  IF     ID     EX     MEM    WB
i+4                                             IF     ID     EX     MEM
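The timing in figure 4 can be reproduced with a small scheduling sketch. It assumes a single memory port shared by instruction fetch and data access, that a fetch simply waits one cycle whenever an earlier load/store occupies the port in its MEM stage, and that no other hazards occur. The function name and the boolean `is_mem_op` encoding are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
MEM_OFFSET = STAGES.index("MEM")  # MEM happens this many cycles after IF


def timeline(is_mem_op):
    """Return one {clock: stage} dict per instruction (clock is 1-based).

    `is_mem_op[k]` is True when instruction k is a load or store.
    """
    port_busy = set()  # clocks in which the memory port serves a data access
    rows = []
    clock = 1
    for mem_op in is_mem_op:
        row = {}
        while clock in port_busy:  # structural hazard: fetch must wait
            row[clock] = "stall"
            clock += 1
        for offset, stage in enumerate(STAGES):
            row[clock + offset] = stage
        if mem_op:
            port_busy.add(clock + MEM_OFFSET)
        rows.append(row)
        clock += 1                 # next fetch is attempted one clock later
    return rows


# Instruction i is a load; i+1 .. i+4 do not access data memory.
for k, row in enumerate(timeline([True, False, False, False, False])):
    cells = "".join(row.get(c, "").ljust(7) for c in range(1, 11))
    print(f"i+{k}".ljust(6) + cells)
```

In this run the fourth instruction's fetch is pushed from clock 4 to clock 5, and every later instruction inherits the one-cycle delay.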