UNIT 3 Second Half Notes
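We can turn the pipelining speed-up discussion above into a formula. If the
stages are perfectly balanced, then the time between instructions on the pipelined
processor, assuming ideal conditions, is equal to
Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of pipeline stages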
Pipelining improves performance by increasing instruction
throughput, as opposed to decreasing the execution time of an individual
instruction, but instruction throughput is the important metric because real
programs execute billions of instructions.
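A minimal Python sketch of the ideal-case arithmetic: with perfectly balanced stages, dividing the nonpipelined instruction time by the number of stages gives the time between instructions on the pipeline. The 1000 ps instruction time, the 5 stages, the instruction count, and the function names are illustrative assumptions, not values from these notes.

```python
def pipelined_time_between_instructions(nonpipelined_ps, num_stages):
    """Ideal case: perfectly balanced stages and no hazards."""
    return nonpipelined_ps / num_stages


def total_time(num_instructions, num_stages, cycle_ps):
    """Time to finish a program on an ideal pipeline.

    The first instruction needs num_stages cycles; each later one
    completes one cycle after its predecessor.
    """
    return (num_stages + num_instructions - 1) * cycle_ps


# Assumed example values: 1000 ps per instruction without pipelining, 5 stages.
cycle = pipelined_time_between_instructions(1000, 5)
n = 1_000_000
speedup = (n * 1000) / total_time(n, 5, cycle)
print(cycle, speedup)
```

Each individual instruction still takes five cycles from fetch to write-back, yet for a long instruction stream the overall speed-up approaches the number of stages, which is why throughput is the metric that matters.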
3.4.1 Designing Instruction Sets for Pipelining
First, all MIPS instructions are the same length. This restriction makes it
much easier to fetch instructions in the first pipeline stage and to decode them in
the second stage.
Second, MIPS has only a few instruction formats, with the source register
fields being located in the same place in each instruction. This symmetry means
that the second stage can begin reading the register file at the same time that the
hardware is determining what type of instruction was fetched. If MIPS instruction
formats were not symmetric, we would need to split stage 2, resulting in six
pipeline stages.
Third, memory operands only appear in loads or stores in MIPS. This
restriction means we can use the execute stage to calculate the memory address and
then access memory in the following stage.
Fourth, operands must be aligned in memory in MIPS. So, we need not
worry about a single data transfer instruction requiring two data memory accesses;
the requested data can be transferred between processor and memory in a single
pipeline stage.
In Figure 4.33, these five components correspond roughly to the way the datapath
is drawn; instructions and data move generally from left to right through the five
stages as they complete execution. There are, however, two exceptions to this
left-to-right flow of instructions:
1. The write-back stage, which places the result back into the register file in the
middle of the datapath
2. The selection of the next value of the PC, choosing between the incremented
PC and the branch address from the MEM stage
Data flowing from right to left does not affect the current instruction; these reverse
data movements influence only later instructions in the pipeline. Note that the first
right-to-left flow of data can lead to data hazards and the second leads to control
hazards.
The following diagram shows the pipelined datapath with the pipeline registers
highlighted. All instructions advance during each clock cycle from one pipeline
register to the next. The registers are named for the two stages separated by that
register.
For example, the pipeline register between the IF and ID stages is called
IF/ID. Notice that there is no pipeline register at the end of the write-back stage.
All instructions must update some state in the processor: the register file,
memory, or the PC.
For example, a load instruction will place its result in 1 of the 32 registers,
and any later instruction that needs that data will simply read the appropriate
register.
The pipeline registers separate each pipeline stage. They are labeled by the
stages that they separate; for example, the first is labeled IF/ID because it
separates the instruction fetch and instruction decode stages.
The registers must be wide enough to store all the data corresponding to the
lines that go through them. For example, the IF/ID register must be 64 bits wide,
because it must hold both the 32-bit instruction fetched from memory and the
incremented 32-bit PC address.
We highlight the right half of registers or memory when they are being read
and highlight the left half when they are being written.
1. Instruction fetch: The Figure shows the instruction being read from
memory using the address in the PC and then being placed in the IF/ID
pipeline register. The PC address is incremented by 4 and then written back
into the PC to be ready for the next clock cycle. This incremented address is
also saved in the IF/ID pipeline register in case it is needed later for an
instruction, such as beq.
2. Instruction decode and register file read: The Figure shows the instruction
portion of the IF/ID pipeline register supplying the 16-bit immediate field,
which is sign-extended to 32 bits, and the register numbers to read the two
registers. All three values are stored in the ID/EX pipeline register, along
with the incremented PC address. We again transfer everything that might be
needed by any instruction during a later clock cycle.
3. Execute or address calculation: The Figure shows that the load instruction
reads the contents of register 1 and the sign-extended immediate from the
ID/EX pipeline register and adds them using the ALU. That sum is placed in
the EX/MEM pipeline register.
4. Memory access: The Figure shows the load instruction reading the data
memory using the address from the EX/MEM pipeline register and loading
the data into the MEM/WB pipeline register.
5. Write-back: The Figure shows the final step: reading the data from the
MEM/WB pipeline register and writing it into the register file in the middle
of the figure.
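The five steps above can be sketched in Python by modelling each pipeline register as a dictionary. This is only an illustration under stated assumptions: the instruction is represented as already-split fields (rs, rt, imm) rather than a 32-bit word, the destination register number is carried along through the pipeline registers so that write-back knows where to store the data, and all names are hypothetical.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit immediate to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_load(pc, instr_mem, regs, data_mem):
    """Walk one load instruction through IF, ID, EX, MEM and WB."""
    # 1. Instruction fetch: read the instruction, save PC + 4 in IF/ID.
    if_id = {"instr": instr_mem[pc], "pc_plus_4": pc + 4}

    # 2. Decode and register read: pass on everything a later stage might need.
    instr = if_id["instr"]
    id_ex = {
        "read1": regs[instr["rs"]],
        "read2": regs[instr["rt"]],
        "imm": sign_extend16(instr["imm"]),
        "pc_plus_4": if_id["pc_plus_4"],
        "dest": instr["rt"],  # assumed: destination number travels with the data
    }

    # 3. Execute: the ALU adds the base register and the immediate.
    ex_mem = {"address": id_ex["read1"] + id_ex["imm"], "dest": id_ex["dest"]}

    # 4. Memory access: read data memory at the computed address.
    mem_wb = {"data": data_mem[ex_mem["address"]], "dest": ex_mem["dest"]}

    # 5. Write-back: store the loaded value in the register file.
    regs[mem_wb["dest"]] = mem_wb["data"]
    return if_id["pc_plus_4"]


# Hypothetical load: register 2 <- memory[register 1 + (-4)].
regs = [0] * 32
regs[1] = 100
data_mem = {96: 42}
instr_mem = {0: {"rs": 1, "rt": 2, "imm": 0xFFFC}}
next_pc = run_load(0, instr_mem, regs, data_mem)
```

In real hardware the five stages operate on five different instructions in the same clock cycle; the sequential function above follows only a single instruction.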
Data hazards can be resolved in two ways:
1. Forwarding
2. Stalling
3.8.1 FORWARDING
The last four instructions are all dependent on the result in register $2 of the first
instruction. If register $2 had the value 10 before the subtract instruction and −20
afterwards, the programmer intends that −20 will be used in the following
instructions that refer to register $2.
The desired result is available at the end of the EX stage or clock cycle 3.
When is the data actually needed by the AND and OR instructions? At the
beginning of the EX stage, or clock cycles 4 and 5, respectively. Thus, we can
execute this segment without stalls if we simply forward the data as soon as it is
available to any units that need it before it is available to read from the register file.
The primary solution is based on the observation that we don’t need to wait
for the instruction to complete before trying to resolve the data hazard.
Forwarding: Also called bypassing. A method of resolving a data hazard by
retrieving the missing data element from internal buffers rather than waiting for it
to arrive from programmer-visible registers or memory.
Figure 4.53 shows the dependences between the pipeline registers and the
inputs to the ALU for the same code sequence. The change is that the dependence
begins from a pipeline register, rather than waiting for the WB stage to write the
register file. Thus, the required data exists in time for later instructions, with the
pipeline registers holding the data to be forwarded.
If we can take the inputs to the ALU from any pipeline register rather than
just ID/EX, then we can forward the proper data. By adding multiplexors to the
input of the ALU, and with the proper controls, we can run the pipeline at full
speed in the presence of these data dependences.
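The multiplexor control can be sketched as a small Python function that picks the source of one ALU input. It assumes each pipeline register records a RegWrite flag and a destination register number (the field names reg_write and rd are hypothetical), and that register $0 is never forwarded because it always reads as zero. Register 12 as the AND destination is an illustrative assumption; only $2 is named in these notes.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where an ALU input comes from: 'EX/MEM', 'MEM/WB' or 'ID/EX'."""
    # Most recent result first: the instruction one ahead in the pipeline.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"
    # Otherwise the instruction two ahead, whose result sits in MEM/WB.
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"
    # No hazard: use the value read from the register file in ID.
    return "ID/EX"


# AND is in EX: the subtract that writes $2 is one stage ahead (EX/MEM).
and_cycle = forward_select(
    2, {"reg_write": True, "rd": 2}, {"reg_write": False, "rd": 0}
)
# One cycle later OR is in EX: the subtract's result has moved to MEM/WB.
or_cycle = forward_select(
    2, {"reg_write": True, "rd": 12}, {"reg_write": True, "rd": 2}
)
print(and_cycle, or_cycle)
```

Checking EX/MEM before MEM/WB means that when two earlier instructions write the same register, the more recent result is the one forwarded.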
Pipeline stall: Also called a bubble. A stall initiated in order to resolve a hazard.
Load-use data hazard: A specific form of data hazard in which the data being
loaded by a load instruction has not yet become available when it is needed by
another instruction.
One case where forwarding cannot save the day is when an instruction tries
to read a register following a load instruction that writes the same register. Figure
4.58 illustrates the problem. The data is still being read from memory in clock
cycle 4 while the ALU is performing the operation for the following instruction.
Something must stall the pipeline for the combination of load followed by an
instruction that reads its result.
A hazard detection unit operates during the ID stage so that it can insert the
stall between the load and its use.
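The check made in the ID stage can be sketched as follows. The sketch assumes the ID/EX register exposes a MemRead flag (set only for loads) and the load's destination register rt, and that IF/ID exposes the two source register numbers of the instruction being decoded; the field names and the register numbers in the example are hypothetical.

```python
def load_use_stall(id_ex, if_id):
    """Return True when the instruction in ID must wait one cycle for a load."""
    is_load = id_ex["mem_read"]
    reads_loaded_reg = id_ex["rt"] in (if_id["rs"], if_id["rt"])
    return bool(is_load and reads_loaded_reg)


# A load writing $2 is in EX while an AND that reads $2 is in ID: stall.
stall = load_use_stall({"mem_read": True, "rt": 2}, {"rs": 2, "rt": 5})
# The same register pattern after a non-load needs no stall; forwarding covers it.
no_stall = load_use_stall({"mem_read": False, "rt": 2}, {"rs": 2, "rt": 5})
print(stall, no_stall)
```

When the function returns True, the hardware would hold the PC and the IF/ID register and send a nop down the pipeline in place of the dependent instruction.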
The following diagram shows that the AND instruction is turned into a nop and
that all instructions beginning with the AND instruction are delayed one cycle. In
this example, the hazard forces the AND and OR instructions to repeat in clock
cycle 4 what they did in clock cycle 3: AND reads registers and decodes, and OR is
refetched from instruction memory.
In control hazards, assuming a branch is not taken means that the next
instruction is fetched and execution continues down the sequential instruction
stream. If the branch is taken, the instructions in the pipeline are discarded.
Discarding instructions, then, means we must be able to flush instructions in the
IF, ID, and EX stages of the pipeline. To flush means to discard instructions in a
pipeline, usually due to an unexpected event.
Thus far, we have assumed the next PC for a branch is selected in the MEM
stage, but if we move the branch execution earlier in the pipeline, then fewer
instructions need be flushed.
Moving the branch decision up requires two actions to occur earlier:
computing the branch target address and evaluating the branch decision. The easy
part of this change is to move up the branch address calculation. We already have
the PC value in the IF/ID pipeline register, so we just move the branch adder from
the EX stage to the ID stage.
Adding hardware to compute the branch target address and evaluate the
branch decision in the ID stage reduces the number of stall (flush) cycles to one.
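A rough cost model shows why flushing fewer instructions matters. It assumes an ideal CPI of 1, a predict-not-taken pipeline in which every taken branch pays the full flush penalty, and made-up workload figures (20% branches, half of them taken); none of these numbers come from the notes.

```python
def effective_cpi(branch_fraction, taken_fraction, flush_cycles):
    """Average cycles per instruction when only taken branches cost cycles."""
    return 1.0 + branch_fraction * taken_fraction * flush_cycles


# Branch resolved in MEM: the instructions in IF, ID and EX are flushed.
cpi_mem = effective_cpi(0.2, 0.5, 3)
# Branch resolved in ID: only the instruction in IF is flushed.
cpi_id = effective_cpi(0.2, 0.5, 1)
print(cpi_mem, cpi_id)
```

Under these assumed figures, moving the decision from MEM to ID cuts the branch overhead from about 0.3 to about 0.1 cycles per instruction.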
3.9.3 DYNAMIC BRANCH PREDICTION
This strategy uses recent branch history during program execution to predict
whether or not the branch will be taken the next time it occurs. This technique is
called dynamic branch prediction: prediction of branches at runtime using runtime
information.
A branch prediction buffer or branch history table is a small memory
indexed by the lower portion of the address of the branch instruction. The memory
contains a bit that says whether the branch was recently taken or not.
1-Bit Branch Prediction
Simple Prediction: Uses a single bit to predict the outcome of a branch.
Initialization: Bit is initially set to a default value (often "not taken").
Prediction:
o If the bit is 1, the branch is predicted to be taken.
o If the bit is 0, the branch is predicted to be not taken.
Update:
o If the prediction is correct, the bit remains unchanged.
o If the prediction is incorrect, the bit is changed.
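The 1-bit scheme above can be sketched as a small Python class. The table size of 16 entries and the indexing by word address (dropping the two low-order address bits) are assumptions chosen for illustration, as are the class and function names.

```python
class OneBitPredictor:
    """Branch history table with one prediction bit per entry."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [0] * entries  # 0 = predict not taken (the default)

    def _index(self, branch_addr):
        # Index by the lower portion of the branch address (word-aligned).
        return (branch_addr >> 2) % self.entries

    def predict(self, branch_addr):
        return self.table[self._index(branch_addr)] == 1

    def update(self, branch_addr, taken):
        # Keeping the bit on a correct guess and flipping it on a wrong one
        # is the same as recording the latest outcome.
        self.table[self._index(branch_addr)] = 1 if taken else 0


def count_mispredictions(predictor, branch_addr, outcomes):
    wrong = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) != taken:
            wrong += 1
        predictor.update(branch_addr, taken)
    return wrong


# A loop branch taken nine times and then not taken, executed twice.
outcomes = ([True] * 9 + [False]) * 2
mispredictions = count_mispredictions(OneBitPredictor(), 0x40, outcomes)
print(mispredictions)
```

The example illustrates the known weakness of a single bit: a loop branch is mispredicted twice per loop execution, once on exit and once again on re-entry.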
Handling Exceptions:
Two types of exceptions can occur in the basic MIPS architecture
implementation:
1. Execution of an undefined instruction
2. An arithmetic overflow.
When an exception occurs, the processor saves the address of the offending instruction
in the exception program counter (EPC) and then transfers control to the operating
system at some specified address. The operating system then takes the appropriate
action, which may involve providing some service to the user program, taking
some predefined action in response to an overflow, or stopping the execution of the
program and reporting an error.
After performing whatever action is required because of the exception, the
operating system can terminate the program or may continue its execution, using
the EPC to determine where to restart the execution of the program.
Two main methods are used to communicate the reason for an exception:
The first method used in the MIPS architecture is to include a status
register (called the Cause register), which holds a field that indicates
the reason for the exception.
A second method is to use vectored interrupts. In a vectored interrupt,
the address to which control is transferred is determined by the cause
of the exception.
We will need to add two additional registers to our current MIPS implementation:
EPC: A 32-bit register used to hold the address of the affected instruction.
Cause: A register used to record the cause of the exception. In the MIPS
architecture, this register is 32 bits, although some bits are currently unused.
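The role of the two registers can be sketched in Python using the Cause-register method. The handler address and the numeric cause codes below are illustrative assumptions that these notes do not specify, and the dictionary-based processor state is hypothetical.

```python
HANDLER_ADDRESS = 0x80000180       # assumed operating-system entry point
CAUSE_UNDEFINED_INSTRUCTION = 10   # assumed encoding
CAUSE_ARITHMETIC_OVERFLOW = 12     # assumed encoding


def raise_exception(state, cause_code):
    """Record where and why the exception happened, then enter the handler."""
    state["epc"] = state["pc"]     # address of the affected instruction
    state["cause"] = cause_code    # reason, read later by the operating system
    state["pc"] = HANDLER_ADDRESS  # transfer control to the operating system
    return state


# An add at address 0x4C overflows.
state = {"pc": 0x4C, "epc": 0, "cause": 0}
raise_exception(state, CAUSE_ARITHMETIC_OVERFLOW)
print(hex(state["epc"]), state["cause"], hex(state["pc"]))
```

With vectored interrupts, the fixed HANDLER_ADDRESS would instead be replaced by a lookup that maps each cause to its own handler address.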