Coa Unit 4
Uploaded by saib12830

Unit 04

Instruction Pipeline :
• A stream of instructions is executed by overlapping the fetch, decode
and execute phases of the instruction cycle.
• This technique increases the throughput of the computer system: an
instruction pipeline reads the next instruction from memory while
previous instructions are being executed in other segments of the
pipeline.
• Thus multiple instructions are processed simultaneously. The pipeline
is most efficient when the instruction cycle is divided into segments of
equal duration.
• In the general case, the computer processes each instruction in the
following sequence of steps:
1. Fetch the instruction from memory (FI)
2. Decode the instruction (DA)
3. Calculate the effective address
4. Fetch the operands from memory (FO)
5. Execute the instruction (EX)
6. Store the result in the proper place
The flowchart for the instruction pipeline example:

• The instruction is fetched in the first clock cycle in segment 1.
• It is decoded in the next clock cycle; then the operands are fetched and
finally the instruction is executed.
• The fetch and decode phases overlap due to pipelining. While the 1st
instruction is being decoded, the 2nd instruction is fetched by the
pipeline.
• The 3rd instruction is a branch instruction, and while it is being
decoded the 4th instruction is fetched simultaneously. But because it is
a branch, it may point to some other instruction once it is decoded.
• The 4th instruction is therefore kept on hold until the branch
instruction is executed. Once the branch executes, the 4th instruction is
brought back and the other phases continue as usual.
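The overlap described above can be sketched with a small space-time simulation. The four stage names follow the FI/DA/FO/EX labels from the step list; the table layout itself is an illustrative assumption, not from the source.

```python
# Space-time diagram for an ideal 4-segment instruction pipeline.
# Instruction i (0-indexed) enters segment s in clock cycle i + s + 1.
STAGES = ["FI", "DA", "FO", "EX"]

def pipeline_table(n_instructions):
    """Return {clock_cycle: {instruction_number: stage_name}}."""
    table = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s + 1, {})[i + 1] = stage
    return table

table = pipeline_table(4)
# With 4 instructions and 4 segments, the last instruction completes in
# cycle 4 + (4 - 1) = 7, instead of 4 * 4 = 16 cycles without pipelining.
for cycle in sorted(table):
    row = "  ".join(f"I{i}:{st}" for i, st in sorted(table[cycle].items()))
    print(f"cycle {cycle}: {row}")
```

Note that branch instructions break this ideal picture, as the bullets above explain: the pipeline cannot know the next fetch address until the branch is decoded.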

Arithmetic Pipeline :

• An arithmetic pipeline divides an arithmetic problem into sub-problems
that execute in different pipeline segments.
• It is used for floating-point operations, multiplication and various
other computations.
• The flowchart of an arithmetic pipeline for floating-point addition is
shown below.
Floating point addition using arithmetic pipeline :
The following sub-operations are performed in this case:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalise the result.
First the two exponents are compared and the larger of the two is chosen
as the result exponent. The difference between the exponents decides how
many positions the mantissa of the number with the smaller exponent must
be shifted to the right. After this shift both mantissas are aligned.
Finally the two numbers are added, followed by normalisation of the
result in the last segment.
Example 1:
Let us consider two numbers,
X = 0.3214 * 10^3 and Y = 0.4500 * 10^2

Explanation:
The difference of the exponents is 3 - 2 = 1.
Thus 3 becomes the exponent of the result, and the mantissa of the number
with the smaller exponent is shifted 1 place to the right to give
Y = 0.0450 * 10^3
Finally the two numbers are added to produce
Z = 0.3664 * 10^3

Example 2:
X = 0.9504 * 10^3 and Y = 0.8200 * 10^2

The difference of the exponents is 3 - 2 = 1, so 3 becomes the exponent
of the result and the mantissa of the smaller number is shifted 1 place
to the right to give
Y = 0.0820 * 10^3
Finally the two numbers are added to produce Z = 1.0324 * 10^3, which
normalises to Z = 0.10324 * 10^4.
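The four sub-operations can be sketched in code. This is a minimal illustration assuming decimal (mantissa, exponent) pairs as in the examples; only the overflow case of normalisation (mantissa ≥ 1, as in Example 2) is handled.

```python
# The four arithmetic-pipeline sub-operations for floating-point addition,
# applied to decimal (mantissa, exponent) pairs.

def fp_add(x, y):
    """Add two (mantissa, exponent) pairs via compare/align/add/normalise."""
    (mx, ex), (my, ey) = x, y
    # 1. Compare the exponents; the larger one becomes the result exponent.
    if ex < ey:
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    # 2. Align the mantissas: shift the smaller number's mantissa right
    #    by the exponent difference.
    my = my / (10 ** (ex - ey))
    # 3. Add the mantissas.
    mz, ez = mx + my, ex
    # 4. Normalise (overflow case only: e.g. 1.0324 -> 0.10324 * 10^1 more).
    while abs(mz) >= 1.0:
        mz, ez = mz / 10, ez + 1
    return round(mz, 5), ez

print(fp_add((0.3214, 3), (0.4500, 2)))  # (0.3664, 3), Example 1
print(fp_add((0.9504, 3), (0.8200, 2)))  # (0.10324, 4), Example 2 normalised
```

In a real pipeline each numbered step runs in its own segment, so a new pair of operands can enter step 1 while the previous pair is in step 2.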
RISC PIPELINE :
• RISC instructions are simple.
• Memory operations are limited to load and store operations.
• Almost all instructions can be executed in a single clock cycle.
• These characteristics are useful in creating an effective instruction
pipeline for RISC.
Three-Segment Instruction Pipeline
I : Instruction fetch
A : ALU operation
E : Execute instruction
• The I segment fetches the instruction from program memory. The
instruction is decoded and an ALU operation is performed in the A
segment.
• The ALU is used for three different functions, depending on the decoded
instruction. It performs an operation for a data manipulation instruction,
it evaluates the effective address for a load or store instruction, or it
calculates the branch address for a program control instruction.
• The E segment directs the output of the ALU to one of three
destinations, depending on the decoded instruction: it transfers the
result of the ALU operation into a destination register in the register
file, it transfers the effective address to data memory for loading or
storing, or it transfers the branch address to the program counter.
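The three-way use of the ALU in the A segment can be sketched as a dispatch on the decoded instruction class. The dictionary encoding and class names here are illustrative assumptions, not from the source.

```python
# Segment A chooses the ALU's job from the decoded instruction class:
# compute a result, an effective address, or a branch address.

def a_segment(instr, regs):
    """Return the ALU output for one decoded instruction."""
    if instr["class"] == "data":                 # e.g. ADD R3, R1, R2
        return regs[instr["src1"]] + regs[instr["src2"]]
    if instr["class"] in ("load", "store"):      # effective address
        return instr["base"] + instr["offset"]
    if instr["class"] == "branch":               # branch target address
        return instr["target"]

print(a_segment({"class": "data", "src1": "R1", "src2": "R2"},
                {"R1": 5, "R2": 7}))                       # 12
print(a_segment({"class": "load", "base": 100, "offset": 8}, {}))  # 108
```

The E segment then routes this single ALU output to the register file, to data memory, or to the program counter, matching the three destinations listed above.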

Pipeline with DELAYED LOAD :

Consider the following four instructions:
1. LOAD : R1 <-- M[address 1]
2. LOAD : R2 <-- M[address 2]
3. ADD : R3 <-- R1 + R2
4. STORE : M[address 3] <-- R3
• If the three-segment pipeline proceeds without interruption, there will
be a data conflict in instruction 3 because the operand in R2 is not yet
available in the A segment.
• This can be seen from the timing of the pipeline shown in Fig. 9(a).
The E segment in clock cycle 4 is in the process of placing the memory
data into R2.
• The A segment in clock cycle 4 is using the data from R2, but the value
in R2 will not be the correct value since it has not yet been transferred
from memory.
• It is up to the compiler to make sure that the instruction following a
load instruction does not use the data being fetched from memory. If the
compiler cannot find a useful instruction to put after the load, it
inserts a no-op (no-operation) instruction.
• A no-op is fetched from memory but performs no operation, thus wasting
a clock cycle. This concept of delaying the use of the data loaded from
memory is referred to as delayed load.
• Figure 9(b) shows the same program with a no-op instruction inserted
after the load to R2. The data is loaded into R2 in clock cycle 4.
• The add instruction uses the value of R2 in step 5.
• Thus the no-op instruction advances the program by one clock cycle to
compensate for the data conflict in the pipeline. (Note that no operation
is performed in segment A during clock cycle 4 or in segment E during
clock cycle 5.)
• The advantage of the delayed-load approach is that the data dependency
is taken care of by the compiler rather than the hardware.
• This results in simpler hardware, since a segment does not have to
check whether the content of the register being accessed is currently
valid.
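The compiler's check can be sketched as follows. The tuple encoding (opcode, destination, sources) is an illustrative assumption; the rule itself, that in this three-segment pipeline an instruction must not read a register in the cycle right after a LOAD writes it, is what Fig. 9 shows.

```python
# Find where a no-op must be inserted: any instruction that reads the
# destination register of the immediately preceding LOAD.

def noop_positions(program):
    """Return indices of instructions that need a no-op before them."""
    positions = []
    for i in range(len(program) - 1):
        op, dst, _ = program[i]
        _, _, next_srcs = program[i + 1]
        if op == "LOAD" and dst in next_srcs:
            positions.append(i + 1)  # no-op goes before this instruction
    return positions

program = [
    ("LOAD",  "R1", ()),            # R1 <-- M[address 1]
    ("LOAD",  "R2", ()),            # R2 <-- M[address 2]
    ("ADD",   "R3", ("R1", "R2")),  # R3 <-- R1 + R2 (needs R2 too early)
    ("STORE", None, ("R3",)),       # M[address 3] <-- R3
]
print(noop_positions(program))  # [2]: insert a no-op before the ADD
```

In practice the compiler first tries to move a useful, independent instruction into that slot and falls back to a no-op only when none exists.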

DELAYED BRANCH

Consider the following program:
• Load from memory to R1
• Increment R2
• Add R3 to R4
• Subtract R5 from R6
• Branch to address X
With delayed branch, the compiler rearranges the program so that the add
and subtract instructions fill the two delay slots after the branch.
• If the load instruction is at address 101 and X is equal to 350, the
branch instruction is fetched from address 103.
• The add instruction is fetched from address 104 and executed in clock
cycle 6.
• The subtract instruction is fetched from address 105 and executed in
clock cycle 7.
• Since the value of X is transferred to the PC with clock cycle 5 in the
E segment, the instruction fetched from memory at clock cycle 6 is from
address 350, which is the instruction at the branch address.
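The fetch sequence in the bullets above can be traced with a small sketch. The addresses (101, 103, 350) and the two delay slots come from the example; the function itself is an illustrative assumption.

```python
# Trace which address is fetched each clock cycle under delayed branch:
# the branch target takes effect only after the delay-slot instructions.

def fetch_trace(start, branch_addr, target, delay_slots, n_cycles):
    """Return the list of addresses fetched, one per clock cycle."""
    pc, trace = start, []
    for _ in range(n_cycles):
        trace.append(pc)
        if pc == branch_addr + delay_slots:  # last delay slot just fetched
            pc = target                      # branch now redirects the PC
        else:
            pc += 1
    return trace

print(fetch_trace(101, 103, 350, 2, 6))
# [101, 102, 103, 104, 105, 350]
```

The add (104) and subtract (105) are fetched after the branch but still execute, which is exactly why the compiler must place only instructions that are needed regardless of the branch outcome in those slots.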
Data Hazards and their Handling Methods
Data hazards occur when an instruction depends on the result of a
previous instruction and that result has not yet been computed. Whenever
two different instructions use the same storage location, the location
must appear as if it is accessed in sequential order.
There are four types of data dependencies: Read after Write (RAW), Write
after Read (WAR), Write after Write (WAW), and Read after Read (RAR).
These are explained as follows.
• Read after Write (RAW) :
It is also known as true dependency or flow dependency. It occurs when
a value produced by an instruction is required by a subsequent
instruction. For example,
ADD R1, --, --;
SUB --, R1, --;

Stalls are required to handle these hazards.


• Write after Read (WAR) :
It is also known as anti-dependency. These hazards occur when an
instruction writes a register right after a previous instruction has
read it. For example,
ADD --, R1, --;
SUB R1, --, --;
• Write after Write (WAW) :
It is also known as output dependency. These hazards occur when an
instruction writes a register right after a previous instruction has
written it. For example,
ADD R1, --, --;
SUB R1, --, --;
• Read after Read (RAR) :
It occurs when two instructions both read from the same register. For
example,
ADD --, R1, --;
SUB --, R1, --;
• Since reading a register does not change its value, Read after Read
(RAR) dependencies do not cause a problem for the processor.
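The four categories can be detected mechanically from the registers each instruction reads and writes. The (writes, reads) set encoding below is an illustrative assumption.

```python
# Classify the dependence between two instructions executed in order,
# each given as a (written_registers, read_registers) pair of sets.

def classify(first, second):
    """Return the set of dependency types between the two instructions."""
    w1, r1 = first
    w2, r2 = second
    deps = set()
    if w1 & r2: deps.add("RAW")  # second reads what first wrote
    if r1 & w2: deps.add("WAR")  # second overwrites what first read
    if w1 & w2: deps.add("WAW")  # both write the same register
    if r1 & r2: deps.add("RAR")  # both read it (harmless)
    return deps

print(classify(({"R1"}, set()), (set(), {"R1"})))  # {'RAW'}
print(classify((set(), {"R1"}), ({"R1"}, set())))  # {'WAR'}
print(classify(({"R1"}, set()), ({"R1"}, set())))  # {'WAW'}
```

Only RAW forces the pipeline to wait for a value; WAR and WAW matter when instructions can complete out of order, and RAR needs no handling at all.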

Handling Data Hazards :

There are various methods used to handle data hazards: forwarding, code
reordering, and stall insertion. These are explained as follows.
1. Forwarding :
It adds special circuitry to the pipeline that routes a result directly
to the segment that needs it. This method works because it takes less
time for the required value to travel through a wire than it does for a
pipeline segment to compute its result.
2. Code reordering :
We need a special type of software to reorder code; we call this type of
software a hardware-dependent compiler.
3. Stall insertion :
It inserts one or more stalls (no-op instructions) into the pipeline,
which delays the execution of the current instruction until the required
operand is written to the register file. This method, however, decreases
pipeline efficiency and throughput.
