CO Unit 4_Processing_Pipelining (1)
Figure 7.1. Processor datapath.
(Figure labels: PC, instruction decoder and control logic, control signals, address lines, MAR, memory bus, MDR, data lines, IR, Constant 4, R0 to Rn-1, TEMP, ALU with inputs A and B, ALU control lines: Add, Sub, XOR, Carry-in.)
programmer.
Special-purpose registers: index and stack registers.
of PC.
ALU:
Used to perform arithmetic and logic
operations.
Data Path:
The registers, ALU and interconnecting bus are
collectively referred to as the data path.
1. Register Transfers
Figure 7.2. Input and output gating for the registers in Figure 7.1.
(Figure labels: internal processor bus, Ri with Riin and Riout, Y with Yin, Constant 4, Select, MUX, ALU with inputs A and B, Z with Zin and Zout.)
The input and output gates for register Ri are
controlled by the signals Riin and Riout.
Riin is set to 1: the data available on the common bus
are loaded into Ri.
Riout is set to 1: the contents of register Ri are
placed on the bus.
Riout is set to 0: the bus can be used for
transferring data from other registers.
Data transfer between two registers:
Example:
Transfer the contents of R1 to R4.
1. Enable output of register R1 by setting
R1out=1. This places the contents of R1 on
the processor bus.
2. Enable input of register R4 by setting
R4in=1. This loads the data from the
processor bus into register R4.
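The gating rules above can be sketched in a few lines of Python. This is a minimal illustration, not the actual hardware: the function name `clock_cycle` and the dictionary model of the registers are assumptions made for the example. A register named X drives the bus when the signal `Xout` is present and loads from the bus when `Xin` is present.

```python
def clock_cycle(regs, signals):
    """Apply one clock cycle of control signals to a dict of registers.

    Hypothetical model: 'R1out' places R1 on the bus, 'R4in' loads R4
    from the bus. Returns the value carried by the bus in this cycle.
    """
    drivers = [name for name in regs if name + "out" in signals]
    if len(drivers) > 1:
        # Only one register may drive the single shared bus at a time.
        raise ValueError("bus conflict: " + ", ".join(drivers))
    bus = regs[drivers[0]] if drivers else None
    if bus is not None:
        for name in list(regs):
            if name + "in" in signals:
                regs[name] = bus
    return bus


regs = {"R1": 42, "R4": 0}
clock_cycle(regs, {"R1out", "R4in"})  # transfer the contents of R1 to R4
print(regs["R4"])
```

Asserting two output signals in the same cycle raises an error in this sketch, which mirrors the rule that Riout must be 0 for every register except the one using the bus.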
2. Performing an Arithmetic or Logic Operation
The ALU is a combinational circuit that has no
internal storage.
The ALU gets its two operands from the MUX (input A) and the bus (input B).
The result is temporarily stored in register Z.
What is the sequence of operations to add the
contents of register R1 to those of R2 and store the
result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
Step 1: The output of register R1 and the input of
register Y are enabled, causing the
contents of R1 to be transferred to Y.
Step 2: The contents of R2 are placed on the bus, which
feeds input B of the ALU. The multiplexer's select signal
is set to SelectY, causing the multiplexer to gate the
contents of register Y to input A of the ALU. The ALU
performs the Add operation and the result is loaded into Z.
Step 3: The contents of Z are transferred to the
destination register R3.
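The three control steps can be traced with a small Python sketch. The function name `run_add` and the dictionary of register values are assumptions for illustration; each comment names the control signals of the corresponding step.

```python
def run_add(regs):
    """Trace the three control steps for R3 <- [R1] + [R2]."""
    state = dict(regs, Y=0, Z=0)  # Y and Z are the temporary registers

    # Step 1: R1out, Yin
    bus = state["R1"]
    state["Y"] = bus

    # Step 2: R2out, SelectY, Add, Zin
    bus = state["R2"]
    a = state["Y"]      # SelectY gates Y to ALU input A
    b = bus             # the bus feeds ALU input B
    state["Z"] = a + b  # Add, with the result latched by Zin

    # Step 3: Zout, R3in
    bus = state["Z"]
    state["R3"] = bus
    return state


result = run_add({"R1": 7, "R2": 5, "R3": 0})
print(result["R3"])  # 12
```

Register Y is needed because the single bus can carry only one operand per step, so the first operand must be held while the second is on the bus.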
Register Transfers
All operations and data transfers are controlled by the processor clock.
Figure 7.3. Input and output gating for one register bit.
(Figure labels: Bus, D flip-flop with D and Q, Riin, Riout, Clock.)
Fetching a Word from Memory
Address into MAR; issue Read operation; data into MDR.
Figure 7.4. Connection and control signals for register MDR.
(Figure labels: memory-bus data lines, MDRoutE, MDRout, internal processor bus, MDR.)
3. Fetching a Word from Memory
The response time of each memory access varies
(cache miss, memory-mapped I/O,…).
To accommodate this, the processor waits until it
receives an indication that the requested operation
has been completed (Memory-Function-Completed,
MFC).
Move (R1), R2
MAR ← [R1]
Start a Read operation on the memory bus
Wait for the MFC response from the memory
Load MDR from the memory bus
R2 ← [MDR]
Move (R1), R2
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
(Timing diagram for steps 1 to 3: Clock, MARin, Read, MDRinE, Data, MFC, MDRout.)
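The wait for MFC can be illustrated with a short Python sketch. The class `SlowMemory`, its fixed `latency`, and the function `move_indirect` are hypothetical names introduced for this example; real memory response times vary, which is why the processor waits for MFC instead of counting cycles.

```python
class SlowMemory:
    """Hypothetical memory that asserts MFC a fixed number of cycles after Read."""

    def __init__(self, contents, latency):
        self.contents = contents
        self.latency = latency
        self.address = None
        self.remaining = 0

    def start_read(self, address):
        self.address = address
        self.remaining = self.latency

    def tick(self):
        """Advance one clock cycle and return (MFC, data)."""
        self.remaining -= 1
        if self.remaining <= 0:
            return True, self.contents[self.address]
        return False, None


def move_indirect(regs, memory):
    """Move (R1), R2. Returns the number of cycles spent waiting for MFC."""
    # Step 1: R1out, MARin, Read
    mar = regs["R1"]
    memory.start_read(mar)
    # Step 2: MDRinE, WMFC (stay in this step until the memory responds)
    waited = 0
    while True:
        mfc, data = memory.tick()
        waited += 1
        if mfc:
            mdr = data
            break
    # Step 3: MDRout, R2in
    regs["R2"] = mdr
    return waited


regs = {"R1": 0x100, "R2": 0}
mem = SlowMemory({0x100: 99}, latency=3)
cycles = move_indirect(regs, mem)
print(regs["R2"], cycles)  # 99 3
```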
Example: Move R2, (R1)
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
Execution of a Complete Instruction
Add (R3), R1
Fetch the instruction
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End
Figure 7.6. Control sequence for execution of the instruction Add (R3),R1.
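The control sequence of Figure 7.6 can be run through a small interpreter to see how each step changes the registers. This is a sketch under simplifying assumptions: the function `run`, the dictionary-based memory, and the rule that a pending Read completes at the next WMFC are all modelling choices made for the example, and the instruction word is stored as a plain string.

```python
CONTROL_SEQUENCE = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R3out", "MARin", "Read"],
    ["R1out", "Yin", "WMFC"],
    ["MDRout", "SelectY", "Add", "Zin"],
    ["Zout", "R1in", "End"],
]


def run(sequence, regs, memory):
    """Interpret each step's control signals on a single-bus datapath."""
    s = dict(regs)
    for name in ("MAR", "MDR", "IR", "Y", "Z"):
        s.setdefault(name, 0)
    pending_read = False
    for step in sequence:
        signals = set(step)
        # The one register whose 'out' signal is set drives the bus.
        outs = [sig[:-3] for sig in step if sig.endswith("out")]
        bus = s[outs[0]] if outs else None
        alu = None
        if "Add" in signals:
            a = 4 if "Select4" in signals else s["Y"]  # MUX output to input A
            alu = a + bus                              # bus feeds input B
        for sig in step:
            if sig.endswith("in") and sig != "Zin":
                s[sig[:-2]] = bus
        if "Zin" in signals:
            s["Z"] = alu
        if "Read" in signals:
            pending_read = True
        if "WMFC" in signals and pending_read:
            s["MDR"] = memory[s["MAR"]]  # memory responds, MDR is loaded
            pending_read = False
    return s


memory = {0: "Add (R3),R1", 200: 30}
final = run(CONTROL_SEQUENCE, {"PC": 0, "R1": 12, "R3": 200}, memory)
print(final["PC"], final["R1"])  # 4 42
```

Steps 1 to 3 fetch the instruction and advance PC by 4, and steps 4 to 7 fetch the operand addressed by R3 and add it to R1.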
(Figure labels: buses A and B, Incrementer, PC, Register file, Constant 4, ALU with A, B and R, Instruction decoder.)
• Allow the contents of two different registers to be accessed simultaneously and have their contents placed on buses A and B.
• Allow data to be loaded into a third register during the same clock cycle.
• Incrementer unit.
Any condition that causes the pipeline to stall is known as a hazard.
1. Data Hazard: the data needed by an instruction is not yet available.
2. Control Hazard / Instruction Hazard: a delay in the
availability of an instruction (for example, a cache miss).
Alternative Representation
Pipelined Performance
Stalls / bubbles
3. Structural Hazard: two instructions require the same
hardware resource at the same time (for example, access to memory).
Eg: data and instructions reside in the same cache memory.
Avoided by providing sufficient hardware resources.
Data Hazard
The pipeline is stalled because data are delayed
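Eg1: A = 5
I1: A ← 3 + A
I2: B ← 4 * A
I2 uses the value of A produced by I1, so the two instructions must execute in order.
Eg2:
I1: A ← 5 * C
I2: B ← 20 + C
I1 and I2 do not depend on each other's results.
Eg3:
I1: MUL R2,R3,R4
I2: ADD R5,R4,R6
R4 appears in both instructions: I2 needs the result that I1 places in R4.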
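A dependency of this kind can be detected mechanically. The sketch below assumes a three-operand format in which the last operand is the destination (so `MUL R2,R3,R4` writes R4); that convention, and the helper names `parse` and `has_data_dependency`, are assumptions made for illustration.

```python
def parse(instr):
    """Split 'OP src1,src2,dest' into (op, [sources], dest).

    Assumes the last operand is the destination register.
    """
    op, operands = instr.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[:-1], regs[-1]


def has_data_dependency(first, second):
    """True if `second` reads the register that `first` writes."""
    _, _, dest = parse(first)
    _, sources, _ = parse(second)
    return dest in sources


print(has_data_dependency("MUL R2,R3,R4", "ADD R5,R4,R6"))  # True
print(has_data_dependency("MUL R2,R3,R4", "ADD R5,R6,R7"))  # False
```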
Operand Forwarding
Handling Data Hazards
Alternative approach:
Leave the handling of data dependencies to software.
The compiler introduces the required two-cycle delay by inserting
NOP (no-operation) instructions.
This may lead to a larger program size.
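The software approach can be sketched as a simple compiler pass. The sketch assumes the same three-operand format with the destination last, and a fixed two-slot delay between a producer and its consumer; the names `DELAY`, `parse`, and `insert_nops` are hypothetical.

```python
DELAY = 2  # assumed number of slots needed between producer and consumer


def parse(instr):
    """Split 'OP src1,src2,dest' into ([sources], dest); destination is last."""
    _, operands = instr.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return regs[:-1], regs[-1]


def insert_nops(program):
    """Return a copy of `program` with NOPs separating dependent instructions."""
    out = []
    for instr in program:
        sources, _ = parse(instr)
        needed = 0
        # Look back over the last DELAY slots already emitted.
        for distance, prev in enumerate(reversed(out[-DELAY:]), start=1):
            if prev == "NOP":
                continue
            _, prev_dest = parse(prev)
            if prev_dest in sources:
                needed = max(needed, DELAY - distance + 1)
        out.extend(["NOP"] * needed)
        out.append(instr)
    return out


program = ["MUL R2,R3,R4", "ADD R5,R4,R6"]
print(insert_nops(program))
# ['MUL R2,R3,R4', 'NOP', 'NOP', 'ADD R5,R4,R6']
```

The output is two instructions longer than the input, which shows how this approach increases program size. A compiler could instead try to move independent instructions into those slots.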
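Side Effects
An instruction may change the contents of a register
other than the one named as its destination.
Eg: autoincrement and autodecrement addressing modes.
When a location other than the one explicitly named as the
destination operand is affected, the instruction is said to
have a side effect.
Eg:
1. Push and Pop
2. Add with carry: multiple data dependencies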