Module-2: Memory Systems Basic Processing Unit

Copyright: © All Rights Reserved

Module-2

Memory Systems
Basic Processing Unit
Syllabus
⚫ Memory System: Chapter 5 – 5.1 to 5.4, 5.5 (5.5.1, 5.5.2), 5.6
   ❖ Basic Concepts
   ❖ Semiconductor RAM Memories
   ❖ Read-Only Memories
   ❖ Speed, Size, and Cost
   ❖ Cache Memories
   ❖ Mapping Functions
   ❖ Replacement Algorithms
   ❖ Performance Considerations
⚫ Basic Processing Unit: Chapter 7, Chapter 8 – 8.1
   ❖ Some Fundamental Concepts
   ❖ Execution of a Complete Instruction
   ❖ Multiple-Bus Organization
   ❖ Hardwired Control
   ❖ Microprogrammed Control
   ❖ Basic Concepts of Pipelining
⚫ Textbook: Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Computer Organization, 5th Edition, Tata McGraw Hill, 2002
Two methods can be used to find the least recently used (LRU) block in the cache:
• Stack based
• Counter based

[Figure: main-memory blocks 1 to 6 and a cache that holds four blocks]
Stack Based LRU
• Main-memory blocks are moved to the cache in the order 1, 2, 3, 4, 2, 1, 5, 6, 2 (H = cache hit, M = cache miss).
• When block 5 is referenced, the cache is full and a replacement is required:
   • Block 1 was used 1 time slot ago.
   • Block 2 was used 2 time slots ago.
   • Block 4 was used 3 time slots ago.
   • Block 3 was used 4 time slots ago.
   • Hence block 3 is the least recently used block among blocks 1, 2, 4 and 3, and it is replaced.

Reference      1  2  3  4  2  1  5  6  2
Hit/Miss       M  M  M  M  H  H  M  M  H
Stack top      -  -  -  4  2  1  5  6  2
               -  -  3  3  4  2  1  5  6
               -  2  2  2  3  4  2  1  5
Stack bottom   1  1  1  1  1  3  4  2  1
• H (cache hit): the hit block is moved to the top of the stack, indicating that it is the most recently used block.
• M (cache miss): the LRU block is replaced from the bottom of the stack, and the new block is placed on top.
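A minimal Python sketch of the stack method, assuming a fully associative cache of four blocks as in the example. The stack is modelled as a list whose first element is the top (most recently used); `stack_lru` is a hypothetical helper name. It reproduces the Hit/Miss row and the stack columns of the table above.

```python
def stack_lru(references, capacity):
    """Simulate stack-based LRU replacement.

    Returns the hit/miss string and a snapshot of the stack after each
    reference, listed from the top (most recently used) to the bottom (LRU).
    """
    stack = []       # stack[0] is the top, stack[-1] is the LRU block
    outcomes = []
    snapshots = []
    for block in references:
        if block in stack:
            outcomes.append("H")
            stack.remove(block)        # hit: take the block out of the stack
        else:
            outcomes.append("M")
            if len(stack) == capacity:
                stack.pop()            # miss with a full cache: evict the bottom (LRU)
        stack.insert(0, block)         # the referenced block goes on top
        snapshots.append(list(stack))
    return "".join(outcomes), snapshots


outcomes, snapshots = stack_lru([1, 2, 3, 4, 2, 1, 5, 6, 2], capacity=4)
print(outcomes)        # MMMMHHMMH
print(snapshots[6])    # [5, 1, 2, 4]  (block 3 was evicted)
```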
Counter Based LRU
• A counter value of 0 means that the block is not in the cache; 1 marks the most recently used block.
• H (cache hit):
   • The counter of the hit block is set to 1.
   • Counters with values lower than the hit block's old value are incremented; counters with higher values remain the same.
• M (cache miss):
   • If the cache is full, the block with the largest counter (the LRU block) is replaced and its counter is reset to 0.
   • The counter of the incoming block is set to 1, and the remaining non-zero counters are incremented.

Reference             1  2  3  4  2  1  5  6  2
Hit/Miss              M  M  M  M  H  H  M  M  H
Counter for Block-1   1  2  3  4  4  1  2  3  4
Counter for Block-2   0  1  2  3  1  2  3  4  1
Counter for Block-3   0  0  1  2  3  4  0  0  0
Counter for Block-4   0  0  0  1  2  3  4  0  0
Counter for Block-5   0  0  0  0  0  0  1  2  3
Counter for Block-6   0  0  0  0  0  0  0  1  2
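The counter rules can be sketched in Python as follows, assuming a fully associative cache of four blocks and main-memory blocks numbered 1 to 6. `counter_lru` is a hypothetical helper; it returns the counters after every reference, which should match the rows of the table above.

```python
def counter_lru(references, blocks, capacity):
    """Simulate counter-based LRU replacement.

    A counter of 0 means the block is not in the cache; 1 marks the most
    recently used block and the largest value marks the LRU block.
    Returns the hit/miss string and a copy of all counters after each step.
    """
    counters = {b: 0 for b in blocks}
    outcomes, history = [], []
    for block in references:
        old = counters[block]
        if old:  # cache hit
            outcomes.append("H")
            for b, c in list(counters.items()):
                if 0 < c < old:            # only counters lower than the hit block's
                    counters[b] = c + 1
        else:    # cache miss
            outcomes.append("M")
            resident = [b for b, c in counters.items() if c]
            if len(resident) == capacity:  # full: evict the block with the largest counter
                victim = max(resident, key=lambda b: counters[b])
                counters[victim] = 0
            for b, c in list(counters.items()):
                if c:                      # every other resident block ages by 1
                    counters[b] = c + 1
        counters[block] = 1                # the referenced block is now the most recent
        history.append(dict(counters))
    return "".join(outcomes), history


outcomes, history = counter_lru([1, 2, 3, 4, 2, 1, 5, 6, 2],
                                blocks=range(1, 7), capacity=4)
for b in range(1, 7):
    print(f"Counter for Block-{b}:", [step[b] for step in history])
```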
Basic Processing Unit
Fundamental Concepts
⚫ The processor fetches one instruction at a time and performs the operation specified.
⚫ Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered.
⚫ The processor keeps track of the address of the memory location containing the next instruction to be fetched using the Program Counter (PC).
⚫ The instruction currently being executed is held in the Instruction Register (IR).
Executing an Instruction
⚫ Fetch the contents of the memory location pointed
to by the PC. The contents of this location are
loaded into the IR (fetch phase).
IR ← [[PC]]
⚫ Assuming that the memory is byte addressable,
increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
⚫ Carry out the actions specified by the instruction in
the IR (execution phase).
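The fetch and execution phases can be sketched as a loop. The toy instruction set below (LOADI, ADD, HALT) and its tuple encoding are hypothetical, chosen only to make the loop runnable; each instruction is assumed to occupy one 4-byte word of byte-addressable memory.

```python
def run(memory, regs):
    """Fetch-execute loop for a hypothetical toy instruction set."""
    pc = 0
    while True:
        ir = memory[pc]       # fetch phase: IR <- [[PC]]
        pc = pc + 4           # fetch phase: PC <- [PC] + 4 (4-byte words)
        op = ir[0]            # execution phase: act on the instruction in IR
        if op == "LOADI":     # LOADI value, Rdst
            regs[ir[2]] = ir[1]
        elif op == "ADD":     # ADD Rsrc1, Rsrc2, Rdst
            regs[ir[3]] = regs[ir[1]] + regs[ir[2]]
        elif op == "HALT":
            return pc
        else:
            raise ValueError(f"unknown opcode {op!r}")


program = {
    0: ("LOADI", 5, "R1"),
    4: ("LOADI", 7, "R2"),
    8: ("ADD", "R1", "R2", "R3"),
    12: ("HALT",),
}
registers = {"R1": 0, "R2": 0, "R3": 0}
final_pc = run(program, registers)
print(registers["R3"], final_pc)   # 12 16
```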
Processor Organization
[Figure 7.1. Single-bus organization of the datapath inside a processor: the internal processor bus connects the PC, MAR, MDR, IR, registers R0 to R(n-1), TEMP and the ALU (functions Add, Sub, ..., XOR, selected by the ALU control lines, plus Carry-in); a MUX (Select) feeds ALU input A with either the constant 4 or register Y; MAR drives the memory-bus address lines, MDR connects to the data lines, and the instruction decoder and control logic generate the control signals.]

Datapath
Internal organization of the processor:
⚫ ALU
⚫ Registers for temporary storage
⚫ Various digital circuits for executing different micro-operations (gates, MUXes, decoders, counters)
⚫ Internal paths for the movement of data between the ALU and the registers
⚫ Driver circuits for transmitting signals to external units
⚫ Receiver circuits for incoming signals from external units
⚫ PC:
❖ Keeps track of the execution of a program.
❖ Contains the memory address of the next instruction to be fetched and executed.
⚫ MAR:
❖ Holds the address of the location to be accessed.
❖ The input of MAR is connected to the internal bus and its output to the external bus.
⚫ MDR:
❖ Contains the data to be written into or read out of the addressed location.
❖ It has two inputs and two outputs.
❖ Data can be loaded into MDR either from the memory bus or from the internal processor bus.
The data and address lines of the memory bus are connected to the internal processor bus via MDR and MAR, respectively.
⚫ Registers:
❖ The processor registers R0 to R(n-1) vary considerably from one processor to another.
❖ Some registers are provided for general-purpose use by the programmer.
❖ Others are special-purpose registers, such as index and stack registers.
❖ Registers Y, Z and TEMP are temporary registers used by the processor during the execution of some instructions.
⚫ Multiplexer:
❖ Selects either the output of register Y or the constant value 4 to be provided as input A of the ALU.
❖ The constant 4 is used by the processor to increment the contents of the PC.
⚫ ALU:
❖ Used to perform arithmetic and logic operations.
⚫ Data path:
❖ The registers, the ALU and the interconnecting bus are collectively referred to as the data path.
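1. Register Transfers
[Figure 7.2. Input and output gating for the registers in Figure 7.1: each register Ri connects to the internal processor bus through an input gate (Riin) and an output gate (Riout); Y (Yin) feeds the MUX, which selects Y or the constant 4 as ALU input A; the ALU result is stored in Z, controlled by Zin and Zout.]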
⚫ The input and output gates for register Ri are controlled by the signals Riin and Riout.
⚫ When Riin is set to 1, the data available on the common bus are loaded into Ri.
⚫ When Riout is set to 1, the contents of register Ri are placed on the bus.
⚫ When Riout is set to 0, the bus can be used for transferring data from other registers.
Data transfer between two registers:
Example: Transfer the contents of R1 to R4.
1. Enable output of register R1 by setting
R1out=1. This places the contents of R1 on
the processor bus.
2. Enable input of register R4 by setting
R4in=1. This loads the data from the
processor bus into register R4.
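The gating rules can be sketched in Python as a single control step on a shared bus. `control_step` and the dictionary-of-registers model are hypothetical; the sketch assumes exactly one `Riout` may be active in a step, so two active outputs are reported as a bus conflict.

```python
def control_step(regs, signals):
    """Apply one control step to registers sharing a single bus.

    regs maps register names to values; signals is the set of active
    gating signals, e.g. {"R1out", "R4in"}.
    """
    drivers = [r for r in regs if f"{r}out" in signals]
    if len(drivers) > 1:
        raise ValueError(f"bus conflict: {drivers} all drive the bus")
    loads = [r for r in regs if f"{r}in" in signals]
    if loads and not drivers:
        raise ValueError("a register is loading but nothing drives the bus")
    if drivers:
        bus = regs[drivers[0]]        # Riout = 1 places Ri on the bus
        for r in loads:
            regs[r] = bus             # Rjin = 1 loads the bus value into Rj


regs = {"R1": 42, "R2": 0, "R3": 0, "R4": 0}
control_step(regs, {"R1out", "R4in"})   # transfer R1 to R4
print(regs)   # {'R1': 42, 'R2': 0, 'R3': 0, 'R4': 42}
```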
2. Performing an Arithmetic or Logic Operation
⚫ The ALU is a combinational circuit that has no internal storage.
⚫ The ALU gets its two operands from the MUX and the bus. The result is temporarily stored in register Z.
⚫ What is the sequence of operations to add the contents of register R1 to those of R2 and store the result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
Step 1: The output of register R1 and the input of register Y are enabled, causing the contents of R1 to be transferred to Y.
Step 2: The multiplexer's select signal is set to SelectY, causing the multiplexer to gate the contents of register Y to input A of the ALU. At the same time, R2out places the contents of R2 on the bus, which feeds input B; Add makes the ALU add its two inputs, and Zin loads the sum into Z.
Step 3: The contents of Z are transferred to the destination register R3.
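The three steps can be traced with a small model of the single-bus datapath. `control_step` is a hypothetical helper; it models only the Add function, treats the bus as ALU input B and the MUX output (Y or the constant 4) as input A, and latches all destination registers together at the end of the step, as a clock edge would.

```python
def control_step(regs, signals):
    """One control step of the single-bus datapath (only Add is modelled).

    regs must contain "Y" and "Z"; other entries are ordinary registers.
    """
    drivers = [r for r in regs if f"{r}out" in signals]
    if len(drivers) > 1:
        raise ValueError(f"bus conflict: {drivers}")
    bus = regs[drivers[0]] if drivers else None
    loads = [r for r in regs if r != "Z" and f"{r}in" in signals]
    if (loads or "Zin" in signals) and bus is None:
        raise ValueError("nothing is driving the bus")

    updates = {r: bus for r in loads}
    if "Zin" in signals:
        if "Add" not in signals:
            raise ValueError("this sketch models only the Add function")
        a = 4 if "Select4" in signals else regs["Y"]   # MUX output -> ALU input A
        updates["Z"] = a + bus                          # bus -> ALU input B
    regs.update(updates)                                # latch at the clock edge


regs = {"R1": 3, "R2": 4, "R3": 0, "Y": 0, "Z": 0}
for signals in ({"R1out", "Yin"},
                {"R2out", "SelectY", "Add", "Zin"},
                {"Zout", "R3in"}):
    control_step(regs, signals)
print(regs["R3"])   # 7
```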
Register Transfers
[Figure 7.3. Input and output gating for one register bit: a D flip-flop, clocked by the processor clock, loads the bus value when Riin = 1, and its Q output is placed on the bus when Riout = 1.]
⚫ All operations and data transfers are controlled by the processor clock.
Fetching a Word from Memory
[Figure 7.4. Connection and control signals for register MDR: MDRin and MDRout connect MDR to the internal processor bus; MDRinE and MDRoutE connect it to the memory-bus data lines.]
⚫ Load the address into MAR, issue a Read operation, and load the data into MDR.

3. Fetching a Word from Memory
⚫ The response time of each memory access varies
(cache miss, memory-mapped I/O,…).
⚫ To accommodate this, the processor waits until it
receives an indication that the requested operation
has been completed (Memory-Function-Completed,
MFC).
⚫ Move (R1), R2
➢ MAR ← [R1]
➢ Start a Read operation on the memory bus
➢ Wait for the MFC response from the memory
➢ Load MDR from the memory bus
➢ R2 ← [MDR]
[Figure 7.5. Timing of a memory Read operation, showing the Data, MFC and MDRout signals.]
Timing
Assume that MAR is always available on the address lines of the memory bus.
⚫ Move (R1), R2
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
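The role of MFC can be sketched with a memory whose response time varies. `SlowMemory` and its `latency` parameter are hypothetical; the processor side follows the three control steps above and simply stays in step 2 until MFC is asserted, so a slower memory only lengthens the wait.

```python
class SlowMemory:
    """Hypothetical memory whose Read completes `latency` clock cycles
    after it starts; completion is signalled by MFC."""

    def __init__(self, contents, latency):
        self.contents = contents
        self.latency = latency
        self.address = None
        self.cycles_left = 0

    def start_read(self, address):
        self.address = address
        self.cycles_left = self.latency

    def tick(self):
        """Advance one clock cycle; return (MFC, data on the data lines)."""
        self.cycles_left -= 1
        if self.cycles_left <= 0:
            return True, self.contents[self.address]
        return False, None


def move_indirect(regs, memory):
    """Move (R1), R2; returns the number of cycles spent in step 2."""
    mar = regs["R1"]                 # step 1: R1out, MARin, Read
    memory.start_read(mar)
    waited = 0
    while True:                      # step 2: MDRinE, WMFC (stall until MFC)
        waited += 1
        mfc, data = memory.tick()
        if mfc:
            mdr = data               # MDR is loaded from the memory bus
            break
    regs["R2"] = mdr                 # step 3: MDRout, R2in
    return waited


regs = {"R1": 0x100, "R2": 0}
fast = move_indirect(regs, SlowMemory({0x100: 99}, latency=1))   # e.g. a cache hit
slow = move_indirect(regs, SlowMemory({0x100: 99}, latency=5))   # e.g. a cache miss
print(regs["R2"], fast, slow)   # 99 1 5
```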
4. Storing a Word in Memory
⚫ The address is loaded into MAR.
⚫ The data to be written are loaded into MDR.
⚫ A Write command is issued.
⚫ Example: Move R2, (R1)
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
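A write can be sketched the same way. `SlowWriteMemory` is a hypothetical memory model that completes a Write a fixed number of cycles after it is issued; in this sketch the write starts when Write is issued in step 2, and the processor remains in step 3 (WMFC) until MFC arrives.

```python
class SlowWriteMemory:
    """Hypothetical memory whose Write completes `latency` cycles after it
    is issued; tick() returns the MFC signal."""

    def __init__(self, latency):
        self.cells = {}
        self.latency = latency
        self.pending = None          # (address, data, cycles left)

    def start_write(self, address, data):
        self.pending = (address, data, self.latency)

    def tick(self):
        address, data, left = self.pending
        left -= 1
        if left > 0:
            self.pending = (address, data, left)
            return False
        self.cells[address] = data   # the write takes effect
        self.pending = None
        return True                  # MFC


def move_to_memory(regs, memory):
    """Move R2, (R1); returns the number of cycles spent waiting for MFC."""
    mar = regs["R1"]                 # step 1: R1out, MARin
    mdr = regs["R2"]                 # step 2: R2out, MDRin, Write
    memory.start_write(mar, mdr)
    waited = 1                       # step 3: MDRoutE, WMFC
    while not memory.tick():
        waited += 1
    return waited


mem = SlowWriteMemory(latency=3)
cycles = move_to_memory({"R1": 0x200, "R2": 7}, mem)
print(mem.cells, cycles)   # {512: 7} 3
```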
Execution of a Complete
Instruction
⚫ Add (R3), R1
⚫ Fetch the instruction
⚫ Fetch the first operand (the contents of the
memory location pointed to by R3)
⚫ Perform the addition
⚫ Load the result into R1
Execution of a Complete Instruction
[Figure 7.1. Single-bus organization of the datapath inside a processor.]
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End
Figure 7.6. Control sequence for execution of the instruction Add (R3), R1.
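Figure 7.6 can be executed step by step on a small model of the single-bus datapath. This is a sketch under stated assumptions: `run_control_sequence` is a hypothetical helper, only the Add function is modelled, and memory is assumed to answer every Read within the same step, so WMFC never stalls. The register values and memory contents are made up for the example.

```python
# Figure 7.6 as data: one set of active control signals per step.
ADD_R3_INDIRECT_R1 = [
    {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},
    {"Zout", "PCin", "Yin", "WMFC"},
    {"MDRout", "IRin"},
    {"R3out", "MARin", "Read"},
    {"R1out", "Yin", "WMFC"},
    {"MDRout", "SelectY", "Add", "Zin"},
    {"Zout", "R1in", "End"},
]


def run_control_sequence(cpu, memory, sequence):
    """Execute control steps on a single-bus datapath; return the step that asserts End."""
    for step, signals in enumerate(sequence, start=1):
        drivers = [r for r in cpu if f"{r}out" in signals]
        if len(drivers) > 1:
            raise ValueError(f"step {step}: bus conflict {drivers}")
        bus = cpu[drivers[0]] if drivers else None
        updates = {r: bus for r in cpu if r != "Z" and f"{r}in" in signals}
        if "Zin" in signals:
            if "Add" not in signals:
                raise ValueError("only the Add function is modelled")
            a = 4 if "Select4" in signals else cpu["Y"]   # MUX -> ALU input A
            updates["Z"] = a + bus                         # bus -> ALU input B
        cpu.update(updates)                                # latch at the clock edge
        if "Read" in signals:
            cpu["MDR"] = memory[cpu["MAR"]]                # data arrives in MDR
        if "End" in signals:
            return step
    raise RuntimeError("control sequence ended without End")


cpu = {"PC": 0, "MAR": 0, "MDR": 0, "IR": None, "Y": 0, "Z": 0, "R1": 10, "R3": 0x200}
memory = {0: "Add (R3), R1", 0x200: 32}
steps = run_control_sequence(cpu, memory, ADD_R3_INDIRECT_R1)
print(steps, cpu["R1"], cpu["PC"])   # 7 42 4
```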
Execution of Branch Instructions
⚫ A branch instruction replaces the contents of the PC with the branch target address, which is usually obtained by adding an offset X, given in the branch instruction, to the updated contents of the PC.
⚫ The offset X is usually the difference between the branch target address and the address immediately following the branch instruction.
⚫ Unconditional branch:
Execution of Branch Instructions
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End
Figure 7.7. Control sequence for an unconditional branch instruction.
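The address arithmetic behind Figure 7.7 can be checked with two small functions. They assume 4-byte instructions in byte-addressable memory, as earlier in this module; `branch_target` and `branch_offset` are hypothetical helper names.

```python
WORD = 4  # bytes per instruction (byte-addressable memory, as assumed above)


def branch_target(branch_address, offset):
    """Target computed by the control sequence of Figure 7.7."""
    updated_pc = branch_address + WORD   # steps 1-2: PC <- [PC] + 4, also copied into Y
    return updated_pc + offset           # steps 4-5: Z <- [Y] + offset; PC <- [Z]


def branch_offset(branch_address, target):
    """Offset X that must be placed in the branch instruction."""
    return target - (branch_address + WORD)


x = branch_offset(1000, 960)
print(x, branch_target(1000, x))   # -44 960
```

Because the PC has already been incremented during the fetch, the offset is measured from the instruction that follows the branch, not from the branch itself.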


Multiple-Bus Organization
[Figure 7.8. Three-bus organization of the datapath: buses A, B and C connect the register file, the PC (with its incrementer), the ALU (whose input A comes through a MUX that can also supply the constant 4, and whose output R drives bus C), the IR and instruction decoder, MDR and MAR; MAR drives the memory-bus address lines and MDR connects to the memory-bus data lines.]
• Allows the contents of two different registers to be accessed simultaneously and placed on buses A and B.
• Allows the data on bus C to be loaded into a third register during the same clock cycle.
• Incrementer unit.
• The ALU can simply pass one of its two input operands unmodified to bus C → control signal: R=A or R=B.
⚫ General-purpose registers are combined into a single block called the register file.
⚫ The register file has 3 ports: 2 output ports allow two different registers to be accessed and have their contents placed on buses A and B.
⚫ The third port allows the data on bus C to be loaded into a register during the same clock cycle.
⚫ Buses A and B are used to transfer the source operands to the A and B inputs of the ALU.
⚫ The ALU operation is performed.
⚫ The result is transferred to the destination over bus C.
⚫ The ALU may simply pass one of its two input operands unmodified to bus C.
⚫ The ALU control signals for such an operation are R=A or R=B.
⚫ The incrementer unit is used to increment the PC by 4.
⚫ Using the incrementer eliminates the need to add the constant value 4 to the PC using the main ALU.
⚫ The source for the constant 4 at the ALU input multiplexer can still be used to increment other addresses, such as those used by LoadMultiple and StoreMultiple instructions.
Multiple-Bus Organization
⚫ Add R4, R5, R6
Step  Action
1     PCout, R=B, MARin, Read, IncPC
2     WMFC
3     MDRoutB, R=B, IRin
4     R4outA, R5outB, SelectA, Add, R6in, End
Figure 7.9. Control sequence for the instruction Add R4, R5, R6 for the three-bus organization in Figure 7.8.
⚫ Step 1: The contents of the PC are passed through the ALU using the R=B control signal and loaded into MAR to start a memory read operation. At the same time, the PC is incremented by 4.
⚫ Step 2: The processor waits for MFC.
⚫ Step 3: The processor loads the data received into MDR, then transfers them to IR.
⚫ Step 4: The execution phase of the instruction requires only one control step to complete.
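Step 4 can be sketched as a register file with two read ports and one write port. `ThreeBusDatapath` is a hypothetical class; the sketch assumes both source registers are read before the destination is written within the same clock cycle, so an instruction such as Add R4, R5, R4 also works.

```python
import operator


class ThreeBusDatapath:
    """Register file with two read ports (buses A and B) and one write port (bus C)."""

    def __init__(self, n_registers):
        self.regs = [0] * n_registers

    def step(self, src_a, src_b, dst, alu_op):
        bus_a = self.regs[src_a]      # e.g. R4outA
        bus_b = self.regs[src_b]      # e.g. R5outB
        bus_c = alu_op(bus_a, bus_b)  # SelectA, then the ALU function
        self.regs[dst] = bus_c        # e.g. R6in, written at the end of the cycle
        return bus_c


dp = ThreeBusDatapath(8)
dp.regs[4], dp.regs[5] = 20, 22
dp.step(4, 5, 6, operator.add)       # Add R4, R5, R6 in one step
dp.step(6, 5, 7, lambda a, b: b)     # R=B: pass bus B through unmodified
print(dp.regs[6], dp.regs[7])        # 42 22
```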
Exercise
[Figure 7.1. Single-bus organization of the datapath inside a processor.]
⚫ What is the control sequence for execution of the instruction Add R1, R2, including the instruction fetch phase? (Assume the single-bus architecture.)
Overview
⚫ To execute instructions, the processor must
have some means of generating the control
signals needed in the proper sequence.
⚫ Two categories: hardwired control and
microprogrammed control
⚫ A hardwired system can operate at high speed, but with little flexibility.
Control Unit Organization
[Figure 7.10. Control unit organization: the clock (CLK) drives a control step counter; a decoder/encoder combines the control step with the IR, external inputs and condition codes to generate the control signals.]


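Detailed Block Description
[Figure 7.11. Separation of the decoding and encoding functions: the control step counter (clocked by CLK, with Reset and Run) feeds a step decoder that produces T1, T2, ..., Tn; an instruction decoder converts the IR into INS1, INS2, ..., INSm; an encoder combines these with external inputs and condition codes to generate the control signals, including End.]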


Generating Zin
⚫ Zin = T1 + T6 • ADD + T4 • BR + …
[Figure 7.12. Generation of the Zin control signal for the processor in Figure 7.1: an OR of T1, T6 • Add and T4 • Branch.]
Generating End
⚫ End = T7 • ADD + T5 • BR + (T5 • N + T4 • ¬N) • BRN + …
[Figure 7.13. Generation of the End control signal: an OR of T7 • Add, T5 • Branch and (T5 • N + T4 • ¬N) • Branch<0.]
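The two logic equations can be evaluated directly as Boolean functions of the step decoder outputs. This sketch keeps only the terms written out above (the "…" terms for other instructions are omitted) and takes the Branch<0 term as T5 • N + T4 • ¬N: when N = 0 the branch is not taken and the instruction ends after step 4. Function names are hypothetical.

```python
def step_decoder(step, n_steps=7):
    """One-hot outputs T1..Tn of the step decoder for the current control step."""
    return {i: i == step for i in range(1, n_steps + 1)}


def zin(T, ADD, BR):
    # Zin = T1 + T6·ADD + T4·BR  (other instructions' terms omitted)
    return T[1] or (T[6] and ADD) or (T[4] and BR)


def end(T, ADD, BR, BRN, N):
    # End = T7·ADD + T5·BR + (T5·N + T4·¬N)·BRN  (other terms omitted)
    return (T[7] and ADD) or (T[5] and BR) or (((T[5] and N) or (T[4] and not N)) and BRN)


def active_steps(signal, **inputs):
    """Control steps (1 to 7) in which the given signal is asserted."""
    return [s for s in range(1, 8) if signal(step_decoder(s), **inputs)]


print(active_steps(zin, ADD=True, BR=False))                       # [1, 6]
print(active_steps(zin, ADD=False, BR=True))                       # [1, 4]
print(active_steps(end, ADD=False, BR=False, BRN=True, N=True))    # [5]
print(active_steps(end, ADD=False, BR=False, BRN=True, N=False))   # [4]
```

The results agree with Figures 7.6 and 7.7: Zin is active in steps 1 and 6 for Add and in steps 1 and 4 for a branch.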
A Complete Processor
[Figure 7.14. Block diagram of a complete processor: an instruction unit, an integer unit and a floating-point unit, with an instruction cache and a data cache, connect through a bus interface to the system bus, which also connects main memory and input/output.]
Microprogrammed Control
⚫ Control signals are generated by a program similar to machine language programs.

Micro-
instruction  PCin PCout MARin Read MDRout IRin Yin Select Add Zin Zout R1out R1in R3out WMFC End
1            0    1     1     1    0      0    0   1      1   1   0    0     0    0     0    0
2            1    0     0     0    0      0    1   0      0   0   1    0     0    0     1    0
3            0    0     0     0    1      1    0   0      0   0   0    0     0    0     0    0
4            0    0     1     1    0      0    0   0      0   0   0    0     0    1     0    0
5            0    0     0     0    0      0    1   0      0   0   0    1     0    0     1    0
6            0    0     0     0    1      0    0   0      1   1   0    0     0    0     0    0
7            0    0     0     0    0      0    0   0      0   0   1    0     1    0     0    1
(Select = 1 selects the constant 4; Select = 0 selects Y.)

Figure 7.15. An example of microinstructions for Figure 7.6.

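⚫ Terminology: each row of bits is a control word (CW), also called a microinstruction; the sequence of CWs that implements one machine instruction is its microroutine.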
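The control words of Figure 7.15 can be decoded back into signal names. This sketch assumes the column order PCin, PCout, MARin, Read, MDRout, IRin, Yin, Select, Add, Zin, Zout, R1out, R1in, R3out, WMFC, End, which is the order under which every row matches the corresponding step of Figure 7.6. `decode` and `run_microroutine` are hypothetical helpers; the microprogram counter simply advances one word per step until End is asserted.

```python
# Column order of Figure 7.15 (assumed from matching each row to Figure 7.6).
SIGNALS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R3out", "WMFC", "End"]

CONTROL_STORE = [          # one control word (microinstruction) per row
    "0111000111000000",
    "1000001000100010",
    "0000110000000000",
    "0011000000000100",
    "0000001000010010",
    "0000100011000000",
    "0000000000101001",
]


def decode(cw):
    """Return the set of signals asserted by one control word."""
    active = {name for name, bit in zip(SIGNALS, cw) if bit == "1"}
    select4 = "Select" in active
    active.discard("Select")
    if "Add" in active:              # the MUX setting matters only when the ALU is used
        active.add("Select4" if select4 else "SelectY")
    return active


def run_microroutine(store):
    """Step through the control store until a word asserts End."""
    trace = []
    for cw in store:
        signals = decode(cw)
        trace.append(signals)
        if "End" in signals:
            return trace
    raise RuntimeError("microroutine has no End")


for number, signals in enumerate(run_microroutine(CONTROL_STORE), start=1):
    print(number, sorted(signals))
```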




Basic organization of a microprogrammed control unit
[Figure 7.16. Basic organization of a microprogrammed control unit: the IR drives a starting address generator that loads the μPC; the μPC, advanced by the clock, addresses the control store, which outputs the control word (CW).]
⚫ Control store
One function cannot be carried out by this simple organization.
Conditional branch
⚫ The previous organization cannot handle the situation when the control unit is required to check the status of the condition codes or external inputs to choose between alternative courses of action.
⚫ Use a conditional branch microinstruction.
Address  Microinstruction
0        PCout, MARin, Read, Select4, Add, Zin
1        Zout, PCin, Yin, WMFC
2        MDRout, IRin
3        Branch to starting address of appropriate microroutine
...
25       If N=0, then branch to microinstruction 0
26       Offset-field-of-IRout, SelectY, Add, Zin
27       Zout, PCin, End
Figure 7.17. Microroutine for the instruction Branch<0.
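The sequencing of Figure 7.17 can be traced for both values of N. The dictionary encoding, the branch labels and `STARTING_ADDRESS` (standing in for the starting address generator) are hypothetical; only the Branch<0 routine at address 25 is modelled. Returning to address 0 means the fetch of the next machine instruction begins.

```python
# Figure 7.17 as data: address -> (control signals, branch or None).
MICROPROGRAM = {
    0: ({"PCout", "MARin", "Read", "Select4", "Add", "Zin"}, None),
    1: ({"Zout", "PCin", "Yin", "WMFC"}, None),
    2: ({"MDRout", "IRin"}, None),
    3: (set(), "dispatch"),              # branch to the routine for the opcode in IR
    25: (set(), "if N=0 goto 0"),        # conditional branch microinstruction
    26: ({"Offset-field-of-IRout", "SelectY", "Add", "Zin"}, None),
    27: ({"Zout", "PCin", "End"}, None),
}
STARTING_ADDRESS = {"Branch<0": 25}      # hypothetical decoder output


def trace_microroutine(opcode, n_flag):
    """Microinstruction addresses visited for one machine instruction."""
    upc, visited = 0, []
    while True:
        visited.append(upc)
        signals, branch = MICROPROGRAM[upc]
        if "End" in signals:
            return visited               # End: the next instruction fetch starts at 0
        if branch == "dispatch":
            upc = STARTING_ADDRESS[opcode]
        elif branch == "if N=0 goto 0" and n_flag == 0:
            return visited               # branch not taken: fetch the next instruction
        else:
            upc += 1


print(trace_microroutine("Branch<0", n_flag=1))   # [0, 1, 2, 3, 25, 26, 27]
print(trace_microroutine("Branch<0", n_flag=0))   # [0, 1, 2, 3, 25]
```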


Microprogrammed Control
[Figure 7.18. Organization of the control unit to allow conditional branching in the microprogram: a starting and branch address generator, fed by the IR, external inputs and condition codes, loads the μPC; the clocked μPC addresses the control store, which outputs the control word (CW).]
Microinstructions
⚫ A straightforward way to structure
microinstructions is to assign one bit position
to each control signal.
⚫ However, this is very inefficient.
⚫ The length can be reduced: most signals are
not needed simultaneously, and many signals
are mutually exclusive.
⚫ Mutually exclusive signals are placed in the same group and encoded in binary.
Partial Format for the Microinstruction
⚫ F1 (4 bits): 0000: No transfer; 0001: PCout; 0010: MDRout; 0011: Zout; 0100: R0out; 0101: R1out; 0110: R2out; 0111: R3out; 1010: TEMPout; 1011: Offsetout
⚫ F2 (3 bits): 000: No transfer; 001: PCin; 010: IRin; 011: Zin; 100: R0in; 101: R1in; 110: R2in; 111: R3in
⚫ F3 (3 bits): 000: No transfer; 001: MARin; 010: MDRin; 011: TEMPin; 100: Yin
⚫ F4 (4 bits): 16 ALU functions, e.g. 0000: Add; 0001: Sub; ...; 1111: XOR
⚫ F5 (2 bits): 00: No action; 01: Read; 10: Write
⚫ F6 (1 bit): 0: SelectY; 1: Select4
⚫ F7 (1 bit): 0: No action; 1: WMFC
⚫ F8 (1 bit): 0: Continue; 1: End
Figure 7.19. An example of a partial format for field-encoded microinstructions.

What is the price paid for this scheme?
A little more hardware is required: decoders are needed to generate the individual control signals from each encoded field.
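The decoding hardware can be sketched as one small decoder per field. The field codes below are those of Figure 7.19; the packing order (F1 in the most significant bits) and the `encode`/`decode` helpers are assumptions made for the example. Step 1 of Figure 7.6 fits into a single 19-bit word because its six signals fall into six different fields.

```python
# Partial field layout of Figure 7.19: (field, width in bits, {code: signal}).
# None means "no transfer / no action / continue".
FIELDS = [
    ("F1", 4, {0b0000: None, 0b0001: "PCout", 0b0010: "MDRout", 0b0011: "Zout",
               0b0100: "R0out", 0b0101: "R1out", 0b0110: "R2out", 0b0111: "R3out",
               0b1010: "TEMPout", 0b1011: "Offsetout"}),
    ("F2", 3, {0b000: None, 0b001: "PCin", 0b010: "IRin", 0b011: "Zin",
               0b100: "R0in", 0b101: "R1in", 0b110: "R2in", 0b111: "R3in"}),
    ("F3", 3, {0b000: None, 0b001: "MARin", 0b010: "MDRin", 0b011: "TEMPin", 0b100: "Yin"}),
    ("F4", 4, {0b0000: "Add", 0b0001: "Sub", 0b1111: "XOR"}),  # other ALU codes not shown
    ("F5", 2, {0b00: None, 0b01: "Read", 0b10: "Write"}),
    ("F6", 1, {0: "SelectY", 1: "Select4"}),
    ("F7", 1, {0: None, 1: "WMFC"}),
    ("F8", 1, {0: None, 1: "End"}),
]


def encode(codes):
    """Pack one code per field into a word, F1 in the most significant bits (assumed order)."""
    if len(codes) != len(FIELDS):
        raise ValueError("one code per field is required")
    word = 0
    for (name, width, _), code in zip(FIELDS, codes):
        if not 0 <= code < (1 << width):
            raise ValueError(f"{name} code {code} does not fit in {width} bits")
        word = (word << width) | code
    return word


def decode(word):
    """Each field drives its own small decoder; return the signals they assert."""
    signals = []
    shift = sum(width for _, width, _ in FIELDS)
    for name, width, table in FIELDS:
        shift -= width
        code = (word >> shift) & ((1 << width) - 1)
        if code not in table:
            raise ValueError(f"{name} code {code:0{width}b} is not listed in the partial format")
        if table[code] is not None:
            signals.append(table[code])
    return signals


# Step 1 of Figure 7.6: PCout, MARin, Read, Select4, Add, Zin.
step1 = encode([0b0001, 0b011, 0b001, 0b0000, 0b01, 1, 0, 0])
print(decode(step1))   # ['PCout', 'Zin', 'MARin', 'Add', 'Read', 'Select4']
```

Fields F4 and F6 always decode to some value, so a word for step 7 (Zout, R1in, End) also reports Add and SelectY; these are harmless because Zin is not asserted. Signals in the same field, such as R1out and R2out, can no longer be activated together, which is exactly the mutual exclusion the encoding relies on.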
Further Improvement
⚫ Enumerate the patterns of required signals in
all possible microinstructions. Each
meaningful combination of active control
signals can then be assigned a distinct code.
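⚫ Vertical organization: highly encoded schemes that use compact codes to specify only a small number of control functions in each microinstruction.
⚫ Horizontal organization: minimally encoded schemes in which many resources can be controlled with a single microinstruction.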
Microprogram Sequencing
⚫ If all microprograms required only straightforward sequential execution of microinstructions except for branches, letting a μPC govern the sequencing would be efficient.
⚫ However, this approach has two disadvantages:
➢ Having a separate microroutine for each machine instruction results in a large total number of microinstructions and a large control store.
➢ Execution time is longer because it takes more time to carry out the required branches.
⚫ Example: Add src, Rdst
⚫ Four addressing modes: register, autoincrement,
autodecrement, and indexed (with indirect forms).
- Bit-ORing
- Wide-Branch Addressing
- WMFC
Contents of IR: | OP code | Mode = 0 1 0 | Rsrc | Rdst |, with the mode in bits 10 to 8, Rsrc in bits 7 to 4 and Rdst in bits 3 to 0 (the OP code field ends at bit 11).

Address  Microinstruction
(octal)
000      PCout, MARin, Read, Select4, Add, Zin
001      Zout, PCin, Yin, WMFC
002      MDRout, IRin
003      Branch {μPC ← 101 (from Instruction decoder); μPC5,4 ← [IR10,9]; μPC3 ← ¬[IR10] · ¬[IR9] · [IR8]}
121      Rsrcout, MARin, Read, Select4, Add, Zin
122      Zout, Rsrcin
123      Branch {μPC ← 170; μPC0 ← ¬[IR8]}, WMFC
170      MDRout, MARin, Read, WMFC
171      MDRout, Yin
172      Rdstout, SelectY, Add, Zin
173      Zout, Rdstin, End
Figure 7.21. Microinstruction for Add (Rsrc)+, Rdst.

Note: Microinstruction at location 170 is not executed for this addressing mode.
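Bit-ORing can be sketched as ordinary bitwise operations on the μPC. This sketch assumes the reading μPC3 ← ¬IR10 · ¬IR9 · IR8 and μPC0 ← ¬IR8; the complement on IR8 is what makes the note above hold, since for mode 010 (IR8 = 0) address 170 is skipped. Function names and the example register numbers are hypothetical.

```python
def ir_bit(ir, n):
    """Bit n of the instruction register."""
    return (ir >> n) & 1


def mode_dispatch_address(ir, decoder_start=0o101):
    """Microinstruction 003: μPC ← 101; μPC5,4 ← IR10,9; μPC3 ← ¬IR10·¬IR9·IR8."""
    upc = decoder_start                                  # from the instruction decoder
    upc |= (ir_bit(ir, 10) << 5) | (ir_bit(ir, 9) << 4)  # OR the mode bits into μPC5,4
    only_ir8 = (1 - ir_bit(ir, 10)) & (1 - ir_bit(ir, 9)) & ir_bit(ir, 8)
    upc |= only_ir8 << 3
    return upc


def indirect_skip_address(ir):
    """Microinstruction 123: μPC ← 170; μPC0 ← ¬IR8."""
    return 0o170 | (1 - ir_bit(ir, 8))


ir = (0b010 << 8) | (0b0011 << 4) | 0b0101   # mode 010, Rsrc = 3, Rdst = 5 (example values)
print(oct(mode_dispatch_address(ir)), oct(indirect_skip_address(ir)))   # 0o121 0o171
```

Because the bits being ORed are 0 in the base addresses 101 and 170, ORing has the same effect as inserting the IR bits, which is why the technique needs no adder.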
Microinstructions with Next-Address Field
⚫ The microprogram we discussed requires several branch microinstructions, which perform no useful operation in the datapath.
⚫ A powerful alternative approach is to include an address field as a part of every microinstruction to indicate the location of the next microinstruction to be fetched.
⚫ Pros: separate branch microinstructions are virtually eliminated, and there are few limitations in assigning addresses to microinstructions.
⚫ Cons: additional bits are needed for the address field (around 1/6).
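Microinstructions with Next-Address Field
[Figure 7.22. Microinstruction-sequencing organization: decoding circuits, driven by the IR, external inputs and condition codes, load the microinstruction address register μAR, which addresses the control store; the microinstruction read out is placed in μIR, whose next-address field goes back to the decoding circuits and whose other fields go to the microinstruction decoder that generates the control signals.]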


[Figure 7.23. Format for microinstructions in the example of Section 7.5.3 (partial); the one-bit sequencing fields are coded 0: NextAdrs, 1: InstDec; 0: No action, 1: ORmode; and 0: No action, 1: ORindsrc.]


Implementation of the Microroutine
Octal    F0        F1   F2   F3   F4    F5  F6  F7  F8  F9  F10
address
000      00000001  001  011  001  0000  01  1   0   0   0   0
001      00000010  011  001  100  0000  00  0   1   0   0   0
002      00000011  010  010  000  0000  00  0   0   0   0   0
003      00000000  000  000  000  0000  00  0   0   1   1   0
121      01010010  100  011  001  0000  01  1   0   0   0   0
122      01111000  011  100  000  0000  00  0   1   0   0   1
170      01111001  010  000  001  0000  01  0   1   0   0   0
171      01111010  010  000  100  0000  00  0   0   0   0   0
172      01111011  101  011  000  0000  00  0   0   0   0   0
173      00000000  011  101  000  0000  00  0   0   0   0   0
Figure 7.24. Implementation of the microroutine of Figure 7.21 using a next-microinstruction address field. (See Figure 7.23 for encoded signals.)
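Sequencing with a next-address field can be sketched using the F0 values and the three one-bit sequencing fields (InstDec, ORmode, ORindsrc) of Figure 7.24. Only the rows shown in that figure are modelled, so the trace works for the autoincrement modes 010 and 011; the decoder starting address 101, the ORmode bit pattern and ORindsrc supplying ¬IR8 are assumptions carried over from Figure 7.21. All names are hypothetical.

```python
# (next address F0, InstDec, ORmode, ORindsrc) for each row of Figure 7.24.
CONTROL_STORE = {
    0o000: (0o001, 0, 0, 0),
    0o001: (0o002, 0, 0, 0),
    0o002: (0o003, 0, 0, 0),
    0o003: (0o000, 1, 1, 0),   # InstDec and ORmode: dispatch on opcode and mode
    0o121: (0o122, 0, 0, 0),
    0o122: (0o170, 0, 0, 1),   # ORindsrc: may skip 170
    0o170: (0o171, 0, 0, 0),
    0o171: (0o172, 0, 0, 0),
    0o172: (0o173, 0, 0, 0),
    0o173: (0o000, 0, 0, 0),   # back to the instruction fetch at 000
}


def ir_bit(ir, n):
    return (ir >> n) & 1


def next_address(addr, ir, decoder_start=0o101):
    """Compute the next μAR value from the current microinstruction's fields."""
    nxt, inst_dec, or_mode, or_indsrc = CONTROL_STORE[addr]
    if inst_dec:                       # take the address from the instruction decoder
        nxt = decoder_start
    if or_mode:                        # bit-OR the addressing-mode bits
        nxt |= (ir_bit(ir, 10) << 5) | (ir_bit(ir, 9) << 4)
        nxt |= ((1 - ir_bit(ir, 10)) & (1 - ir_bit(ir, 9)) & ir_bit(ir, 8)) << 3
    if or_indsrc:                      # bit-OR ¬IR8 into bit 0
        nxt |= 1 - ir_bit(ir, 8)
    return nxt


def trace(ir):
    """Addresses visited from the fetch at 000 until control returns to 000."""
    addr, visited = 0o000, []
    while True:
        visited.append(addr)
        addr = next_address(addr, ir)
        if addr == 0o000:
            return visited


print([oct(a) for a in trace(0b010 << 8)])
```

No separate branch microinstructions appear in the trace: every word carries its successor's address, and the bit-ORing happens while that address is being formed.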
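[Figure 7.25. Some details of the control-signal-generating circuitry: the control store, addressed by μAR, delivers the next-address field and fields F1, F2, F8, F9 and F10; the microinstruction decoder produces Rsrcout, Rsrcin, Rdstout, Rdstin and the other control signals, and bit-ORing logic modifies the next address.]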
Prefetching Microinstructions
⚫ In microprogrammed control, fetching each microinstruction from the control store takes time; this can be reduced by prefetching the next microinstruction while the present one is still executing.
⚫ However, this has organizational difficulties: the address of the next microinstruction may depend on status flags or on the result of the present microinstruction, so the prefetched address may be incorrect.
⚫ Hence more complex circuitry is needed to generate the correct address, with repeated fetching when the prefetch was wrong.
Emulation
⚫ Emulation allows an obsolete machine to be replaced by an updated machine.
⚫ Let M1 and M2 be the instruction sets of system 1 and system 2, respectively.
⚫ If system 2 is replaced by system 1, and machine-level programs written in M2 can still be run on system 1, then it can be said that M1 emulates M2.
⚫ Emulation:
❖ Facilitates the transition to a new computer with less disruption.
❖ Is easy if both systems have similar architectures.
Pipelining
Role of cache memory
4-stage pipeline
Pipeline stall caused by a cache miss in F2